📖 What is Amazon Kinesis?
Amazon Kinesis is a scalable, fully managed real-time data streaming service. It lets you collect, process, and analyze streaming data as it arrives from hundreds of thousands of sources, such as social media feeds or website clickstreams.
"Whenever you see 'real-time streaming' or 'ingesting telemetry data' in a scenario, look for Kinesis. It is the opposite of 'batch processing' (which would be Glue or EMR)."
📚 Certification: AWS Certified Cloud Practitioner (CLF-C02)
🔑 What are the Key Concepts of Amazon Kinesis?
- ▸ Real-time ingestion allows for the immediate capture of streaming data from sources like IoT sensors, social media, or application logs for instant analysis.
- ▸ Scalability is achieved through sharding: a stream is divided into shards, and records are distributed across those shards so that throughput grows as shards are added.
- ▸ Kinesis Data Streams provides a low-latency pipeline for custom applications to process and store streaming data in real-time.
- ▸ Kinesis Data Firehose (now named Amazon Data Firehose) loads streaming data directly into destinations like Amazon S3, Amazon Redshift, or Amazon OpenSearch Service, with no custom delivery code.
- ▸ Kinesis Data Analytics (since renamed Amazon Managed Service for Apache Flink) enables the use of SQL to analyze and process streaming data on the fly without managing servers.
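To make the sharding idea concrete, here is a minimal sketch of how a partition key can be mapped to a shard. Kinesis Data Streams hashes each record's partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The helper names (`shard_ranges`, `shard_for_key`) and the assumption that the hash space is split evenly are illustrative only; real streams can have uneven ranges after resharding.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def shard_ranges(shard_count):
    """Split the 128-bit hash key space into equal, contiguous ranges.

    Assumption: an even split, purely for illustration.
    """
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last range absorbs any remainder so the space is fully covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose range contains the key's MD5 hash."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key outside all ranges")


ranges = shard_ranges(4)
# Records with the same partition key always land on the same shard.
same_shard = shard_for_key("sensor-17", ranges) == shard_for_key("sensor-17", ranges)
```

Because the same partition key always hashes to the same value, all records for one device or user stay in order on one shard, while different keys spread the load across shards.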
🎯 How does Amazon Kinesis appear on the CLF-C02 Exam?
You may be asked to identify the best service for a company that needs to analyze website clickstream data in real-time to update a live marketing dashboard.
A scenario might describe a need to collect telemetry data from millions of IoT devices and load it directly into an S3 bucket without writing custom code.
Expect questions that ask you to distinguish real-time streaming from batch processing: Kinesis is the correct choice for streaming, and AWS Glue is the correct choice for batch.
❓ Frequently Asked Questions
How does Kinesis differ from AWS Glue?
Kinesis is designed for real-time streaming data, allowing you to process information as it arrives. AWS Glue is an ETL service used for batch processing, where data is processed in large groups at scheduled intervals.
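The difference between the two processing styles can be sketched in a few lines of plain Python. This is a conceptual illustration only and uses no AWS APIs; the function names and the sample click counts are made up for the example.

```python
def process_as_stream(events):
    """Streaming style: emit an updated result after every event arrives."""
    total = 0
    for amount in events:
        total += amount
        yield total  # a result is available immediately, per event


def process_as_batch(events):
    """Batch style: collect everything first, then compute one result."""
    collected = list(events)
    return sum(collected)


clicks = [3, 1, 4]  # hypothetical click counts arriving over time
running = list(process_as_stream(clicks))
batch = process_as_batch(clicks)
```

The streaming version produces an answer after each event, which is what a live dashboard needs. The batch version produces a single answer once all the data has been gathered, which suits scheduled jobs.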
When should I choose Kinesis Data Firehose over Kinesis Data Streams?
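Choose Firehose for 'load and forget' scenarios where data needs to go straight to a destination such as S3 or Redshift and near-real-time delivery is acceptable, since Firehose buffers records before delivering them. Choose Data Streams when you need sub-second latency and custom processing logic via a consumer application.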
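One reason Firehose is near real-time rather than sub-second is that it buffers incoming records and delivers them in batches, flushing when a size or time threshold is reached. The sketch below is a simplified model of that idea, not the service's implementation: the class name `DeliveryBuffer`, its parameters, and the rule of checking thresholds only when a record arrives are all assumptions for illustration.

```python
class DeliveryBuffer:
    """Toy model of size-or-time buffering before delivery to a destination."""

    def __init__(self, max_bytes, max_seconds):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.records = []
        self.size = 0
        self.opened_at = None  # time the first record of the batch arrived

    def add(self, record, now):
        """Add a record; return the flushed batch if a threshold is hit."""
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(record)
        self.size += len(record)
        too_big = self.size >= self.max_bytes
        too_old = now - self.opened_at >= self.max_seconds
        if too_big or too_old:
            return self.flush()
        return None

    def flush(self):
        """Hand back the current batch and reset the buffer."""
        batch, self.records = self.records, []
        self.size = 0
        self.opened_at = None
        return batch


buf = DeliveryBuffer(max_bytes=10, max_seconds=60)
first = buf.add(b"abcd", now=0)   # below both thresholds, nothing delivered
second = buf.add(b"efgh", now=1)  # still buffering
third = buf.add(b"ij", now=2)     # size threshold reached, batch delivered
```

With Data Streams, by contrast, a consumer application reads records from the shards itself, which is where custom processing logic and lower latency come from.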