📖 What is Amazon Kinesis?
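Amazon Kinesis is a platform for collecting, processing, and analyzing streaming data in real time. It includes Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics, which let applications ingest, transform, and analyze data in motion with low latency. AWS has since renamed Kinesis Data Firehose to Amazon Data Firehose and Kinesis Data Analytics to Amazon Managed Service for Apache Flink, so documentation and exam material may use either name.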
"Kinesis is designed for continuous data streams, unlike batch processing. Differentiate between Kinesis Data Streams (custom processing), Kinesis Data Firehose (loading to data lakes), and Kinesis Data Analytics (real-time analytics). Understand the concept of shards and their impact on throughput."
📚 Certification: AWS Certified Solutions Architect - Associate (SAA-C03)
🔑 What are the Key Concepts of Amazon Kinesis?
- ▸ Kinesis Data Streams provides real-time data ingestion and processing with custom applications, offering flexibility but requiring more management.
- ▸ Kinesis Data Firehose is a fully managed service for reliably loading streaming data into data lakes, data stores, and analytics tools.
- ▸ Kinesis Data Analytics allows you to process and analyze streaming data using SQL or Apache Flink, enabling real-time insights.
- ▸ Shards are the unit of throughput in Kinesis Data Streams: each shard accepts up to 1 MB/s or 1,000 records/s of writes and serves up to 2 MB/s of reads. More shards mean higher capacity, but also higher cost and more complexity.
- ▸ Kinesis is ideal for use cases like application logs, website clickstreams, IoT sensor data, and real-time analytics dashboards.
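The shard limits mentioned above can be turned into a rough sizing calculation. The following is a minimal sketch, not an official formula: the function name `estimate_shards` and its parameters are made up for illustration, 1 MB is treated as 10^6 bytes, and all consumers are assumed to share each shard's 2 MB/s read limit (no enhanced fan-out).

```python
import math

# Per-shard limits for a provisioned-mode stream.
# Assumption of this sketch: 1 MB is treated as 1,000,000 bytes.
SHARD_WRITE_BYTES_PER_SEC = 1_000_000
SHARD_WRITE_RECORDS_PER_SEC = 1_000
SHARD_READ_BYTES_PER_SEC = 2_000_000


def estimate_shards(records_per_sec: int, avg_record_bytes: int, consumers: int = 1) -> int:
    """Return a rough shard count that satisfies write and shared-read limits."""
    write_bytes = records_per_sec * avg_record_bytes
    by_write_bytes = math.ceil(write_bytes / SHARD_WRITE_BYTES_PER_SEC)
    by_write_records = math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    # Every shared-throughput consumer reads the full stream, so read volume
    # is the write volume multiplied by the number of consumers.
    by_read_bytes = math.ceil(write_bytes * consumers / SHARD_READ_BYTES_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


if __name__ == "__main__":
    # 5,000 small records per second: the record-count limit dominates.
    print(estimate_shards(5000, 500))
    # 200 large records per second: the byte limit dominates.
    print(estimate_shards(200, 20_000))
```

The result is the largest of the three constraints, which is why a stream of many tiny records can need more shards than its byte volume alone suggests.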
🎯 How does Amazon Kinesis appear on the SAA-C03 Exam?
You may be asked to identify the best Kinesis service for ingesting website clickstream data and storing it in an S3 data lake for analysis by a data science team.
A scenario might describe a need to process real-time sensor data from IoT devices and trigger alerts based on predefined thresholds, then ask you to determine the appropriate combination of Kinesis services.
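One common way to handle the threshold scenario is an AWS Lambda function that consumes from a Kinesis data stream. The sketch below shows only the record-decoding and threshold check. The payload fields `device_id` and `temperature_c`, the limit of 80.0, and the helper names are hypothetical. The event shape (`Records[].kinesis.data`, base64-encoded) is the one Lambda uses for Kinesis event sources.

```python
import base64
import json

TEMPERATURE_LIMIT_C = 80.0  # hypothetical threshold for this sketch


def encode_record(payload: dict) -> dict:
    """Build one record in the shape Lambda delivers for Kinesis (for local testing)."""
    data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data}}


def find_alerts(event: dict, limit: float = TEMPERATURE_LIMIT_C) -> list:
    """Decode each Kinesis record and collect readings above the limit."""
    alerts = []
    for record in event.get("Records", []):
        # Kinesis record data arrives base64-encoded inside the Lambda event.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        if payload["temperature_c"] > limit:
            alerts.append(
                {"device_id": payload["device_id"], "temperature_c": payload["temperature_c"]}
            )
    return alerts


def handler(event, context):
    """Lambda entry point; a real function would publish the alerts somewhere."""
    alerts = find_alerts(event)
    return {"alert_count": len(alerts), "alerts": alerts}


if __name__ == "__main__":
    sample = {
        "Records": [
            encode_record({"device_id": "sensor-1", "temperature_c": 95.0}),
            encode_record({"device_id": "sensor-2", "temperature_c": 20.0}),
        ]
    }
    print(handler(sample, None))
```

In an exam answer, the equivalent design would typically be Kinesis Data Streams for ingestion plus a consumer (Lambda or a Flink application) that evaluates the thresholds.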
Expect questions about scaling Kinesis Data Streams by adjusting the number of shards to handle fluctuating data ingestion rates and maintain performance.
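Scaling by changing the shard count can be sketched as a small planning step. As far as the `UpdateShardCount` documentation describes it, a single call cannot more than double the current shard count or reduce it below half, so a large change may need several calls. The helper name `plan_shard_updates` is hypothetical, and the limits should be checked against current AWS documentation before relying on them.

```python
from typing import List


def plan_shard_updates(current: int, target: int) -> List[int]:
    """Return the sequence of shard counts to request, one per UpdateShardCount call.

    Assumes each call may at most double the count or cut it to half (rounded up).
    """
    if current < 1 or target < 1:
        raise ValueError("shard counts must be at least 1")
    steps = []
    while current != target:
        if target > current:
            current = min(target, current * 2)
        else:
            current = max(target, (current + 1) // 2)
        steps.append(current)
    return steps


if __name__ == "__main__":
    # Each value would be passed as TargetShardCount in a call such as:
    #   kinesis.update_shard_count(StreamName="my-stream",
    #                              TargetShardCount=value,
    #                              ScalingType="UNIFORM_SCALING")
    # ("my-stream" is a placeholder; the call itself is not executed here.)
    print(plan_shard_updates(4, 20))
    print(plan_shard_updates(10, 2))
```

A real scaling routine would also wait for the stream to return to the `ACTIVE` state between calls, since resharding is not instantaneous.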
❓ Frequently Asked Questions
When would I choose Kinesis Data Streams over Kinesis Data Firehose?
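Choose Streams when you need custom processing with your own consumer applications, such as complex transformations or enrichment, or when several consumers must read the same data independently. Choose Firehose when the goal is simply to deliver data to a destination such as S3, Redshift, or OpenSearch Service with little or no custom code.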
How do I monitor the performance of my Kinesis Data Streams?
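In CloudWatch, watch `IncomingBytes` and `IncomingRecords` for write volume, `WriteProvisionedThroughputExceeded` and `ReadProvisionedThroughputExceeded` for throttling, and `GetRecords.IteratorAgeMilliseconds` for consumers that are falling behind. `OutgoingBytes` is reported per shard when enhanced (shard-level) monitoring is enabled. Sustained throttling or a growing iterator age signals a bottleneck and a need to scale.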
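Checking one of these metrics programmatically might look like the following sketch. It assumes a boto3 CloudWatch client is passed in. The function name `throttled_write_count` and the 15-minute window are illustrative choices, while `get_metric_statistics`, the `AWS/Kinesis` namespace, and the `StreamName` dimension are the standard CloudWatch interface for stream-level metrics.

```python
from datetime import datetime, timedelta, timezone


def throttled_write_count(cloudwatch, stream_name: str, minutes: int = 15) -> float:
    """Sum WriteProvisionedThroughputExceeded over the last `minutes` minutes."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Kinesis",
        MetricName="WriteProvisionedThroughputExceeded",
        Dimensions=[{"Name": "StreamName", "Value": stream_name}],
        StartTime=start,
        EndTime=end,
        Period=60,  # one datapoint per minute
        Statistics=["Sum"],
    )
    return sum(point["Sum"] for point in response["Datapoints"])


# Usage, assuming boto3 is installed and AWS credentials are configured
# ("my-stream" is a placeholder name):
#   import boto3
#   count = throttled_write_count(boto3.client("cloudwatch"), "my-stream")
#   if count > 0:
#       print("Writes were throttled; consider adding shards.")
```

A nonzero result means producers were throttled during the window, which is usually the first sign that the stream needs more shards.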
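What happens if I exceed the throughput capacity of my shards in Kinesis Data Streams?
Requests beyond the per-shard limits are throttled and fail with `ProvisionedThroughputExceededException`. Records are lost only if the producer does not retry or buffer them. Monitor shard utilization and add shards, or use on-demand capacity mode, before sustained traffic reaches the limits.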
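On the producer side, throttled writes surface as a `ProvisionedThroughputExceededException`, and retrying with exponential backoff keeps a short burst from turning into dropped records. The sketch below assumes a boto3 Kinesis client is passed in. The function name `put_with_backoff` and its retry parameters are hypothetical, and boto3 also applies its own configurable retries, so this only illustrates the idea.

```python
import random
import time


def put_with_backoff(kinesis, stream_name, data, partition_key,
                     max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call put_record, retrying with exponential backoff when throttled."""
    for attempt in range(max_attempts):
        try:
            return kinesis.put_record(
                StreamName=stream_name, Data=data, PartitionKey=partition_key
            )
        except Exception as exc:
            # botocore's ClientError exposes the service error code here.
            code = getattr(exc, "response", {}).get("Error", {}).get("Code")
            is_throttle = code == "ProvisionedThroughputExceededException"
            if not is_throttle or attempt == max_attempts - 1:
                raise
            # Double the delay on each attempt and add jitter so that many
            # producers do not retry at the same moment.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))


# Usage, assuming boto3 is installed and credentials are configured
# ("my-stream" and "device-42" are placeholders):
#   import boto3
#   put_with_backoff(boto3.client("kinesis"), "my-stream", b'{"v": 1}', "device-42")
```

If retries are exhausted the exception is re-raised, so the caller can buffer the record or route it elsewhere instead of silently losing it.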