Amazon Kinesis Data Streams
Difficulty: hard
Overview
Amazon Kinesis Data Streams is a real-time data streaming service.
Core Architecture:
- Shard: Unit of capacity. Each shard accepts up to 1 MB/s or 1,000 records/s of writes and serves up to 2 MB/s of reads. Scale by adding shards.
- Retention: 24 hours default, up to 365 days.
- Record: Sequence number + partition key + data blob (up to 1 MB).
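With these per-shard limits, capacity planning is arithmetic: a stream needs enough shards to cover the largest of its write-bytes, write-records and read requirements. The sketch below is minimal and rests on three assumptions: provisioned capacity, traffic spread evenly across shards (no hot partition keys), and consumers that share the classic read pool. `shards_needed` is a hypothetical helper, not an AWS API.

```python
import math

# Per-shard limits from the list above (provisioned capacity assumed).
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0  # shared read pool; enhanced fan-out consumers are not counted here


def shards_needed(write_mb_s: float, records_s: float, read_mb_s: float) -> int:
    """Hypothetical helper: smallest shard count meeting every per-shard limit.

    read_mb_s is the combined rate of all consumers sharing the 2 MB/s read pool.
    Assumes traffic is spread evenly across shards (no hot partition key).
    """
    by_write_bytes = math.ceil(write_mb_s / WRITE_MB_PER_SHARD)
    by_write_records = math.ceil(records_s / WRITE_RECORDS_PER_SHARD)
    by_read = math.ceil(read_mb_s / READ_MB_PER_SHARD)
    return max(1, by_write_bytes, by_write_records, by_read)


# 3 MB/s of writes (2,500 records/s), read in full by three shared consumers.
print(shards_needed(write_mb_s=3, records_s=2500, read_mb_s=3 * 3))  # 5
```

In this example the read side sets the shard count, not the write side: writes alone would need only 3 shards.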
Producers:
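- AWS SDK: PutRecord (one record per call) and PutRecords (up to 500 records per call).
- KPL (Kinesis Producer Library): Batching (collection and aggregation) and automatic retries. Buffering adds latency. Compression is not built in; implement it in the producer if needed.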
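A producer that calls the SDK directly has to do its own batching, which the KPL would otherwise handle. The sketch below groups records into request entries that respect the documented PutRecords limits: 500 records and 5 MiB per request, and 1 MiB per record, counting data plus partition key. It also picks out failed entries for retry, because PutRecords can partially fail. The `Data`/`PartitionKey` entry keys and the `Records`/`ErrorCode` response fields follow the boto3 `put_records` shapes. The helper names are hypothetical, and no network call is made here.

```python
from typing import Iterable, Iterator

# Documented PutRecords request limits.
MAX_RECORDS_PER_REQUEST = 500
MAX_BYTES_PER_REQUEST = 5 * 1024 * 1024   # 5 MiB, data + partition keys
MAX_BYTES_PER_RECORD = 1024 * 1024        # 1 MiB, data + partition key


def batch_for_put_records(records: Iterable[tuple[bytes, str]]) -> Iterator[list[dict]]:
    """Hypothetical helper: group (data, partition_key) pairs into PutRecords entries."""
    batch: list[dict] = []
    batch_bytes = 0
    for data, key in records:
        size = len(data) + len(key.encode("utf-8"))
        if size > MAX_BYTES_PER_RECORD:
            raise ValueError(f"record of {size} bytes exceeds the 1 MiB limit")
        # Start a new request if adding this record would break either limit.
        if batch and (len(batch) == MAX_RECORDS_PER_REQUEST
                      or batch_bytes + size > MAX_BYTES_PER_REQUEST):
            yield batch
            batch, batch_bytes = [], 0
        batch.append({"Data": data, "PartitionKey": key})
        batch_bytes += size
    if batch:
        yield batch


def failed_entries(entries: list[dict], response: dict) -> list[dict]:
    """PutRecords is not all-or-nothing: return entries whose result carries an ErrorCode."""
    return [entry for entry, result in zip(entries, response["Records"])
            if "ErrorCode" in result]


records = [(b"x" * 100, f"user-{i}") for i in range(1200)]
print([len(b) for b in batch_for_put_records(records)])  # [500, 500, 200]
```

Each batch could then be passed as `Records=` to boto3's `put_records`, with `failed_entries` applied to the response to decide what to resend, ideally with backoff.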
Consumers:
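- Shared (Classic): 2 MB/s per shard in total, divided among all consumers. Pull model (GetRecords, up to 5 calls/s per shard).
- Enhanced Fan-Out: Each registered consumer gets a dedicated 2 MB/s per shard. Push model (SubscribeToShard over HTTP/2).
- Lambda: Reads through an event source mapping. Key settings include BatchSize and BisectBatchOnFunctionError (split a failing batch in half and retry each half).
- KCL (Kinesis Client Library): Coordinates workers with a lease table in DynamoDB and stores checkpoints there. Each shard is processed by exactly one worker at a time; one worker can process several shards.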
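A Lambda consumer can also report which records in a batch failed. This requires enabling `ReportBatchItemFailures` in the event source mapping's `FunctionResponseTypes`. The handler below is a minimal sketch of that response shape (`batchItemFailures` with `itemIdentifier` set to the record's sequence number). `process` is a hypothetical stand-in for business logic, and the JSON payload format is an assumption. According to the AWS Lambda documentation, Lambda resumes from the lowest reported sequence number, so records after it may be delivered again. Making `process` idempotent is the safe assumption.

```python
import base64
import json


def process(payload: dict) -> None:
    """Hypothetical business logic: rejects payloads without an order_id."""
    if "order_id" not in payload:
        raise ValueError("missing order_id")


def handler(event, context):
    """Kinesis-triggered Lambda that reports only the records that failed.

    Needs ReportBatchItemFailures in the mapping's FunctionResponseTypes;
    without it Lambda ignores this return value, and because exceptions are
    caught here, the whole batch would be treated as successful.
    """
    failures = []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        try:
            # Kinesis record data arrives base64-encoded.
            payload = json.loads(base64.b64decode(kinesis["data"]))
            process(payload)
        except Exception:
            failures.append({"itemIdentifier": kinesis["sequenceNumber"]})
    return {"batchItemFailures": failures}
```

Returning an empty `batchItemFailures` list marks the whole batch as processed.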
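Partition Key: Kinesis hashes the partition key (MD5) to pick the shard that receives the record. Same key → same shard → ordered, because records within a shard are ordered by sequence number.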
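The key-to-shard mapping can be sketched in a few lines. Kinesis turns the MD5 digest of the partition key into a 128-bit integer and routes the record to the shard whose hash key range contains it. This sketch assumes the shards split the hash space into equal ranges, which is roughly what an un-resharded stream looks like; real shard boundaries come from each shard's `HashKeyRange`. `hash_key` and `shard_for` are hypothetical helpers.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def hash_key(partition_key: str) -> int:
    """MD5 of the UTF-8 key, read as an unsigned big-endian 128-bit integer."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose (assumed equal-width) hash key range holds the key."""
    return hash_key(partition_key) * shard_count // HASH_SPACE


# Three distinct keys can reach at most three of ten shards, however much traffic
# each key carries; unique per-event keys spread across all ten.
low_cardinality = Counter(shard_for(region, 10) for region in ["eu", "us", "ap"] * 300)
high_cardinality = Counter(shard_for(f"event-{i}", 10) for i in range(900))
print(len(low_cardinality), len(high_cardinality))
```

The first count can never exceed 3. This is why a key with few distinct values, or one dominant value, produces a hot shard no matter how many shards the stream has.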
Kinesis vs SQS:
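| Feature | Kinesis Data Streams | SQS |
|---|---|---|
| Retention | Up to 365 days (24 hours default) | Up to 14 days (4 days default) |
| Replay | Yes | No |
| Multiple consumers | Yes, simultaneously | No; each message is processed by one consumer and then deleted |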
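The replay and multiple-consumer rows follow from one design difference. A stream keeps records for the retention period and lets each consumer track its own position, while a queue removes a message once it is consumed. The toy model below illustrates only that difference. It is not an AWS API and ignores retention expiry, shards and the SQS visibility timeout; the class names are hypothetical.

```python
from collections import deque


class ToyStream:
    """Kinesis-like toy: records stay put; each consumer keeps its own position."""

    def __init__(self) -> None:
        self.records: list[str] = []
        self.positions: dict[str, int] = {}

    def put(self, data: str) -> None:
        self.records.append(data)

    def read(self, consumer: str, limit: int = 10) -> list[str]:
        start = self.positions.get(consumer, 0)
        batch = self.records[start:start + limit]
        self.positions[consumer] = start + len(batch)
        return batch

    def rewind(self, consumer: str, position: int = 0) -> None:
        """Replay: move one consumer back without affecting the others."""
        self.positions[consumer] = position


class ToyQueue:
    """SQS-like toy: a received-and-deleted message is gone for every consumer."""

    def __init__(self) -> None:
        self.messages: deque = deque()

    def send(self, body: str) -> None:
        self.messages.append(body)

    def receive_and_delete(self, limit: int = 10) -> list[str]:
        n = min(limit, len(self.messages))
        return [self.messages.popleft() for _ in range(n)]


stream, queue = ToyStream(), ToyQueue()
for event in ["a", "b", "c"]:
    stream.put(event)
    queue.send(event)
print(stream.read("analytics"), stream.read("archiver"))  # ['a', 'b', 'c'] ['a', 'b', 'c']
stream.rewind("analytics")
print(stream.read("analytics"))  # ['a', 'b', 'c']
print(queue.receive_and_delete(), queue.receive_and_delete())  # ['a', 'b', 'c'] []
```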
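Kinesis Data Firehose (now Amazon Data Firehose): Loads streaming data into S3, Redshift, OpenSearch, Splunk and other destinations. Near real-time, NOT real-time: records are buffered by size or time before delivery (historically a 60 s minimum buffer interval; newer configurations allow shorter intervals). Fully managed, with no consumer code to write.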
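Firehose buffering follows a "whichever comes first" rule: a batch is delivered when it reaches the size threshold or when the time threshold has passed since the batch opened. In the Firehose API these thresholds are the `BufferingHints` values `SizeInMBs` and `IntervalInSeconds`. The toy model below illustrates only that rule, with small byte counts for readability. It is not the service's actual implementation, and `ToyBuffer` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyBuffer:
    """Toy model of Firehose buffering: deliver when either threshold is hit first."""
    size_limit_bytes: int
    interval_s: float
    opened_at: Optional[float] = None
    pending: list = field(default_factory=list)
    delivered: list = field(default_factory=list)

    def _flush(self) -> None:
        if self.pending:
            self.delivered.append(self.pending)
        self.pending, self.opened_at = [], None

    def tick(self, now: float) -> None:
        """Time-based flush: the interval has elapsed since the batch opened."""
        if self.opened_at is not None and now - self.opened_at >= self.interval_s:
            self._flush()

    def add(self, record: bytes, now: float) -> None:
        self.tick(now)
        if self.opened_at is None:
            self.opened_at = now
        self.pending.append(record)
        # Size-based flush: the batch reached the size threshold.
        if sum(map(len, self.pending)) >= self.size_limit_bytes:
            self._flush()


buf = ToyBuffer(size_limit_bytes=1000, interval_s=60)
buf.add(b"x" * 600, now=0)
buf.add(b"x" * 600, now=5)   # 1,200 bytes >= 1,000: size threshold flushes first
buf.add(b"x" * 10, now=10)
buf.tick(now=70)             # 60 s after this batch opened: interval flush
print([sum(map(len, batch)) for batch in buf.delivered])  # [1200, 10]
```

Under a light load the interval dominates, so a single record can wait for the full buffer interval before delivery. This is the reason Firehose is classed as near real-time.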
Practice Questions
Q1. A Kinesis Data Stream has 4 shards. A producer is writing 6 MB/s of data. Consumers are falling behind. What is the correct action?
Q2. A Kinesis stream has multiple Lambda consumers. The developer notices that all consumers are reading the same data, and there is noticeable read latency because they share the 2 MB/s per shard read limit. What is the recommended solution?
Q3. A developer uses a partition key of "productCategory" for Kinesis records. The stream has 10 shards but one shard is handling 90% of all traffic. What is the root cause and solution?
Q4. A developer needs to archive all Kinesis stream records to S3 in Parquet format, partitioned by date, without writing any custom code. Which service should be used?
Q5. A developer uses a Lambda function as a Kinesis consumer. The function processes a batch of 100 records but 3 records fail. With default settings, what happens?