
Amazon Kinesis Data Streams

Difficulty: hard

Overview


Amazon Kinesis Data Streams is a managed service for ingesting, storing, and processing streaming data in real time.

Core Architecture:

  • Shard: Unit of capacity. Write: 1 MB/s or 1,000 records/s per shard. Read: 2 MB/s per shard. Scale by adding shards.
  • Retention: 24 hours default, up to 365 days.
  • Record: Sequence number + partition key + data blob (up to 1 MB).
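
The per-shard limits above translate directly into a sizing calculation. The following is a minimal sketch, assuming a provisioned stream and the limits quoted in the list; the function name `shards_needed` and its parameters are illustrative, not part of any AWS API.

```python
import math

# Per-shard limits quoted in the Core Architecture list.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_s: float, write_records_s: float, read_mb_s: float) -> int:
    """Return the smallest shard count that satisfies all three limits."""
    by_write_bytes = math.ceil(write_mb_s / WRITE_MB_PER_SHARD)
    by_write_records = math.ceil(write_records_s / WRITE_RECORDS_PER_SHARD)
    by_read_bytes = math.ceil(read_mb_s / READ_MB_PER_SHARD)
    # The tightest constraint decides; a stream always has at least one shard.
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


if __name__ == "__main__":
    # A producer writing 6 MB/s needs more than 4 shards.
    print(shards_needed(write_mb_s=6, write_records_s=3000, read_mb_s=6))
```

Whichever limit is hit first (bytes written, records written, or bytes read) sets the shard count, which is why small but very frequent records can need more shards than their byte volume suggests.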

Producers:

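  • AWS SDK: PutRecord (one record per call), PutRecords (up to 500 records per call).
  • KPL (Kinesis Producer Library): Batching (aggregation and collection) and automatic retry. Compression is not built in. Adds latency because records are buffered before sending.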
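
PutRecords can succeed as a call while individual records fail, so a producer has to inspect the response and resend only the failed entries. The sketch below shows one way to do that, assuming a client object that exposes `put_records` with the same keyword arguments and response shape as the boto3 Kinesis client; the helper name `put_all`, the retry count, and the backoff values are illustrative assumptions.

```python
import time
from typing import Any, Dict, List

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def put_all(client: Any, stream_name: str, records: List[Dict[str, Any]],
            max_attempts: int = 3, backoff_s: float = 0.1) -> List[Dict[str, Any]]:
    """Send records in chunks and resend only the entries that failed.

    Each record is a dict with "Data" (bytes) and "PartitionKey" (str).
    Returns the records that still failed after max_attempts.
    """
    leftover: List[Dict[str, Any]] = []
    for start in range(0, len(records), MAX_RECORDS_PER_CALL):
        pending = records[start:start + MAX_RECORDS_PER_CALL]
        for attempt in range(max_attempts):
            response = client.put_records(StreamName=stream_name, Records=pending)
            if response["FailedRecordCount"] == 0:
                pending = []
                break
            # Result entries come back in request order; failed ones carry an ErrorCode.
            pending = [rec for rec, result in zip(pending, response["Records"])
                       if "ErrorCode" in result]
            time.sleep(backoff_s * (2 ** attempt))  # simple exponential backoff
        leftover.extend(pending)
    return leftover
```

With boto3 installed and credentials configured, the client would typically be `boto3.client("kinesis")`. Resending failed entries can place them after records that were written later, so strict per-key ordering is not preserved by this approach.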

Consumers:

  • Shared (Classic): 2 MB/s total per shard, divided among all consumers.
  • Enhanced Fan-Out: Dedicated 2 MB/s per shard per registered consumer. Push model.
  • Lambda: Event source mapping. Configure BatchSize and BisectBatchOnFunctionError.
  • KCL: Manages checkpointing in DynamoDB. Each shard is processed by at most one worker at a time, so workers beyond the shard count sit idle.
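
To make the Lambda consumer concrete, here is a minimal handler sketch. It assumes the records carry JSON payloads and that the event source mapping has the ReportBatchItemFailures function response type enabled; without that setting Lambda ignores the returned value and a raised error retries the whole batch. The `process` function and the `orderId` field are hypothetical placeholders for real business logic.

```python
import base64
import json
from typing import Any, Dict


def process(payload: Dict[str, Any]) -> None:
    """Hypothetical business logic; raises on a record it cannot handle."""
    if "orderId" not in payload:
        raise ValueError("missing orderId")


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    for record in event["Records"]:
        kinesis = record["kinesis"]
        try:
            # Kinesis record data arrives base64-encoded in the Lambda event.
            payload = json.loads(base64.b64decode(kinesis["data"]))
            process(payload)
        except Exception:
            # Stop at the first failure and report its sequence number, so the
            # retry starts at this record rather than at the start of the batch.
            return {"batchItemFailures": [{"itemIdentifier": kinesis["sequenceNumber"]}]}
    return {"batchItemFailures": []}
```

Stopping at the first failure keeps per-shard ordering intact: records after the failed one are not processed until the failed one succeeds or is discarded by the mapping's retry settings.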

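Partition Key: Determines which shard receives the record. Kinesis hashes the key with MD5 to a 128-bit value and routes the record to the shard whose hash key range contains it. Same key → same shard → ordered within that shard. A low-cardinality or skewed key concentrates traffic on a few shards.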
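
The routing rule can be simulated locally to see why a low-cardinality key leads to hot shards. This is a rough sketch, assuming the shards split the 128-bit hash space into equal ranges and that the MD5 digest is read as a big-endian integer; real streams can have uneven ranges after resharding, and the helper names are illustrative.

```python
import hashlib
from collections import Counter
from typing import Iterable

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_index(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose range holds the key, assuming equal ranges."""
    return hash_key(partition_key) * shard_count // HASH_SPACE


def shard_load(keys: Iterable[str], shard_count: int) -> Counter:
    """Count how many records land on each shard index."""
    return Counter(shard_index(k, shard_count) for k in keys)


if __name__ == "__main__":
    # Three category values can occupy at most three of the ten shards.
    print(shard_load(["books", "toys", "electronics"] * 100, 10))
    # A high-cardinality key spreads records across the shards.
    print(shard_load([f"order-{i}" for i in range(10000)], 10))
```

A key such as a product category has few distinct values and often one dominant value, so most records hash to the same range; a high-cardinality key such as an order or device ID spreads the load.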

Kinesis vs SQS:

Feature | Kinesis | SQS
Retention | Up to 365 days | Up to 14 days
Replay | Yes | No
Multiple consumers | Yes (simultaneously) | One at a time

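Kinesis Firehose (now Amazon Data Firehose): Loads data to S3, Redshift, OpenSearch, Splunk. Near real-time rather than real-time, because records are buffered by size or time before delivery (historically a 60 s minimum buffering interval; newer configurations allow lower values). Fully managed.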

Practice Linked Questions


medium

Q1. A Kinesis Data Stream has 4 shards. A producer is writing 6 MB/s of data. Consumers are falling behind. What is the correct action?



hard

Q2. A Kinesis stream has multiple Lambda consumers. The developer notices that all consumers are reading the same data, and there is noticeable read latency because they share the 2 MB/s per shard read limit. What is the recommended solution?



hard

Q3. A developer uses a partition key of "productCategory" for Kinesis records. The stream has 10 shards but one shard is handling 90% of all traffic. What is the root cause and solution?



easy

Q4. A developer needs to archive all Kinesis stream records to S3 in Parquet format, partitioned by date, without writing any custom code. Which service should be used?



medium

Q5. A developer uses a Lambda function as a Kinesis consumer. The function processes a batch of 100 records but 3 records fail. With default settings, what happens?

