Amazon SQS & Message Queuing

A comprehensive deep dive into Amazon SQS — Standard vs FIFO queues, visibility timeout, polling, DLQs, Lambda integration, access policies, and message queuing patterns for the DVA-C02 exam.

What is Amazon SQS?

Amazon SQS (Simple Queue Service) is a fully managed message queuing service that decouples the components of distributed systems. Producers send messages to a queue; consumers poll and process them independently — neither needs to know about the other.

Core mental model: SQS is a buffer. If your producer is faster than your consumer, messages accumulate safely in the queue instead of being lost. Consumers scale independently to drain the queue at their own pace.

When to use SQS:

Decouple a monolith into independently scalable services
Absorb traffic spikes (queue as shock absorber)
Ensure no messages are lost if a consumer crashes
Fan-out with SNS → multiple SQS queues

Standard vs FIFO Queues

Feature	Standard Queue	FIFO Queue
Throughput	Nearly unlimited	300 API calls/s per action (3,000 messages/s with batching)
Message ordering	Best-effort (not guaranteed)	Strict FIFO per MessageGroupId
Delivery guarantee	At-least-once (duplicates possible)	Exactly-once (deduplication built-in)
Deduplication	❌	✅ 5-minute deduplication window
URL suffix	`.amazonaws.com/account/QueueName`	`.amazonaws.com/account/QueueName.fifo`
Use case	High-throughput, order not critical	Financial transactions, inventory updates

Exam tip: "Exactly-once" and "strict ordering" point to FIFO. "Unlimited" or "nearly unlimited" throughput points to Standard.

Message Lifecycle

Understanding the exact lifecycle of a message is critical for the exam.

Producer sends a message → message is available in the queue
Consumer calls ReceiveMessage → message becomes in-flight (invisible to other consumers) for the duration of the visibility timeout
Consumer processes successfully → calls DeleteMessage → message is gone
Consumer fails or crashes → visibility timeout expires → message becomes available again for redelivery
Receive count exceeds maxReceiveCount (and a redrive policy is configured) → message moves to the DLQ
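
The lifecycle above can be sketched as a toy in-memory model. This is not how SQS is implemented; `ToyQueue` and its methods are hypothetical names used only for illustration, and time is passed in explicitly (in seconds) so the behaviour is easy to follow.

```javascript
// Toy in-memory model of the message lifecycle (illustration only, not the real service).
class ToyQueue {
  constructor({ visibilityTimeout = 30, maxReceiveCount = 3 } = {}) {
    this.visibilityTimeout = visibilityTimeout;
    this.maxReceiveCount = maxReceiveCount;
    this.messages = []; // each entry: { id, body, visibleAt, receiveCount }
    this.dlq = [];
    this.nextId = 1;
  }

  // A sent message is immediately available.
  send(body, now) {
    const id = String(this.nextId++);
    this.messages.push({ id, body, visibleAt: now, receiveCount: 0 });
    return id;
  }

  // Returns the first visible message and hides it for visibilityTimeout seconds.
  receive(now) {
    for (const msg of [...this.messages]) {
      if (msg.visibleAt > now) continue; // still in flight
      if (msg.receiveCount >= this.maxReceiveCount) {
        // One more receive would exceed maxReceiveCount: route to the DLQ instead.
        this.messages = this.messages.filter((m) => m !== msg);
        this.dlq.push(msg);
        continue;
      }
      msg.receiveCount += 1;
      msg.visibleAt = now + this.visibilityTimeout;
      return { id: msg.id, body: msg.body, receiveCount: msg.receiveCount };
    }
    return null;
  }

  // Deleting is what marks a message as successfully processed.
  delete(id) {
    this.messages = this.messages.filter((m) => m.id !== id);
  }
}

const q = new ToyQueue({ visibilityTimeout: 30, maxReceiveCount: 2 });
q.send('order-1', 0);
q.receive(0);  // delivered, in flight until t=30
q.receive(10); // null: the message is invisible
q.receive(30); // redelivered (second receive), never deleted
q.receive(60); // null: the message has been moved to q.dlq
```

The consumer in this run never calls `delete`, which is the "fails or crashes" path: the message reappears after each timeout until the receive count is exhausted.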

Key Configuration Settings

Setting	Range	Default	Notes
Visibility Timeout	0s – 12 hours	30 seconds	How long message is hidden after receive
Message Retention	1 min – 14 days	4 days	How long messages that have not been deleted are kept
Max Message Size	1 byte – 256 KB	256 KB	Use S3 + pointer for larger payloads
Delay Seconds	0 – 900s	0	Delay before message becomes available
Receive Message Wait Time	0 – 20s	0	Long polling wait time
Max Receive Count	1 – 1,000	—	Deliveries before DLQ routing
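
As a rough sketch, the time-based ranges in the table can be checked before a queue is created or updated. The function and constant names below are hypothetical; the attribute names are the ones SQS uses for these settings, and the API takes their values as strings of seconds.

```javascript
// Allowed ranges in seconds, taken from the table above.
const QUEUE_ATTRIBUTE_RANGES = {
  VisibilityTimeout: [0, 12 * 60 * 60],            // 0s to 12 hours
  MessageRetentionPeriod: [60, 14 * 24 * 60 * 60], // 1 minute to 14 days
  DelaySeconds: [0, 900],
  ReceiveMessageWaitTimeSeconds: [0, 20],
};

// Returns a list of problems; an empty list means every covered value is in range.
function validateQueueAttributes(attributes) {
  const problems = [];
  for (const [name, value] of Object.entries(attributes)) {
    const range = QUEUE_ATTRIBUTE_RANGES[name];
    if (!range) continue; // attribute not covered by this sketch
    const seconds = Number(value);
    const [min, max] = range;
    if (!Number.isInteger(seconds) || seconds < min || seconds > max) {
      problems.push(`${name} must be an integer between ${min} and ${max}, got ${value}`);
    }
  }
  return problems;
}

// DelaySeconds is above the 900s maximum, so one problem is reported.
console.log(validateQueueAttributes({ VisibilityTimeout: '60', DelaySeconds: '1200' }));
```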

Visibility Timeout Deep Dive

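The visibility timeout is the most commonly misunderstood SQS concept. It is how long a received message stays hidden from other consumers; if the consumer has not deleted the message by then, SQS makes it visible again and another consumer can receive it.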

javascript

import { SQSClient, ReceiveMessageCommand, DeleteMessageCommand, ChangeMessageVisibilityCommand } from '@aws-sdk/client-sqs';

const sqs = new SQSClient({ region: 'us-east-1' });
const QUEUE_URL = process.env.QUEUE_URL;

const { Messages } = await sqs.send(new ReceiveMessageCommand({
  QueueUrl: QUEUE_URL,
  MaxNumberOfMessages: 10,     // up to 10 per call
  WaitTimeSeconds: 20,         // long polling: wait up to 20s for messages
  VisibilityTimeout: 60,       // override queue default for this receive call
}));

for (const message of Messages ?? []) {
  try {
    // If processing may take longer than the visibility timeout,
    // reset it before it expires to prevent duplicate processing
    await sqs.send(new ChangeMessageVisibilityCommand({
      QueueUrl: QUEUE_URL,
      ReceiptHandle: message.ReceiptHandle,
      VisibilityTimeout: 120,  // new timeout: 2 minutes from now (not added to the time remaining)
    }));

    // processMessage is your own handler (not defined here)
    await processMessage(JSON.parse(message.Body));

    // Only delete AFTER successful processing
    await sqs.send(new DeleteMessageCommand({
      QueueUrl: QUEUE_URL,
      ReceiptHandle: message.ReceiptHandle,
    }));
  } catch (err) {
    console.error('Processing failed, message will reappear after timeout:', err);
    // Do NOT delete: let it become visible again for retry
  }
}

Common mistake: Deleting the message before processing is complete. If your code crashes after delete, the message is gone forever. Always delete after successful processing.
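
When the processing time is hard to predict, one option is to keep resetting the timeout while the work runs. The sketch below assumes a hypothetical helper, `withVisibilityHeartbeat`, that takes the work and an `extend` callback; in a real consumer the callback would wrap `ChangeMessageVisibility`. It is one possible pattern, not something SQS provides.

```javascript
// Runs work() while periodically calling extend() so the message stays invisible.
// `extend` is a caller-supplied callback (hypothetical); it would typically wrap
// a ChangeMessageVisibility call for the message being processed.
async function withVisibilityHeartbeat(work, extend, intervalMs) {
  const timer = setInterval(() => {
    // A failed extension is logged, not thrown: the worst case is a redelivery.
    Promise.resolve()
      .then(extend)
      .catch((err) => console.error('Could not extend visibility:', err));
  }, intervalMs);
  try {
    return await work();
  } finally {
    clearInterval(timer); // stop extending whether the work succeeded or failed
  }
}

// Example wiring (not executed here), reusing sqs, QUEUE_URL and message from the loop above:
// await withVisibilityHeartbeat(
//   () => processMessage(JSON.parse(message.Body)),
//   () => sqs.send(new ChangeMessageVisibilityCommand({
//     QueueUrl: QUEUE_URL, ReceiptHandle: message.ReceiptHandle, VisibilityTimeout: 60,
//   })),
//   30000, // every 30s, reset the timeout to 60s from now
// );
```

The interval should be comfortably shorter than the timeout it sets, and the message still has to be deleted after the work completes.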

Short Polling vs Long Polling

Mode	Behaviour	Cost	Latency
Short Polling	Returns immediately, even if queue is empty	Higher (more API calls)	Low
Long Polling	Waits up to 20s for a message to arrive	Lower (fewer empty responses)	Slightly higher

Long polling is the better choice for most production consumers. Enable it by setting WaitTimeSeconds to 1–20 on ReceiveMessage, or set ReceiveMessageWaitTimeSeconds on the queue itself.

bash

# Enable long polling at the queue level (applies to all consumers)
aws sqs set-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123/MyQueue \
  --attributes ReceiveMessageWaitTimeSeconds=20
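
To make the cost difference concrete, here is a small back-of-the-envelope calculation for an idle queue. The one-second pause between short polls is an illustrative assumption, not a figure from SQS; `emptyReceivesPerHour` is a hypothetical helper.

```javascript
// Number of ReceiveMessage calls made in one hour against an EMPTY queue.
// secondsPerCall: time from one call to the next (wait time or sleep interval).
function emptyReceivesPerHour(secondsPerCall) {
  return Math.ceil(3600 / secondsPerCall);
}

const shortPolling = emptyReceivesPerHour(1); // assumed: consumer pauses 1s between short polls
const longPolling = emptyReceivesPerHour(20); // WaitTimeSeconds = 20
console.log({ shortPolling, longPolling });   // { shortPolling: 3600, longPolling: 180 }
```

Each of those calls is a billable request, so under these assumptions long polling makes 20 times fewer empty requests per idle consumer.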

Dead-Letter Queues (DLQ)

A DLQ is a separate queue that receives messages which could not be processed successfully after maxReceiveCount attempts.

bash

# Create the DLQ first
aws sqs create-queue --queue-name MyQueue-DLQ

# Get DLQ ARN
DLQ_ARN=$(aws sqs get-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123/MyQueue-DLQ \
  --attribute-names QueueArn \
  --query Attributes.QueueArn --output text)

# Attach DLQ to source queue with maxReceiveCount = 3
# RedrivePolicy is a JSON string nested inside the attributes JSON,
# so it must stay on one line with its inner quotes escaped
aws sqs set-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123/MyQueue \
  --attributes '{"RedrivePolicy": "{\"deadLetterTargetArn\":\"'"$DLQ_ARN"'\",\"maxReceiveCount\":\"3\"}"}'

DLQ rules:

Source queue and DLQ must be the same type (Standard → Standard DLQ, FIFO → FIFO DLQ)
DLQ must be in the same AWS account and region
Messages in DLQ retain their original MessageId and body
Set DLQ retention longer than the source queue so you have time to investigate
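
The first two rules can be checked in code before the redrive policy is applied. The sketch below uses a hypothetical helper, `buildRedriveAttributes`, and assumes queue ARNs in the usual `arn:aws:sqs:<region>:<account-id>:<queue-name>` layout; it only builds the attribute map and does not call AWS.

```javascript
// Builds the Attributes map for SetQueueAttributes (or CreateQueue).
// Note the double encoding: the RedrivePolicy value is itself a JSON string.
function buildRedriveAttributes(sourceQueueArn, dlqArn, maxReceiveCount) {
  // ARN layout: arn:aws:sqs:<region>:<account-id>:<queue-name>
  const [, , , srcRegion, srcAccount, srcName] = sourceQueueArn.split(':');
  const [, , , dlqRegion, dlqAccount, dlqName] = dlqArn.split(':');
  if (!srcName || !dlqName) {
    throw new Error('Expected two SQS queue ARNs');
  }
  if (srcRegion !== dlqRegion || srcAccount !== dlqAccount) {
    throw new Error('Source queue and DLQ must be in the same account and region');
  }
  if (srcName.endsWith('.fifo') !== dlqName.endsWith('.fifo')) {
    throw new Error('Source queue and DLQ must both be Standard or both be FIFO');
  }
  if (!Number.isInteger(maxReceiveCount) || maxReceiveCount < 1 || maxReceiveCount > 1000) {
    throw new Error('maxReceiveCount must be an integer between 1 and 1000');
  }
  return {
    RedrivePolicy: JSON.stringify({
      deadLetterTargetArn: dlqArn,
      maxReceiveCount: String(maxReceiveCount),
    }),
  };
}

console.log(buildRedriveAttributes(
  'arn:aws:sqs:us-east-1:123:MyQueue',
  'arn:aws:sqs:us-east-1:123:MyQueue-DLQ',
  3,
));
```

The returned object has the shape expected by the `Attributes` parameter of `SetQueueAttributesCommand` in the SDK.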

DLQ Redrive (Message Recovery)

After fixing the consumer bug, move messages from the DLQ back to the source queue:

bash

aws sqs start-message-move-task \
  --source-arn arn:aws:sqs:us-east-1:123:MyQueue-DLQ \
  --destination-arn arn:aws:sqs:us-east-1:123:MyQueue

FIFO Queue Details

FIFO queues guarantee strict ordering and exactly-once processing within a message group.

javascript

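// Sending to a FIFO queue: MessageGroupId is required, and MessageDeduplicationId
// is required unless content-based deduplication is enabled on the queue
import { SendMessageCommand } from '@aws-sdk/client-sqs';

// sqs is the SQSClient created in the earlier example
await sqs.send(new SendMessageCommand({
  QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123/Orders.fifo',
  MessageBody: JSON.stringify({ orderId: 'ORD-001', action: 'PLACE' }),
  MessageGroupId: 'customer-cust_A',        // all messages for cust_A are ordered
  MessageDeduplicationId: 'ORD-001-PLACE',  // deduplication key (5-min window)
}));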

Attribute	Purpose
`MessageGroupId`	Groups related messages for strict ordering. Different groups process independently (parallel).
`MessageDeduplicationId`	Prevents duplicate processing within 5-minute window. Alternatively, enable content-based deduplication (SHA-256 hash of body).
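
If content-based deduplication is not enabled on the queue, a similar effect can be approximated on the producer side by hashing the body yourself. This is a sketch under that assumption; `deduplicationIdFor` and `fifoSendParams` are hypothetical helpers, and the result is only the parameter object for `SendMessage`.

```javascript
const { createHash } = require('node:crypto');

// Derives a deduplication ID from the message body, similar in spirit to
// content-based deduplication (which hashes the body, not the attributes).
function deduplicationIdFor(body) {
  return createHash('sha256').update(body, 'utf8').digest('hex');
}

// Fills in the two FIFO-specific parameters for a SendMessage call.
function fifoSendParams(queueUrl, groupId, payload) {
  const body = JSON.stringify(payload);
  return {
    QueueUrl: queueUrl,
    MessageBody: body,
    MessageGroupId: groupId,
    MessageDeduplicationId: deduplicationIdFor(body),
  };
}

const first = fifoSendParams('https://sqs.us-east-1.amazonaws.com/123/Orders.fifo',
  'customer-cust_A', { orderId: 'ORD-001', action: 'PLACE' });
const retry = fifoSendParams('https://sqs.us-east-1.amazonaws.com/123/Orders.fifo',
  'customer-cust_A', { orderId: 'ORD-001', action: 'PLACE' });
// Identical bodies produce the same ID, so a retry inside the window is treated as a duplicate.
console.log(first.MessageDeduplicationId === retry.MessageDeduplicationId); // true
```

A business key such as `ORD-001-PLACE` is often a better choice than a hash, because two different events with byte-identical bodies would otherwise be collapsed into one.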

Throughput limits:

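300 API calls per second per action (send, receive, delete) without batching
3,000 messages per second with SendMessageBatch (up to 10 messages per batch)
No quota on the number of distinct MessageGroupId values; within a group, later messages are held back while an earlier one is in flight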
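
The batching figure follows from packing up to 10 messages into each call. A minimal sketch of that packing is shown below; `toBatchEntries` and the input shape (`payload`, `groupId`, `dedupId`) are hypothetical, while the entry fields are the ones `SendMessageBatch` expects.

```javascript
// Splits messages into SendMessageBatch-sized groups (at most 10 entries per call).
function toBatchEntries(messages, batchSize = 10) {
  const batches = [];
  for (let i = 0; i < messages.length; i += batchSize) {
    const entries = messages.slice(i, i + batchSize).map((msg, index) => ({
      Id: String(index), // only needs to be unique within one batch request
      MessageBody: JSON.stringify(msg.payload),
      MessageGroupId: msg.groupId,
      MessageDeduplicationId: msg.dedupId,
    }));
    batches.push(entries);
  }
  return batches;
}

const orders = Array.from({ length: 25 }, (_, n) => ({
  payload: { orderId: `ORD-${n}` },
  groupId: 'customer-cust_A',
  dedupId: `ORD-${n}-PLACE`,
}));
console.log(toBatchEntries(orders).map((batch) => batch.length)); // [ 10, 10, 5 ]
```

Each inner array would become the `Entries` parameter of one `SendMessageBatchCommand`, so 25 messages cost 3 API calls instead of 25.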

Delay Queues & Per-Message Delays

Delay Queue: Set DelaySeconds (0–900s) on the queue — all new messages are invisible for that duration before becoming available.

Per-message delay: Override at send time with DelaySeconds parameter. Only available for Standard queues (FIFO queues do not support per-message delays).

javascript

// Send a message that won't be visible for 5 minutes
await sqs.send(new SendMessageCommand({
  QueueUrl: QUEUE_URL,
  MessageBody: JSON.stringify({ reminder: 'follow-up email', userId: 'usr_01' }),
  DelaySeconds: 300,
}));
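
A producer that sends to both queue types can guard against the two restrictions described above before calling the API. The helper name `delayedSendParams` is hypothetical, and detecting a FIFO queue by the `.fifo` suffix of its URL is an assumption of this sketch.

```javascript
// Builds SendMessage parameters, rejecting delays the queue would not accept.
function delayedSendParams(queueUrl, body, delaySeconds) {
  if (!Number.isInteger(delaySeconds) || delaySeconds < 0 || delaySeconds > 900) {
    throw new RangeError('DelaySeconds must be an integer between 0 and 900');
  }
  if (queueUrl.endsWith('.fifo') && delaySeconds > 0) {
    throw new Error('FIFO queues do not support per-message delays; set the delay on the queue');
  }
  const params = { QueueUrl: queueUrl, MessageBody: body };
  if (delaySeconds > 0) params.DelaySeconds = delaySeconds; // omit when there is no delay
  return params;
}

console.log(delayedSendParams('https://sqs.us-east-1.amazonaws.com/123/MyQueue', 'hello', 300));
```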

SQS with Lambda (Event Source Mapping)

Lambda can poll SQS automatically via an event source mapping — no polling code needed in your function.

javascript

// Lambda receives a batch of SQS messages
exports.handler = async (event) => {
  const failures = [];

  for (const record of event.Records) {
    try {
      const body = JSON.parse(record.body);
      await processOrder(body);
    } catch (err) {
      console.error(`Failed to process ${record.messageId}:`, err);
      // Report this message as failed; others in the batch succeed
      failures.push({ itemIdentifier: record.messageId });
    }
  }

  // Return failed message IDs: Lambda will NOT delete them
  // Requires FunctionResponseTypes: ['ReportBatchItemFailures'] on the ESM
  return { batchItemFailures: failures };
};

Event source mapping settings:

Setting	Range	Notes
Batch size	1 – 10,000	Messages per Lambda invocation (max 10 for FIFO queues; above 10 requires a batch window of at least 1s)
Batch window	0 – 300s	Wait to fill batch before invoking
Max concurrency	2 – 1,000	Limit simultaneous Lambda invocations
ReportBatchItemFailures	on/off	Enable partial batch success

Without ReportBatchItemFailures: If any message fails, the entire batch is retried — including messages that already succeeded, causing duplicate processing.
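
With a FIFO source, reporting only the failed record would let later messages in the batch be handled ahead of it. One way to keep ordering, sketched below, is to stop at the first failure and report that record plus everything after it. `processBatch` and its `stopOnFirstFailure` option are hypothetical names; the return shape is the `batchItemFailures` response described above.

```javascript
// Processes records in order and builds a ReportBatchItemFailures response.
// stopOnFirstFailure: for FIFO sources, every record after the first failure is
// also reported as failed so that later messages are not handled out of order.
async function processBatch(records, processOne, { stopOnFirstFailure = false } = {}) {
  const batchItemFailures = [];
  let failed = false;
  for (const record of records) {
    if (failed && stopOnFirstFailure) {
      batchItemFailures.push({ itemIdentifier: record.messageId }); // left unprocessed
      continue;
    }
    try {
      await processOne(JSON.parse(record.body));
    } catch (err) {
      console.error(`Failed to process ${record.messageId}:`, err);
      failed = true;
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
}

// Example wiring inside a handler (processOrder is your own function, not defined here):
// exports.handler = (event) =>
//   processBatch(event.Records, processOrder, { stopOnFirstFailure: true });
```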

Message Attributes & Metadata

Attach structured metadata to messages without changing the body:

javascript

await sqs.send(new SendMessageCommand({
  QueueUrl: QUEUE_URL,
  MessageBody: JSON.stringify({ orderId: 'ORD-001' }),
  MessageAttributes: {
    EventType: {
      DataType: 'String',
      StringValue: 'ORDER_PLACED',
    },
    Priority: {
      DataType: 'Number',
      StringValue: '1',
    },
    TraceId: {
      DataType: 'String',
      StringValue: 'trace-abc-123',
    },
  },
}));

Up to 10 message attributes per message. Attributes count toward the 256 KB size limit.
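
Writing the `DataType` / `StringValue` pairs by hand gets repetitive, so a small converter can help. The sketch below assumes a hypothetical helper, `toMessageAttributes`, that handles only strings and numbers (not Binary values) and enforces the 10-attribute limit mentioned above.

```javascript
// Converts a plain object into the MessageAttributes shape used by SendMessage.
// Only non-empty strings and finite numbers are handled in this sketch.
function toMessageAttributes(values) {
  const entries = Object.entries(values);
  if (entries.length > 10) {
    throw new Error('A message can carry at most 10 message attributes');
  }
  const attributes = {};
  for (const [name, value] of entries) {
    if (typeof value === 'number' && Number.isFinite(value)) {
      // Numbers are still sent as strings, tagged with DataType Number.
      attributes[name] = { DataType: 'Number', StringValue: String(value) };
    } else if (typeof value === 'string' && value.length > 0) {
      attributes[name] = { DataType: 'String', StringValue: value };
    } else {
      throw new TypeError(`Unsupported value for attribute ${name}`);
    }
  }
  return attributes;
}

console.log(toMessageAttributes({ EventType: 'ORDER_PLACED', Priority: 1, TraceId: 'trace-abc-123' }));
```

The result can be passed directly as the `MessageAttributes` parameter of `SendMessageCommand`.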

Access Control — Queue Policies

SQS queues use resource-based policies to control cross-account or cross-service access:

json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "sns.amazonaws.com" },
      "Action": "sqs:SendMessage",
      "Resource": "arn:aws:sqs:us-east-1:123:MyQueue",
      "Condition": {
        "ArnEquals": {
          "aws:SourceArn": "arn:aws:sns:us-east-1:123:MyTopic"
        }
      }
    }
  ]
}
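
When the same policy is needed for several queue and topic pairs, it can be generated rather than copied. This is a minimal sketch with a hypothetical helper, `snsToSqsPolicy`; it returns the document as a string because the queue's `Policy` attribute takes the policy as a JSON string.

```javascript
// Returns the queue policy above as a JSON string for a given queue and topic.
function snsToSqsPolicy(queueArn, topicArn) {
  return JSON.stringify({
    Version: '2012-10-17',
    Statement: [{
      Effect: 'Allow',
      Principal: { Service: 'sns.amazonaws.com' },
      Action: 'sqs:SendMessage',
      Resource: queueArn,
      // Only this topic may send; without the condition any SNS topic could.
      Condition: { ArnEquals: { 'aws:SourceArn': topicArn } },
    }],
  });
}

const policy = snsToSqsPolicy(
  'arn:aws:sqs:us-east-1:123:MyQueue',
  'arn:aws:sns:us-east-1:123:MyTopic',
);
console.log(policy);
```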

DVA-C02 Quick Reference

Topic	Key Fact
Standard queue delivery	At-least-once (duplicates possible)
FIFO queue delivery	Exactly-once
Standard queue ordering	Best-effort (not guaranteed)
FIFO queue ordering	Strict per MessageGroupId
FIFO throughput	300 TPS (3,000 with batching)
Default visibility timeout	30 seconds
Max visibility timeout	12 hours
Default message retention	4 days
Max message retention	14 days
Max message size	256 KB
Max delay seconds	900 seconds (15 min)
Long polling max wait	20 seconds
DLQ same type required	Standard → Standard DLQ, FIFO → FIFO DLQ
Per-message delay	Standard only (not FIFO)
Lambda batch size	1 – 10,000 (Standard), 1 – 10 (FIFO)
Partial batch failure	`ReportBatchItemFailures` + return `batchItemFailures`
FIFO deduplication window	5 minutes
Extend visibility in-flight	`ChangeMessageVisibility`
Recover DLQ messages	`StartMessageMoveTask` (redrive)

Practice Questions (10)

easy

Q1. A web application experiences traffic spikes that overwhelm its order-processing backend, causing dropped orders. The developer wants to decouple the front end from the backend so orders are buffered durably and processed asynchronously at the backend's own pace. Which service is designed for this?