AWS Step Functions & EventBridge

10 min read · Step Functions · EventBridge · Orchestration · DVA-C02

A comprehensive deep dive into AWS Step Functions and Amazon EventBridge — state machines, workflow types, all state types, error handling, callback patterns, event buses, rules, schema registry, Pipes, and Scheduler for the DVA-C02 exam.


Part 1 — AWS Step Functions

What is AWS Step Functions?

Step Functions is a fully managed serverless orchestration service that lets you coordinate multiple AWS services into visual workflows called state machines. Each step in the workflow is a state, and the transitions between states are driven by the output of previous states or explicit conditions.

Core mental model: Step Functions is the conductor of your microservices orchestra. Instead of each Lambda function calling the next one (tight coupling), Step Functions manages the sequence, error handling, retries, and parallelism — your functions stay simple and focused.

When to use Step Functions:

  • Multi-step workflows with conditional branching
  • Long-running processes (up to 1 year)
  • Human-in-the-loop approval flows
  • Parallel processing of independent tasks
  • Retry logic and error handling across services

Standard vs Express Workflows

| Feature | Standard Workflow | Express Workflow |
| --- | --- | --- |
| Max duration | 1 year | 5 minutes |
| Execution model | Exactly-once | At-least-once |
| Execution history | Full audit in console | CloudWatch Logs only |
| Pricing | Per state transition | Per execution + duration |
| Throughput | 2,000 executions/sec | 100,000 executions/sec |
| Synchronous execution | ❌ | ✅ (Sync Express) |
| Use case | Order processing, approvals, ETL | IoT, streaming, high-volume event processing |

Exam tip: "Exactly-once" + "audit trail" + "long-running" = Standard. "High throughput" + "short duration" + "IoT/streaming" = Express.
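
The rules of thumb above can be condensed into a small decision helper. This is only a sketch: `chooseWorkflowType` and its option names are hypothetical, and the limits are the ones quoted in the comparison table.

```javascript
// Hypothetical helper: applies the Standard vs Express rules of thumb.
function chooseWorkflowType({ maxDurationSeconds, needsExactlyOnce = false, needsAuditHistory = false }) {
  const EXPRESS_MAX_SECONDS = 5 * 60;              // Express limit: 5 minutes
  const STANDARD_MAX_SECONDS = 365 * 24 * 60 * 60; // Standard limit: 1 year

  if (maxDurationSeconds > STANDARD_MAX_SECONDS) {
    throw new RangeError('No workflow type runs longer than 1 year');
  }
  if (maxDurationSeconds > EXPRESS_MAX_SECONDS || needsExactlyOnce || needsAuditHistory) {
    return 'STANDARD';
  }
  return 'EXPRESS';
}

console.log(chooseWorkflowType({ maxDurationSeconds: 30 }));                         // EXPRESS
console.log(chooseWorkflowType({ maxDurationSeconds: 3600 }));                       // STANDARD
console.log(chooseWorkflowType({ maxDurationSeconds: 30, needsExactlyOnce: true })); // STANDARD
```

A real decision would also weigh cost and throughput, which this sketch ignores.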


Amazon States Language (ASL)

State machines are defined in ASL — a JSON-based language.

```json
{
  "Comment": "Order processing workflow",
  "StartAt": "ValidateOrder",
  "States": {
    "ValidateOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:ValidateOrder",
      "Next": "ProcessPayment",
      "Catch": [{
        "ErrorEquals": ["ValidationError"],
        "Next": "RejectOrder"
      }]
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:ProcessPayment",
      "Next": "FulfillOrder",
      "Retry": [{
        "ErrorEquals": ["PaymentTimeout"],
        "IntervalSeconds": 2,
        "MaxAttempts": 3,
        "BackoffRate": 2.0
      }]
    },
    "FulfillOrder": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sqs:sendMessage",
      "Parameters": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123/fulfillment",
        "MessageBody.$": "$"
      },
      "End": true
    },
    "RejectOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:SendRejectionEmail",
      "End": true
    }
  }
}
```
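
Because every transition in ASL is a state name, a typo in `Next` or `Catch` only shows up when the definition is validated. The sketch below is a hypothetical local check (`lintStateMachine` is not an AWS API) that verifies `StartAt` and every transition target refer to a defined state. It covers only top-level states, not nested Parallel or Map branches.

```javascript
// Collects every state name a state can transition to.
function transitionTargets(state) {
  const targets = [];
  if (state.Next) targets.push(state.Next);
  if (state.Default) targets.push(state.Default);
  for (const rule of state.Choices ?? []) targets.push(rule.Next);
  for (const catcher of state.Catch ?? []) targets.push(catcher.Next);
  return targets;
}

// Returns a list of human-readable problems; an empty list means the checks passed.
function lintStateMachine(definition) {
  const problems = [];
  const names = Object.keys(definition.States);
  if (!names.includes(definition.StartAt)) {
    problems.push(`StartAt "${definition.StartAt}" is not a defined state`);
  }
  for (const [name, state] of Object.entries(definition.States)) {
    for (const target of transitionTargets(state)) {
      if (!names.includes(target)) problems.push(`${name} -> "${target}" is not a defined state`);
    }
    const terminal = state.End === true || state.Type === 'Succeed' || state.Type === 'Fail';
    if (!terminal && state.Type !== 'Choice' && !state.Next) {
      problems.push(`${name} has neither Next nor End`);
    }
  }
  return problems;
}

// A trimmed copy of the order workflow with the RejectOrder state left out on purpose.
const sample = {
  StartAt: 'ValidateOrder',
  States: {
    ValidateOrder: {
      Type: 'Task',
      Next: 'ProcessPayment',
      Catch: [{ ErrorEquals: ['ValidationError'], Next: 'RejectOrder' }],
    },
    ProcessPayment: { Type: 'Task', End: true },
  },
};

console.log(lintStateMachine(sample)); // reports the missing RejectOrder state
```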

All State Types

Task State

Calls an external resource — Lambda, an AWS SDK service, an Activity, or an HTTP endpoint.

```json
{
  "Type": "Task",
  "Resource": "arn:aws:lambda:us-east-1:123:function:MyFunction",
  "Parameters": {
    "orderId.$": "$.orderId",
    "staticValue": "hello"
  },
  "ResultPath": "$.lambdaResult",
  "OutputPath": "$.lambdaResult",
  "TimeoutSeconds": 30,
  "HeartbeatSeconds": 10,
  "Retry": [{ "ErrorEquals": ["States.Timeout"], "MaxAttempts": 2 }],
  "Catch": [{ "ErrorEquals": ["States.ALL"], "Next": "HandleError" }],
  "Next": "NextState"
}
```
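
How `ResultPath` and `OutputPath` combine is easier to see with concrete data. The following is a simplified simulation, not the service's implementation: the function names are made up for illustration, and only `$` and single-level `$.field` paths are handled (real ASL paths are JSONPath, and `ResultPath: null` is not covered).

```javascript
// Supports only "$" and single-level "$.field" paths.
function getPath(doc, path) {
  return path === '$' ? doc : doc[path.slice(2)];
}

// ResultPath: decides where the task result lands relative to the state input.
function applyResultPath(stateInput, result, resultPath) {
  if (resultPath === '$') return result; // default: the result replaces the input
  return { ...stateInput, [resultPath.slice(2)]: result };
}

// OutputPath: selects which part of the merged document moves to the next state.
function simulateTaskOutput(stateInput, taskResult, { ResultPath = '$', OutputPath = '$' } = {}) {
  const merged = applyResultPath(stateInput, taskResult, ResultPath);
  return getPath(merged, OutputPath);
}

const input = { orderId: 'ORD-001' };
const lambdaResult = { valid: true };

// ResultPath only: the result is attached to the original input.
console.log(simulateTaskOutput(input, lambdaResult, { ResultPath: '$.lambdaResult' }));
// { orderId: 'ORD-001', lambdaResult: { valid: true } }

// ResultPath plus OutputPath, as in the Task state above: only the result moves on.
console.log(simulateTaskOutput(input, lambdaResult, { ResultPath: '$.lambdaResult', OutputPath: '$.lambdaResult' }));
// { valid: true }
```

With the settings from the Task example, `orderId` is dropped from the output, which is worth knowing before a later state tries to read it.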

Optimized and AWS SDK integrations: Step Functions can call 200+ AWS services directly, with no Lambda function in between:

```json
{
  "Type": "Task",
  "Resource": "arn:aws:states:::dynamodb:putItem",
  "Parameters": {
    "TableName": "Orders",
    "Item": {
      "orderId": { "S.$": "$.orderId" },
      "status":  { "S": "PROCESSING" }
    }
  },
  "Next": "Done"
}
```

Choice State

Routes execution based on conditions — the if/else of state machines.

```json
{
  "Type": "Choice",
  "Choices": [
    {
      "Variable": "$.amount",
      "NumericGreaterThan": 1000,
      "Next": "RequireApproval"
    },
    {
      "Variable": "$.tier",
      "StringEquals": "premium",
      "Next": "ExpressCheckout"
    }
  ],
  "Default": "StandardCheckout"
}
```
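
Choice rules are evaluated in order and the first match wins, so a premium order over 1000 goes to `RequireApproval`, not `ExpressCheckout`. A rough local model of that behaviour is sketched below. `evaluateChoice` is a hypothetical helper that handles only a few comparison operators and single-level `$.field` variables.

```javascript
// Comparison operators covered by this sketch; ASL defines many more.
const comparators = {
  NumericGreaterThan: (actual, expected) => typeof actual === 'number' && actual > expected,
  NumericLessThan: (actual, expected) => typeof actual === 'number' && actual < expected,
  NumericEquals: (actual, expected) => typeof actual === 'number' && actual === expected,
  StringEquals: (actual, expected) => actual === expected,
  BooleanEquals: (actual, expected) => actual === expected,
};

// Returns the name of the next state for the given input.
function evaluateChoice(choiceState, stateInput) {
  for (const rule of choiceState.Choices) {
    const actual = stateInput[rule.Variable.slice(2)]; // single-level "$.field" only
    const operator = Object.keys(rule).find((key) => key in comparators);
    if (operator && comparators[operator](actual, rule[operator])) return rule.Next;
  }
  if (choiceState.Default) return choiceState.Default;
  throw new Error('States.NoChoiceMatched'); // no rule matched and no Default
}

const checkoutChoice = {
  Type: 'Choice',
  Choices: [
    { Variable: '$.amount', NumericGreaterThan: 1000, Next: 'RequireApproval' },
    { Variable: '$.tier', StringEquals: 'premium', Next: 'ExpressCheckout' },
  ],
  Default: 'StandardCheckout',
};

console.log(evaluateChoice(checkoutChoice, { amount: 1500, tier: 'premium' })); // RequireApproval
console.log(evaluateChoice(checkoutChoice, { amount: 20, tier: 'premium' }));   // ExpressCheckout
console.log(evaluateChoice(checkoutChoice, { amount: 20, tier: 'basic' }));     // StandardCheckout
```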

Parallel State

Executes multiple independent branches simultaneously. Waits for ALL branches to complete before moving on.

```json
{
  "Type": "Parallel",
  "Branches": [
    {
      "StartAt": "SendConfirmationEmail",
      "States": {
        "SendConfirmationEmail": { "Type": "Task", "Resource": "...", "End": true }
      }
    },
    {
      "StartAt": "UpdateInventory",
      "States": {
        "UpdateInventory": { "Type": "Task", "Resource": "...", "End": true }
      }
    }
  ],
  "Next": "OrderComplete"
}
```
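
The shape of a Parallel state's output is a common source of confusion: every branch receives the same input, and the state emits an array with one element per branch, in branch order. The sketch below models only that data shape. The branch functions are hypothetical stand-ins for the two Task states, and they run one after another here, not concurrently.

```javascript
// Hypothetical branch functions standing in for the two Task states above.
const sendConfirmationEmail = (branchInput) => ({ emailSentTo: branchInput.customerEmail });
const updateInventory = (branchInput) => ({ reserved: branchInput.items.length });

// Runs every branch against its own copy of the same input and returns the
// outputs as an array in branch order. If any branch throws, the call throws,
// which mirrors the whole Parallel state failing.
function runParallel(branches, stateInput) {
  return branches.map((branch) => branch(JSON.parse(JSON.stringify(stateInput))));
}

const parallelOutput = runParallel(
  [sendConfirmationEmail, updateInventory],
  { customerEmail: 'a@example.com', items: ['sku-1', 'sku-2'] }
);

console.log(parallelOutput); // [ { emailSentTo: 'a@example.com' }, { reserved: 2 } ]
```

The next state (`OrderComplete` in the example) therefore has to read its input as an array, for example `$[0]` and `$[1]`.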

Map State

Iterates over an array in the input and processes each item — in parallel or sequentially.

```json
{
  "Type": "Map",
  "ItemsPath": "$.orderItems",
  "MaxConcurrency": 5,
  "Iterator": {
    "StartAt": "ProcessItem",
    "States": {
      "ProcessItem": {
        "Type": "Task",
        "Resource": "arn:aws:lambda:us-east-1:123:function:ProcessItem",
        "End": true
      }
    }
  },
  "Next": "AllItemsProcessed"
}
```

MaxConcurrency: 0 (the default) places no limit on parallelism. MaxConcurrency: 1 processes items sequentially, in array order.

Wait State

Pauses execution for a fixed duration or until a specific timestamp.

```json
{ "Type": "Wait", "Seconds": 30, "Next": "RetryPayment" }

{ "Type": "Wait", "TimestampPath": "$.scheduledAt", "Next": "SendReminder" }
```

Other States

| State | Purpose |
| --- | --- |
| Pass | Pass input to output unchanged (or inject static data). Useful for testing. |
| Succeed | Terminate execution successfully. |
| Fail | Terminate execution with an error and cause message. |

Error Handling

Retry

Automatically retry a Task on specified errors with exponential backoff:

```json
"Retry": [
  {
    "ErrorEquals": ["Lambda.ServiceException", "Lambda.AWSLambdaException"],
    "IntervalSeconds": 1,
    "MaxAttempts": 3,
    "BackoffRate": 2.0,
    "JitterStrategy": "FULL"
  },
  {
    "ErrorEquals": ["States.Timeout"],
    "IntervalSeconds": 5,
    "MaxAttempts": 2,
    "BackoffRate": 1.5
  }
]
```

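Retry delays: IntervalSeconds × BackoffRate^(attempt-1). For the first retrier above (IntervalSeconds 1, BackoffRate 2.0, MaxAttempts 3), the nominal delays are 1s → 2s → 4s. Because that retrier also sets JitterStrategy to FULL, each actual delay is randomized between 0 and the nominal value.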
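
The formula is simple enough to check by hand, but a few lines of code make it easy to experiment with other settings. `retryDelays` below is an illustrative helper, not part of any SDK. It returns nominal delays only and ignores jitter.

```javascript
// Nominal delay before each retry attempt: IntervalSeconds * BackoffRate^(attempt - 1),
// optionally capped by MaxDelaySeconds. Defaults mirror the ASL defaults.
function retryDelays({ IntervalSeconds = 1, MaxAttempts = 3, BackoffRate = 2.0, MaxDelaySeconds = Infinity }) {
  const delays = [];
  for (let attempt = 1; attempt <= MaxAttempts; attempt++) {
    const nominal = IntervalSeconds * BackoffRate ** (attempt - 1);
    delays.push(Math.min(nominal, MaxDelaySeconds));
  }
  return delays;
}

// First retrier from the example: 1s, 2s, 4s.
console.log(retryDelays({ IntervalSeconds: 1, MaxAttempts: 3, BackoffRate: 2.0 })); // [ 1, 2, 4 ]

// Second retrier from the example: 5s, 7.5s.
console.log(retryDelays({ IntervalSeconds: 5, MaxAttempts: 2, BackoffRate: 1.5 })); // [ 5, 7.5 ]
```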

Catch

When retries are exhausted (or no retry is configured), Catch redirects to an error-handling state:

```json
"Catch": [
  {
    "ErrorEquals": ["PaymentDeclinedException"],
    "ResultPath": "$.errorInfo",
    "Next": "NotifyCustomer"
  },
  {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.errorInfo",
    "Next": "GlobalErrorHandler"
  }
]
```

Built-in error codes:

| Error | When it occurs |
| --- | --- |
| States.ALL | Matches any error (catch-all) |
| States.Timeout | Task exceeded TimeoutSeconds |
| States.TaskFailed | Task threw an unhandled exception |
| States.HeartbeatTimeout | No heartbeat received within HeartbeatSeconds |
| States.NoChoiceMatched | No Choice branch matched and no Default |
| States.ItemReaderFailed | Map state item reader failed |
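
Catchers are scanned in array order, which is why the `States.ALL` entry goes last. The sketch below approximates that lookup and shows how `ResultPath` attaches the error details to the original input. `findCatcher` and `applyCatch` are hypothetical names, and only single-level `$.field` result paths are handled.

```javascript
const catchers = [
  { ErrorEquals: ['PaymentDeclinedException'], ResultPath: '$.errorInfo', Next: 'NotifyCustomer' },
  { ErrorEquals: ['States.ALL'], ResultPath: '$.errorInfo', Next: 'GlobalErrorHandler' },
];

// Returns the first catcher whose ErrorEquals lists the error name or States.ALL.
function findCatcher(catcherList, errorName) {
  const found = catcherList.find(
    (catcher) => catcher.ErrorEquals.includes(errorName) || catcher.ErrorEquals.includes('States.ALL')
  );
  return found ?? null;
}

// Computes the next state and its input after a failure.
function applyCatch(stateInput, error, catcherList) {
  const catcher = findCatcher(catcherList, error.Error);
  if (!catcher) return null; // nothing catches it, so the execution fails
  const field = catcher.ResultPath.slice(2); // single-level "$.field" only
  return { next: catcher.Next, input: { ...stateInput, [field]: error } };
}

const declined = { Error: 'PaymentDeclinedException', Cause: 'Card declined' };
console.log(applyCatch({ orderId: 'ORD-001' }, declined, catchers).next); // NotifyCustomer

const timedOut = { Error: 'States.Timeout', Cause: 'Task timed out' };
console.log(applyCatch({ orderId: 'ORD-001' }, timedOut, catchers));
// next is GlobalErrorHandler; input keeps orderId and gains errorInfo
```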

Wait for Callback Pattern (.waitForTaskToken)

Pause a workflow and wait for an external system to send a callback — perfect for human approvals, third-party webhooks, or long-running jobs.

```json
{
  "Type": "Task",
  "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
  "Parameters": {
    "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123/approval-queue",
    "MessageBody": {
      "taskToken.$": "$$.Task.Token",
      "orderId.$": "$.orderId",
      "approvalUrl": "https://app.example.com/approve"
    }
  },
  "HeartbeatSeconds": 3600,
  "Next": "OrderApproved"
}
```

```javascript
// External system (for example, a human clicking "Approve" in a UI) calls back.
import { SFNClient, SendTaskSuccessCommand, SendTaskFailureCommand } from '@aws-sdk/client-sfn';

const sfn = new SFNClient({ region: 'us-east-1' });

// taskTokenFromMessage is the taskToken value read from the SQS message body.

// Approve: resumes the workflow at the state named in "Next"
await sfn.send(new SendTaskSuccessCommand({
  taskToken: taskTokenFromMessage,
  output: JSON.stringify({ approved: true, approvedBy: 'manager@example.com' }),
}));

// Reject: fails the task; a Catch on the task state can route the error to a handler
await sfn.send(new SendTaskFailureCommand({
  taskToken: taskTokenFromMessage,
  error: 'ApprovalDenied',
  cause: 'Amount exceeds budget limit',
}));
```


Part 2 — Amazon EventBridge

What is Amazon EventBridge?

EventBridge is a fully managed serverless event bus that routes events between AWS services, your own applications, and 200+ SaaS partners. It decouples producers from consumers — producers emit events without knowing who processes them.

Core mental model: EventBridge is an air traffic controller. Events land on the bus; rules decide which target each event is routed to. Producers and consumers never talk directly.


Event Buses

| Bus Type | Description | Use case |
| --- | --- | --- |
| Default | Receives all AWS service events automatically | CloudTrail, EC2, RDS events |
| Custom | Created by you for application events | Microservice decoupling |
| Partner | Managed by SaaS partners (Stripe, Datadog, Zendesk…) | SaaS integrations |

```bash
# Create a custom event bus
aws events create-event-bus --name my-app-events

# Send a custom event
aws events put-events --entries '[{
  "Source": "com.myapp.orders",
  "DetailType": "OrderPlaced",
  "Detail": "{\"orderId\":\"ORD-001\",\"amount\":99.99}",
  "EventBusName": "my-app-events"
}]'
```

Event Structure

Every EventBridge event follows this envelope:

```json
{
  "version": "0",
  "id": "abc-123",
  "source": "com.myapp.orders",
  "account": "123456789",
  "time": "2024-01-15T10:30:00Z",
  "region": "us-east-1",
  "detail-type": "OrderPlaced",
  "detail": {
    "orderId": "ORD-001",
    "customerId": "cust_A",
    "amount": 99.99,
    "tier": "premium"
  }
}
```
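
Producers supply only part of this envelope. In a PutEvents entry they set `Source`, `DetailType`, and `Detail` (as a JSON string), and EventBridge fills in fields such as `id`, `account`, and `region`. The helpers below sketch how an application might build entries and split them into batches of 10, the per-request limit for PutEvents. `toPutEventsEntry` and `chunk` are illustrative names, and the bus name is taken from the CLI example.

```javascript
// Builds one PutEvents entry from a plain order object.
function toPutEventsEntry(order) {
  return {
    Source: 'com.myapp.orders',
    DetailType: 'OrderPlaced',     // appears as "detail-type" in the delivered event
    Detail: JSON.stringify(order), // Detail must be a JSON string, not an object
    EventBusName: 'my-app-events',
  };
}

// Splits a list into consecutive batches of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

const entry = toPutEventsEntry({ orderId: 'ORD-001', amount: 99.99 });
console.log(entry.Detail); // {"orderId":"ORD-001","amount":99.99}

// 25 entries need three PutEvents calls.
const batchSizes = chunk(new Array(25).fill(entry), 10).map((batch) => batch.length);
console.log(batchSizes); // [ 10, 10, 5 ]
```

Each batch could then be passed as `Entries` to a PutEvents call.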

Rules and Event Patterns

Rules match incoming events using event patterns and route matching events to one or more targets.

```json
{
  "source": ["com.myapp.orders"],
  "detail-type": ["OrderPlaced", "OrderUpdated"],
  "detail": {
    "tier": ["premium"],
    "amount": [{ "numeric": [">", 50] }]
  }
}
```

Pattern operators:

| Operator | Example | Matches |
| --- | --- | --- |
| Exact match | ["ORDER_PLACED"] | Value equals string |
| Prefix | [{"prefix": "ORDER"}] | String starts with "ORDER" |
| Suffix | [{"suffix": ".jpg"}] | String ends with ".jpg" |
| Numeric range | [{"numeric": [">=", 10, "<", 100]}] | 10 ≤ value < 100 |
| Exists | [{"exists": true}] | Field is present |
| Anything-but | [{"anything-but": "CANCELLED"}] | Value is not "CANCELLED" |
| IP range (CIDR) | [{"cidr": "10.0.0.0/24"}] | IP in CIDR range |
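
To build intuition for how these operators combine, here is a small local matcher. It is a simplified approximation written for this guide, not EventBridge's matching engine: it skips `cidr`, array-valued event fields, and other operators, and the function names are made up. Within one field the listed values are alternatives (OR). Across fields every condition must hold (AND).

```javascript
// Evaluates numeric tokens given as (operator, operand) pairs, e.g. [">=", 10, "<", 100].
function matchesNumeric(value, tokens) {
  const ops = {
    '>': (a, b) => a > b,
    '>=': (a, b) => a >= b,
    '<': (a, b) => a < b,
    '<=': (a, b) => a <= b,
    '=': (a, b) => a === b,
  };
  for (let i = 0; i < tokens.length; i += 2) {
    if (!ops[tokens[i]](value, tokens[i + 1])) return false;
  }
  return true;
}

// A field matches when ANY candidate in its array matches.
function matchesValue(value, candidates) {
  return candidates.some((candidate) => {
    if (candidate === null || typeof candidate !== 'object') return value === candidate;
    if ('prefix' in candidate) return typeof value === 'string' && value.startsWith(candidate.prefix);
    if ('suffix' in candidate) return typeof value === 'string' && value.endsWith(candidate.suffix);
    if ('exists' in candidate) return (value !== undefined) === candidate.exists;
    if ('anything-but' in candidate) {
      const excluded = [].concat(candidate['anything-but']);
      return value !== undefined && !excluded.includes(value);
    }
    if ('numeric' in candidate) return typeof value === 'number' && matchesNumeric(value, candidate.numeric);
    return false; // operator not covered by this sketch
  });
}

// Every key in the pattern must match; nested objects are matched recursively.
function matchesPattern(event, pattern) {
  return Object.entries(pattern).every(([key, expected]) => {
    if (Array.isArray(expected)) return matchesValue(event[key], expected);
    const nested = event[key];
    return typeof nested === 'object' && nested !== null && matchesPattern(nested, expected);
  });
}

const orderEvent = {
  source: 'com.myapp.orders',
  'detail-type': 'OrderPlaced',
  detail: { orderId: 'ORD-001', amount: 99.99, tier: 'premium' },
};
const premiumPattern = {
  source: ['com.myapp.orders'],
  'detail-type': ['OrderPlaced', 'OrderUpdated'],
  detail: { tier: ['premium'], amount: [{ numeric: ['>', 50] }] },
};

console.log(matchesPattern(orderEvent, premiumPattern)); // true
```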

```bash
# Create a rule that triggers Lambda on premium orders
aws events put-rule \
  --name premium-order-rule \
  --event-bus-name my-app-events \
  --event-pattern '{
    "source": ["com.myapp.orders"],
    "detail-type": ["OrderPlaced"],
    "detail": { "tier": ["premium"] }
  }' \
  --state ENABLED

# Attach the Lambda function as a target.
# The function's resource policy must also allow events.amazonaws.com to invoke it
# (aws lambda add-permission), otherwise the rule matches but nothing runs.
aws events put-targets \
  --rule premium-order-rule \
  --event-bus-name my-app-events \
  --targets '[{
    "Id": "process-premium-order",
    "Arn": "arn:aws:lambda:us-east-1:123:function:ProcessPremiumOrder"
  }]'
```

Up to 5 targets per rule. Each rule can route to Lambda, SQS, SNS, Step Functions, ECS tasks, API Gateway, Kinesis, CodePipeline, and more.



Schema Registry

EventBridge can discover and store schemas for events on your bus. Auto-discovery inspects events and infers their JSON schema.

  • Download code bindings (TypeScript, Java, Python) for type-safe event handling
  • Versioned schemas — track schema evolution over time
  • Share schemas across teams via Schema Registry

```bash
# Enable auto-discovery on your custom bus
aws schemas create-discoverer \
  --source-arn arn:aws:events:us-east-1:123:event-bus/my-app-events \
  --description "Auto-discover event schemas"
```

Archive and Replay

EventBridge can archive all events that match a pattern and replay them later — invaluable for debugging, testing new consumers, and disaster recovery.

```bash
# Archive all OrderPlaced events for 90 days
aws events create-archive \
  --archive-name order-events-archive \
  --event-source-arn arn:aws:events:us-east-1:123:event-bus/my-app-events \
  --event-pattern '{"detail-type": ["OrderPlaced"]}' \
  --retention-days 90

# Replay archived events (e.g., to populate a new service).
# The destination must be the event bus the archive was created from;
# rules on that bus then deliver the replayed events to the new consumer.
aws events start-replay \
  --replay-name backfill-analytics \
  --event-source-arn arn:aws:events:us-east-1:123:archive/order-events-archive \
  --event-start-time 2024-01-01T00:00:00Z \
  --event-end-time 2024-01-31T23:59:59Z \
  --destination '{
    "Arn": "arn:aws:events:us-east-1:123:event-bus/my-app-events"
  }'
```

EventBridge Pipes

Pipes create point-to-point integrations between a source and a target with optional filtering and enrichment — without writing polling code.


Common use case: DynamoDB Stream → filter for INSERT events → enrich with Lambda → publish to EventBridge bus — all without managing polling infrastructure.
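
The source → filter → enrich → target flow can be pictured as a chain of plain functions. The following is a conceptual model only. In a real pipe the filter is a declarative event pattern and the enrichment is a separate resource such as a Lambda function, so `runPipe` and the helper names here are purely illustrative.

```javascript
// Two records shaped like DynamoDB Streams records (trimmed to the fields used here).
const records = [
  { eventName: 'INSERT', dynamodb: { NewImage: { orderId: { S: 'ORD-001' }, amount: { N: '99.99' } } } },
  { eventName: 'MODIFY', dynamodb: { NewImage: { orderId: { S: 'ORD-002' }, amount: { N: '10' } } } },
];

// Filter step: keep only newly inserted items.
const isInsert = (record) => record.eventName === 'INSERT';

// Enrichment step: flattens the DynamoDB attribute-value format.
function enrichOrder(record) {
  const image = record.dynamodb.NewImage;
  return { orderId: image.orderId.S, amount: Number(image.amount.N) };
}

// Chains the stages in the same order a pipe applies them.
function runPipe(source, { filter, enrich, target }) {
  source.filter(filter).map(enrich).forEach(target);
}

const delivered = [];
runPipe(records, {
  filter: isInsert,
  enrich: enrichOrder,
  target: (event) => delivered.push(event), // stands in for the event bus
});

console.log(delivered); // [ { orderId: 'ORD-001', amount: 99.99 } ]
```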


EventBridge Scheduler

Schedule one-time or recurring tasks using cron or rate expressions — a replacement for CloudWatch Events scheduled rules with more features.

```bash
# Run a Lambda every day at 9:00 AM in the America/New_York time zone
aws scheduler create-schedule \
  --name daily-report \
  --schedule-expression "cron(0 9 * * ? *)" \
  --schedule-expression-timezone "America/New_York" \
  --target '{
    "Arn": "arn:aws:lambda:us-east-1:123:function:GenerateReport",
    "RoleArn": "arn:aws:iam::123:role/scheduler-role",
    "Input": "{\"reportType\": \"daily\"}"
  }' \
  --flexible-time-window '{"Mode": "OFF"}'
```

| Feature | CloudWatch Events | EventBridge Scheduler |
| --- | --- | --- |
| One-time schedules | ❌ | ✅ |
| Time zone support | ❌ | ✅ |
| Flexible time window | ❌ | ✅ |
| Millions of schedules | ❌ | ✅ |
| Direct SDK target | ❌ | ✅ (200+ APIs) |
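
The cron syntax used above has six fields (minutes, hours, day-of-month, month, day-of-week, year), and the two day fields cannot both carry a value: when one is set, the other must be `?`. A minimal structural check is sketched below. `parseScheduleCron` is a hypothetical helper that validates only the shape of the expression, not the range of each field, and this sketch assumes exactly one of the two day fields is `?`.

```javascript
// Splits "cron(...)" into named fields and checks the day-of-month / day-of-week rule.
function parseScheduleCron(expression) {
  const match = /^cron\((.+)\)$/.exec(expression.trim());
  if (!match) throw new Error('Expected the form cron(...)');

  const fields = match[1].trim().split(/\s+/);
  if (fields.length !== 6) throw new Error(`Expected 6 fields, got ${fields.length}`);

  const [minutes, hours, dayOfMonth, month, dayOfWeek, year] = fields;
  if ((dayOfMonth === '?') === (dayOfWeek === '?')) {
    throw new Error('Exactly one of day-of-month and day-of-week must be "?"');
  }
  return { minutes, hours, dayOfMonth, month, dayOfWeek, year };
}

console.log(parseScheduleCron('cron(0 9 * * ? *)'));
// { minutes: '0', hours: '9', dayOfMonth: '*', month: '*', dayOfWeek: '?', year: '*' }

try {
  parseScheduleCron('cron(0 9 * * * *)'); // both day fields set
} catch (error) {
  console.log(error.message); // Exactly one of day-of-month and day-of-week must be "?"
}
```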

DVA-C02 Quick Reference

Step Functions:

| Topic | Key Fact |
| --- | --- |
| Standard max duration | 1 year |
| Express max duration | 5 minutes |
| Standard execution model | Exactly-once |
| Express execution model | At-least-once |
| Standard pricing | Per state transition |
| Express pricing | Per execution + duration |
| Max throughput | Express: 100,000 executions/sec |
| Human approval pattern | waitForTaskToken |
| Iterate over array | Map state |
| Run branches in parallel | Parallel state |
| Conditional routing | Choice state |
| Catch-all error | States.ALL |
| Retry delay formula | IntervalSeconds × BackoffRate^(attempt-1) |
| SDK integrations | 200+ AWS services, no Lambda needed |

EventBridge:

| Topic | Key Fact |
| --- | --- |
| Default bus events | All AWS service events |
| Max targets per rule | 5 |
| SaaS integrations | 200+ partner event sources |
| Pattern matching | Source, detail-type, any detail field |
| Archive retention | Configurable, up to indefinite |
| Replay use case | Backfill new consumers, disaster recovery |
| Schema discovery | Auto-infers JSON schema from events |
| EventBridge vs SNS | EB: rich filtering + schema + SaaS; SNS: simpler pub/sub fan-out with subscription filter policies |
| Scheduler vs CloudWatch | Scheduler: one-time, time zones, millions of schedules |

Practice Questions (5)

easy

Q1. A developer builds a Step Functions workflow where Step B depends on Step A completing successfully, and Steps C and D can run simultaneously after Step B. Which state type should be used for Steps C and D?



hard

Q2. A Step Functions Express Workflow must call an external payment API that takes up to 10 minutes to respond. The developer wants the workflow to wait for the API callback without polling. Which pattern should be used?



medium

Q3. A developer uses Amazon EventBridge to trigger a Lambda function every day at 9 AM UTC. After deployment, the function is never invoked, although the rule is shown as enabled. What should the developer check first?



medium

Q4. A developer wants to capture AWS API calls (e.g., S3 PutObject, EC2 StartInstances) across the account and route them to an SQS queue for compliance auditing. Which source should be used with EventBridge?



medium

Q5. A Step Functions state machine processes a high-volume stream of e-commerce transactions (10,000+ per second). The developer needs low-latency execution and can tolerate at-least-once execution (exactly-once semantics are not needed). Which workflow type should be used?

