AWS Step Functions & EventBridge
A comprehensive deep dive into AWS Step Functions and Amazon EventBridge — state machines, workflow types, all state types, error handling, callback patterns, event buses, rules, schema registry, Pipes, and Scheduler for the DVA-C02 exam.
Part 1 — AWS Step Functions
What is AWS Step Functions?
Step Functions is a fully managed serverless orchestration service that lets you coordinate multiple AWS services into visual workflows called state machines. Each step in the workflow is a state, and the transitions between states are driven by the output of previous states or explicit conditions.
Core mental model: Step Functions is the conductor of your microservices orchestra. Instead of each Lambda function calling the next one (tight coupling), Step Functions manages the sequence, error handling, retries, and parallelism — your functions stay simple and focused.
When to use Step Functions:
- Multi-step workflows with conditional branching
- Long-running processes (up to 1 year)
- Human-in-the-loop approval flows
- Parallel processing of independent tasks
- Retry logic and error handling across services
Standard vs Express Workflows
| Feature | Standard Workflow | Express Workflow |
|---|---|---|
| Max duration | 1 year | 5 minutes |
| Execution model | Exactly-once | At-least-once |
| Execution history | Full audit in console | CloudWatch Logs only |
| Pricing | Per state transition | Per execution + duration |
| Throughput | 2,000 executions/sec | 100,000 executions/sec |
| Synchronous execution | ❌ | ✅ (Sync Express) |
| Use case | Order processing, approvals, ETL | IoT, streaming, high-volume event processing |
Exam tip: "Exactly-once" + "audit trail" + "long-running" = Standard. "High throughput" + "short duration" + "IoT/streaming" = Express.
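The rule of thumb above can be written down as a tiny decision helper. This is only a sketch of the comparison table's guidance; the `WorkflowNeeds` shape and the `chooseWorkflowType` name are hypothetical and not part of any AWS API.

```typescript
// Hypothetical requirements record; the field names are illustrative only.
interface WorkflowNeeds {
  maxDurationSeconds: number;   // longest expected execution
  needsExactlyOnce: boolean;    // steps must not run twice
  needsConsoleHistory: boolean; // full execution history / audit trail
}

type WorkflowType = "STANDARD" | "EXPRESS";

// Express limit taken from the comparison table (5 minutes)
const EXPRESS_MAX_SECONDS = 5 * 60;

function chooseWorkflowType(needs: WorkflowNeeds): WorkflowType {
  // Anything longer than the Express limit has to be Standard
  if (needs.maxDurationSeconds > EXPRESS_MAX_SECONDS) return "STANDARD";
  // Exactly-once semantics and console audit history are Standard features
  if (needs.needsExactlyOnce || needs.needsConsoleHistory) return "STANDARD";
  return "EXPRESS";
}
```

For example, a short, high-volume IoT ingestion flow with no audit requirement would come out as EXPRESS, while a multi-day approval flow would come out as STANDARD.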
Amazon States Language (ASL)
State machines are defined in ASL — a JSON-based language.
```json
{
  "Comment": "Order processing workflow",
  "StartAt": "ValidateOrder",
  "States": {
    "ValidateOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:ValidateOrder",
      "Next": "ProcessPayment",
      "Catch": [{
        "ErrorEquals": ["ValidationError"],
        "Next": "RejectOrder"
      }]
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:ProcessPayment",
      "Next": "FulfillOrder",
      "Retry": [{
        "ErrorEquals": ["PaymentTimeout"],
        "IntervalSeconds": 2,
        "MaxAttempts": 3,
        "BackoffRate": 2.0
      }]
    },
    "FulfillOrder": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sqs:sendMessage",
      "Parameters": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123/fulfillment",
        "MessageBody.$": "$"
      },
      "End": true
    },
    "RejectOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:SendRejectionEmail",
      "End": true
    }
  }
}
```

All State Types
Task State
Calls an external resource — Lambda, an AWS SDK service, an Activity, or an HTTP endpoint.
```json
{
  "Type": "Task",
  "Resource": "arn:aws:lambda:us-east-1:123:function:MyFunction",
  "Parameters": {
    "orderId.$": "$.orderId",
    "staticValue": "hello"
  },
  "ResultPath": "$.lambdaResult",
  "OutputPath": "$.lambdaResult",
  "TimeoutSeconds": 30,
  "HeartbeatSeconds": 10,
  "Retry": [{ "ErrorEquals": ["States.Timeout"], "MaxAttempts": 2 }],
  "Catch": [{ "ErrorEquals": ["States.ALL"], "Next": "HandleError" }],
  "Next": "NextState"
}
```

Optimized and AWS SDK service integrations: Step Functions can call 200+ AWS services directly, with no Lambda function in between:
```json
{
  "Type": "Task",
  "Resource": "arn:aws:states:::dynamodb:putItem",
  "Parameters": {
    "TableName": "Orders",
    "Item": {
      "orderId": { "S.$": "$.orderId" },
      "status": { "S": "PROCESSING" }
    }
  },
  "Next": "Done"
}
```

Choice State
Routes execution based on conditions — the if/else of state machines.
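A Choice state holds an ordered list of rules; conceptually they are checked top to bottom, the first match decides the next state, and Default applies when nothing matches. A minimal sketch of that evaluation, covering only the NumericGreaterThan and StringEquals comparisons and top-level variables such as $.amount (the `ChoiceRule` type and `evaluateChoice` function are hypothetical helpers, not Step Functions code):

```typescript
interface ChoiceRule {
  Variable: string;            // only "$.field" paths are handled in this sketch
  NumericGreaterThan?: number;
  StringEquals?: string;
  Next: string;
}

function evaluateChoice(
  rules: ChoiceRule[],
  input: { [field: string]: unknown },
  defaultState?: string
): string {
  for (const rule of rules) {
    // Strip the leading "$." to get the top-level field name
    const value = input[rule.Variable.replace(/^\$\./, "")];
    if (
      rule.NumericGreaterThan !== undefined &&
      typeof value === "number" &&
      value > rule.NumericGreaterThan
    ) {
      return rule.Next;
    }
    if (rule.StringEquals !== undefined && value === rule.StringEquals) {
      return rule.Next;
    }
  }
  if (defaultState === undefined) {
    // With no match and no Default, the real service raises this error
    throw new Error("States.NoChoiceMatched");
  }
  return defaultState;
}
```

Because order matters, an input that satisfies several rules is routed by whichever rule is listed first.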
```json
{
  "Type": "Choice",
  "Choices": [
    {
      "Variable": "$.amount",
      "NumericGreaterThan": 1000,
      "Next": "RequireApproval"
    },
    {
      "Variable": "$.tier",
      "StringEquals": "premium",
      "Next": "ExpressCheckout"
    }
  ],
  "Default": "StandardCheckout"
}
```

Parallel State
Executes multiple independent branches simultaneously. Waits for ALL branches to complete before moving on.
```json
{
  "Type": "Parallel",
  "Branches": [
    {
      "StartAt": "SendConfirmationEmail",
      "States": {
        "SendConfirmationEmail": { "Type": "Task", "Resource": "...", "End": true }
      }
    },
    {
      "StartAt": "UpdateInventory",
      "States": {
        "UpdateInventory": { "Type": "Task", "Resource": "...", "End": true }
      }
    }
  ],
  "Next": "OrderComplete"
}
```

Map State
Iterates over an array in the input and processes each item — in parallel or sequentially.
```json
{
  "Type": "Map",
  "ItemsPath": "$.orderItems",
  "MaxConcurrency": 5,
  "Iterator": {
    "StartAt": "ProcessItem",
    "States": {
      "ProcessItem": {
        "Type": "Task",
        "Resource": "arn:aws:lambda:us-east-1:123:function:ProcessItem",
        "End": true
      }
    }
  },
  "Next": "AllItemsProcessed"
}
```

MaxConcurrency: 0 = unlimited parallelism. MaxConcurrency: 1 = sequential processing.
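To get a feel for what MaxConcurrency changes, here is a toy model that estimates total elapsed time for a Map state given how long each item takes. It is an illustration only: the `simulateMapDuration` function is hypothetical, and the greedy "next free slot" scheduling is an assumption, not documented Step Functions behavior.

```typescript
// Toy model: total elapsed seconds when at most `maxConcurrency`
// iterations run at once (0 means no limit, as in the Map state).
function simulateMapDuration(itemSeconds: number[], maxConcurrency: number): number {
  if (itemSeconds.length === 0) return 0;
  const limit = maxConcurrency === 0 ? itemSeconds.length : maxConcurrency;
  const slotCount = Math.min(limit, itemSeconds.length);

  // slotFreeAt[i] = time at which concurrency slot i becomes free
  const slotFreeAt: number[] = [];
  for (let i = 0; i < slotCount; i++) slotFreeAt.push(0);

  for (const seconds of itemSeconds) {
    // Hand the next item to whichever slot frees up first
    let earliest = 0;
    for (let i = 1; i < slotFreeAt.length; i++) {
      if (slotFreeAt[i] < slotFreeAt[earliest]) earliest = i;
    }
    slotFreeAt[earliest] += seconds;
  }

  // The Map state finishes when the busiest slot finishes
  let total = 0;
  for (const t of slotFreeAt) {
    if (t > total) total = t;
  }
  return total;
}
```

With four items of 2 seconds each, this model gives 8 seconds for MaxConcurrency 1, 4 seconds for MaxConcurrency 2, and 2 seconds for MaxConcurrency 0.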
Wait State
Pauses execution for a fixed duration or until a specific timestamp.
```json
{ "Type": "Wait", "Seconds": 30, "Next": "RetryPayment" }

{ "Type": "Wait", "TimestampPath": "$.scheduledAt", "Next": "SendReminder" }
```

Other States
| State | Purpose |
|---|---|
| Pass | Pass input to output unchanged (or inject static data). Useful for testing. |
| Succeed | Terminate execution successfully. |
| Fail | Terminate execution with an error and cause message. |
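The Pass state's two behaviors (pass-through and static data injection) can be sketched in a few lines. This is a simplified model that handles only a ResultPath of "$" or "$.field"; the `runPassState` function is a hypothetical helper, not how the service is implemented.

```typescript
type JsonObject = { [key: string]: unknown };

// Sketch of a Pass state: with no Result the input is returned unchanged;
// with a Result, that value is written at ResultPath.
function runPassState(
  input: JsonObject,
  result?: unknown,
  resultPath: string = "$"
): unknown {
  if (result === undefined) return input;
  // "$" means the result replaces the whole input
  if (resultPath === "$") return result;
  // "$.field" means the result is merged into the input under that field
  const field = resultPath.replace(/^\$\./, "");
  return { ...input, [field]: result };
}
```

Injecting a fixed value under a field such as $.injected is a common way to stub out a Task while the real integration is still being built.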
Error Handling
Retry
Automatically retry a Task on specified errors with exponential backoff:
```json
"Retry": [
  {
    "ErrorEquals": ["Lambda.ServiceException", "Lambda.AWSLambdaException"],
    "IntervalSeconds": 1,
    "MaxAttempts": 3,
    "BackoffRate": 2.0,
    "JitterStrategy": "FULL"
  },
  {
    "ErrorEquals": ["States.Timeout"],
    "IntervalSeconds": 5,
    "MaxAttempts": 2,
    "BackoffRate": 1.5
  }
]
```

Retry delays: the base delay before retry attempt n is IntervalSeconds × BackoffRate^(n-1). For the first retrier above that gives 1s → 2s → 4s; because JitterStrategy is FULL, the actual waits are randomized rather than fixed at those values.
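The delay formula can be checked with a few lines of TypeScript. This sketch computes only the base delays and ignores JitterStrategy; the `Retrier` interface and `retryDelays` helper are illustrative names, not an AWS API.

```typescript
interface Retrier {
  IntervalSeconds: number;
  MaxAttempts: number;
  BackoffRate: number;
}

// Base delay in seconds before each retry attempt (attempt 1..MaxAttempts),
// using IntervalSeconds * BackoffRate^(attempt - 1).
function retryDelays(retrier: Retrier): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= retrier.MaxAttempts; attempt++) {
    delays.push(retrier.IntervalSeconds * Math.pow(retrier.BackoffRate, attempt - 1));
  }
  return delays;
}
```

Calling `retryDelays({ IntervalSeconds: 1, MaxAttempts: 3, BackoffRate: 2.0 })` returns `[1, 2, 4]`, and the States.Timeout retrier (5 seconds, 2 attempts, rate 1.5) gives `[5, 7.5]`.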
Catch
When retries are exhausted (or no retry is configured), Catch redirects to an error-handling state:
```json
"Catch": [
  {
    "ErrorEquals": ["PaymentDeclinedException"],
    "ResultPath": "$.errorInfo",
    "Next": "NotifyCustomer"
  },
  {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.errorInfo",
    "Next": "GlobalErrorHandler"
  }
]
```

Built-in error codes:
| Error | When it occurs |
|---|---|
| States.ALL | Matches any error (catch-all) |
| States.Timeout | Task exceeded TimeoutSeconds |
| States.TaskFailed | Task threw an unhandled exception |
| States.HeartbeatTimeout | No heartbeat received within HeartbeatSeconds |
| States.NoChoiceMatched | No Choice branch matched and no Default |
| States.ItemReaderFailed | Map state item reader failed |
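Catch entries are scanned in order, and the first one whose ErrorEquals lists the error name (or the States.ALL wildcard) decides where execution goes next. A simplified sketch of that lookup follows; the `Catcher` type and `findCatchTarget` function are hypothetical, and the model treats States.ALL as matching every error name without exception.

```typescript
interface Catcher {
  ErrorEquals: string[];
  Next: string;
}

// Returns the Next state of the first matching catcher, or undefined
// when no catcher matches (the execution would then fail).
function findCatchTarget(catchers: Catcher[], errorName: string): string | undefined {
  for (const catcher of catchers) {
    const exact = catcher.ErrorEquals.indexOf(errorName) !== -1;
    const wildcard = catcher.ErrorEquals.indexOf("States.ALL") !== -1;
    if (exact || wildcard) return catcher.Next;
  }
  return undefined;
}
```

This is why the States.ALL catcher belongs last in the list: placed first, it would shadow every more specific catcher after it.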
Wait for Callback Pattern (.waitForTaskToken)
Pause a workflow and wait for an external system to send a callback — perfect for human approvals, third-party webhooks, or long-running jobs.
```json
{
  "Type": "Task",
  "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
  "Parameters": {
    "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123/approval-queue",
    "MessageBody": {
      "taskToken.$": "$$.Task.Token",
      "orderId.$": "$.orderId",
      "approvalUrl": "https://app.example.com/approve"
    }
  },
  "HeartbeatSeconds": 3600,
  "Next": "OrderApproved"
}
```

```typescript
// External system (human clicks "Approve" in UI) calls back:
import { SFNClient, SendTaskSuccessCommand, SendTaskFailureCommand } from '@aws-sdk/client-sfn';

const sfn = new SFNClient({ region: 'us-east-1' });

// taskTokenFromMessage is the taskToken field read from the SQS message body.

// Approve: resumes the workflow
await sfn.send(new SendTaskSuccessCommand({
  taskToken: taskTokenFromMessage,
  output: JSON.stringify({ approved: true, approvedBy: 'manager@example.com' }),
}));

// Reject: fails the task, routing execution to a Catch handler if one is defined
await sfn.send(new SendTaskFailureCommand({
  taskToken: taskTokenFromMessage,
  error: 'ApprovalDenied',
  cause: 'Amount exceeds budget limit',
}));
```

Step Functions Flow Diagram
Part 2 — Amazon EventBridge
What is Amazon EventBridge?
EventBridge is a fully managed serverless event bus that routes events between AWS services, your own applications, and 200+ SaaS partners. It decouples producers from consumers — producers emit events without knowing who processes them.
Core mental model: EventBridge is an air traffic controller. Events land on the bus; rules decide which target each event is routed to. Producers and consumers never talk directly.
Event Buses
| Bus Type | Description | Use case |
|---|---|---|
| Default | Receives all AWS service events automatically | CloudTrail, EC2, RDS events |
| Custom | Created by you for application events | Microservice decoupling |
| Partner | Managed by SaaS partners (Stripe, Datadog, Zendesk…) | SaaS integrations |
```bash
# Create a custom event bus
aws events create-event-bus --name my-app-events

# Send a custom event
aws events put-events --entries '[{
  "Source": "com.myapp.orders",
  "DetailType": "OrderPlaced",
  "Detail": "{\"orderId\":\"ORD-001\",\"amount\":99.99}",
  "EventBusName": "my-app-events"
}]'
```

Event Structure
Every EventBridge event follows this envelope:
```json
{
  "version": "0",
  "id": "abc-123",
  "source": "com.myapp.orders",
  "account": "123456789",
  "time": "2024-01-15T10:30:00Z",
  "region": "us-east-1",
  "detail-type": "OrderPlaced",
  "detail": {
    "orderId": "ORD-001",
    "customerId": "cust_A",
    "amount": 99.99,
    "tier": "premium"
  }
}
```

Rules and Event Patterns
Rules match incoming events using event patterns and route matching events to one or more targets.
```json
{
  "source": ["com.myapp.orders"],
  "detail-type": ["OrderPlaced", "OrderUpdated"],
  "detail": {
    "tier": ["premium"],
    "amount": [{ "numeric": [">", 50] }]
  }
}
```

Pattern operators:
| Operator | Example | Matches |
|---|---|---|
| Exact match | ["ORDER_PLACED"] | Value equals string |
| Prefix | [{"prefix": "ORDER"}] | String starts with "ORDER" |
| Suffix | [{"suffix": ".jpg"}] | String ends with ".jpg" |
| Numeric range | [{"numeric": [">=", 10, "<", 100]}] | 10 ≤ value < 100 |
| Exists | [{"exists": true}] | Field is present |
| Anything-but | [{"anything-but": "CANCELLED"}] | Value is not "CANCELLED" |
| IP range (CIDR) | [{"cidr": "10.0.0.0/24"}] | IP in CIDR range |
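To make the operator semantics concrete, here is a small matcher that supports most of the operators in the table (everything except CIDR). It is a teaching sketch under stated assumptions, not EventBridge's implementation: `matchesPattern` and its helper types are hypothetical, array-valued event fields are not handled, and anything-but is assumed not to match a missing field.

```typescript
type Matcher =
  | string
  | number
  | { prefix: string }
  | { suffix: string }
  | { numeric: (string | number)[] }
  | { exists: boolean }
  | { "anything-but": string };

interface EventPattern {
  [field: string]: Matcher[] | EventPattern;
}

type EventData = { [field: string]: unknown };

function compareNumber(value: number, op: string, bound: number): boolean {
  switch (op) {
    case ">": return value > bound;
    case ">=": return value >= bound;
    case "<": return value < bound;
    case "<=": return value <= bound;
    case "=": return value === bound;
    default: return false;
  }
}

function matchesOne(matcher: Matcher, value: unknown, present: boolean): boolean {
  if (typeof matcher === "string" || typeof matcher === "number") {
    return value === matcher; // exact match
  }
  if ("exists" in matcher) return matcher.exists === present;
  if (!present) return false;
  if ("prefix" in matcher) {
    return typeof value === "string" && value.indexOf(matcher.prefix) === 0;
  }
  if ("suffix" in matcher) {
    return typeof value === "string" &&
      value.slice(value.length - matcher.suffix.length) === matcher.suffix;
  }
  if ("anything-but" in matcher) return value !== matcher["anything-but"];
  if ("numeric" in matcher) {
    if (typeof value !== "number") return false;
    // numeric is a flat list of operator/bound pairs; all pairs must hold
    for (let i = 0; i + 1 < matcher.numeric.length; i += 2) {
      const op = String(matcher.numeric[i]);
      const bound = Number(matcher.numeric[i + 1]);
      if (!compareNumber(value, op, bound)) return false;
    }
    return true;
  }
  return false;
}

function matchesPattern(pattern: EventPattern, event: EventData): boolean {
  for (const field of Object.keys(pattern)) {
    const expected = pattern[field];
    const present = Object.prototype.hasOwnProperty.call(event, field);
    const value = event[field];
    if (Array.isArray(expected)) {
      // Within one field, ANY listed matcher may accept the value
      let anyMatch = false;
      for (const m of expected) {
        if (matchesOne(m, value, present)) anyMatch = true;
      }
      if (!anyMatch) return false;
    } else {
      // Nested pattern: recurse into the nested object
      if (typeof value !== "object" || value === null) return false;
      if (!matchesPattern(expected, value as EventData)) return false;
    }
  }
  // Across fields, ALL listed fields must match
  return true;
}
```

The two loops capture the rule worth remembering for the exam: values inside one field's array are ORed together, while separate fields are ANDed.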
```bash
# Create a rule that triggers Lambda on premium orders
aws events put-rule \
  --name premium-order-rule \
  --event-bus-name my-app-events \
  --event-pattern '{
    "source": ["com.myapp.orders"],
    "detail-type": ["OrderPlaced"],
    "detail": { "tier": ["premium"] }
  }' \
  --state ENABLED

aws events put-targets \
  --rule premium-order-rule \
  --event-bus-name my-app-events \
  --targets '[{
    "Id": "process-premium-order",
    "Arn": "arn:aws:lambda:us-east-1:123:function:ProcessPremiumOrder"
  }]'
```

Up to 5 targets per rule. Each rule can route to Lambda, SQS, SNS, Step Functions, ECS tasks, API Gateway, Kinesis, CodePipeline, and more.
EventBridge Architecture
Schema Registry
EventBridge can discover and store schemas for events on your bus. Auto-discovery inspects events and infers their JSON schema.
- Download code bindings (TypeScript, Java, Python) for type-safe event handling
- Versioned schemas — track schema evolution over time
- Share schemas across teams via Schema Registry
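To illustrate what type-safe event handling buys you, here is a hand-written approximation of typed bindings for the OrderPlaced example event from earlier in this guide. The interfaces and function names are assumptions made for illustration; bindings actually generated by the Schema Registry will differ in shape and naming.

```typescript
// Hand-written types mirroring the OrderPlaced example event;
// not the output of the Schema Registry code generator.
interface OrderPlacedDetail {
  orderId: string;
  customerId: string;
  amount: number;
  tier: string;
}

interface EventEnvelope<TDetail> {
  source: string;
  "detail-type": string;
  detail: TDetail;
}

// Type guard: narrows an untyped envelope to an OrderPlaced event
function isOrderPlaced(
  event: EventEnvelope<unknown>
): event is EventEnvelope<OrderPlacedDetail> {
  if (event["detail-type"] !== "OrderPlaced") return false;
  const d = event.detail as { [key: string]: unknown } | null;
  return (
    typeof d === "object" &&
    d !== null &&
    typeof d.orderId === "string" &&
    typeof d.customerId === "string" &&
    typeof d.amount === "number" &&
    typeof d.tier === "string"
  );
}

function describeOrder(event: EventEnvelope<unknown>): string {
  if (!isOrderPlaced(event)) return "ignored";
  // event.detail is typed here, so a misspelled field fails at compile time
  return event.detail.orderId + ":" + event.detail.amount.toFixed(2);
}
```

The benefit is that a renamed or removed field surfaces as a compiler error in the consumer instead of an undefined value at runtime.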
```bash
# Enable auto-discovery on your custom bus
aws schemas create-discoverer \
  --source-arn arn:aws:events:us-east-1:123:event-bus/my-app-events \
  --description "Auto-discover event schemas"
```

Archive and Replay
EventBridge can archive all events that match a pattern and replay them later — invaluable for debugging, testing new consumers, and disaster recovery.
```bash
# Archive all OrderPlaced events for 90 days
aws events create-archive \
  --archive-name order-events-archive \
  --event-source-arn arn:aws:events:us-east-1:123:event-bus/my-app-events \
  --event-pattern '{"detail-type": ["OrderPlaced"]}' \
  --retention-days 90

# Replay archived events (e.g., to populate a new consumer).
# The destination must be the event bus the archive was created from.
aws events start-replay \
  --replay-name backfill-analytics \
  --event-source-arn arn:aws:events:us-east-1:123:archive/order-events-archive \
  --event-start-time 2024-01-01T00:00:00Z \
  --event-end-time 2024-01-31T23:59:59Z \
  --destination '{
    "Arn": "arn:aws:events:us-east-1:123:event-bus/my-app-events"
  }'
```

EventBridge Pipes
Pipes create point-to-point integrations between a source and a target with optional filtering and enrichment — without writing polling code.
Common use case: DynamoDB Stream → filter for INSERT events → enrich with Lambda → publish to EventBridge bus — all without managing polling infrastructure.
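The filter → enrich → target flow can be pictured as three functions applied to each record in order. The sketch below models that flow in plain TypeScript; `StreamRecord`, `PipeStages`, and `runPipe` are hypothetical names, and the record shape is heavily simplified compared with a real DynamoDB stream record.

```typescript
// Simplified stand-in for a DynamoDB stream record (only the fields used here)
interface StreamRecord {
  eventName: "INSERT" | "MODIFY" | "REMOVE";
  keys: { [name: string]: string };
}

interface PipeStages<TEnriched> {
  filter: (record: StreamRecord) => boolean;   // which records continue
  enrich: (record: StreamRecord) => TEnriched; // e.g. a Lambda lookup
  target: (item: TEnriched) => void;           // e.g. put onto an event bus
}

// Runs each record through filter -> enrich -> target, in order,
// and returns how many records reached the target.
function runPipe<TEnriched>(
  records: StreamRecord[],
  stages: PipeStages<TEnriched>
): number {
  let delivered = 0;
  for (const record of records) {
    // Filtered-out records never reach the enrichment step
    if (!stages.filter(record)) continue;
    stages.target(stages.enrich(record));
    delivered++;
  }
  return delivered;
}
```

Filtering before enrichment matters in practice: records dropped by the filter never invoke the enrichment step, so you do not pay for processing events you would discard anyway.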
EventBridge Scheduler
Schedule one-time or recurring tasks using cron or rate expressions — a replacement for CloudWatch Events scheduled rules with more features.
```bash
# Run a Lambda every day at 9 AM New York time
# (the cron expression is evaluated in the time zone given below)
aws scheduler create-schedule \
  --name daily-report \
  --schedule-expression "cron(0 9 * * ? *)" \
  --schedule-expression-timezone "America/New_York" \
  --target '{
    "Arn": "arn:aws:lambda:us-east-1:123:function:GenerateReport",
    "RoleArn": "arn:aws:iam::123:role/scheduler-role",
    "Input": "{\"reportType\": \"daily\"}"
  }' \
  --flexible-time-window '{"Mode": "OFF"}'
```

| Feature | CloudWatch Events | EventBridge Scheduler |
|---|---|---|
| One-time schedules | ❌ | ✅ |
| Time zone support | ❌ | ✅ |
| Flexible time window | ❌ | ✅ |
| Millions of schedules | ❌ | ✅ |
| Direct SDK target | ❌ | ✅ (200+ APIs) |
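The schedule expression in the create-schedule example, cron(0 9 * * ? *), has six space-separated fields: minutes, hours, day of month, month, day of week, and year. A small sketch that splits an expression into named fields can help when reading them; `parseCronExpression` is a hypothetical helper that only splits the fields and does not validate their values.

```typescript
interface CronFields {
  minutes: string;
  hours: string;
  dayOfMonth: string;
  month: string;
  dayOfWeek: string;
  year: string;
}

// Splits "cron(a b c d e f)" into its six named fields.
function parseCronExpression(expression: string): CronFields {
  const match = /^cron\((.*)\)$/.exec(expression.trim());
  if (match === null) {
    throw new Error("Not a cron(...) expression: " + expression);
  }
  const parts = match[1].trim().split(/\s+/);
  if (parts.length !== 6) {
    throw new Error("Expected 6 fields, got " + parts.length);
  }
  return {
    minutes: parts[0],
    hours: parts[1],
    dayOfMonth: parts[2],
    month: parts[3],
    dayOfWeek: parts[4],
    year: parts[5],
  };
}
```

For cron(0 9 * * ? *) this yields minutes "0" and hours "9", with "?" in the day-of-week position.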
DVA-C02 Quick Reference
Step Functions:
| Topic | Key Fact |
|---|---|
| Standard max duration | 1 year |
| Express max duration | 5 minutes |
| Standard execution model | Exactly-once |
| Express execution model | At-least-once |
| Standard pricing | Per state transition |
| Express pricing | Per execution + duration |
| Max throughput | Express: 100,000 executions/sec |
| Human approval pattern | waitForTaskToken |
| Iterate over array | Map state |
| Run branches in parallel | Parallel state |
| Conditional routing | Choice state |
| Catch-all error | States.ALL |
| Retry delay formula | IntervalSeconds × BackoffRate^(attempt-1) |
| SDK integrations | 200+ AWS services, no Lambda needed |
EventBridge:
| Topic | Key Fact |
|---|---|
| Default bus events | All AWS service events |
| Max targets per rule | 5 |
| SaaS integrations | 200+ partner event sources |
| Pattern matching | Source, detail-type, any detail field |
| Archive retention | Configurable, up to indefinite |
| Replay use case | Backfill new consumers, disaster recovery |
| Schema discovery | Auto-infers JSON schema from events |
| EventBridge vs SNS | EB: rich filtering + schema + SaaS; SNS: simpler + message attributes only |
| Scheduler vs CloudWatch | Scheduler: one-time, time zones, millions of schedules |
Practice Questions
Q1. A developer builds a Step Functions workflow where Step B depends on Step A completing successfully, and Steps C and D can run simultaneously after Step B. Which state type should be used for Steps C and D?
Q2. A Step Functions Express Workflow must call an external payment API that takes up to 10 minutes to respond. The developer wants the workflow to wait for the API callback without polling. Which pattern should be used?
Q3. A developer uses Amazon EventBridge to trigger a Lambda function every day at 9 AM UTC. After deploying, the Lambda is not being triggered. The EventBridge rule shows the rule is enabled. What should the developer check first?
Q4. A developer wants to capture AWS API calls (e.g., S3 PutObject, EC2 StartInstances) across the account and route them to an SQS queue for compliance auditing. Which source should be used with EventBridge?
Q5. A Step Functions state machine processes a high-volume stream of e-commerce transactions (10,000+ per second). The developer needs low-latency execution and can tolerate at-least-once execution (exactly-once semantics are not needed). Which workflow type should be used?