CloudWatch, X-Ray & Observability
Difficulty: hard
Overview
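CloudWatch Metrics: Publish custom metrics with PutMetricData. High-resolution metrics use StorageResolution=1 for 1-second granularity (standard resolution is 60 seconds). Embedded Metric Format (EMF): write structured JSON to Lambda logs and CloudWatch extracts the metrics automatically, with no PutMetricData call.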
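As an illustration of EMF, here is a minimal sketch that builds one EMF log line with only the standard library. The namespace `PaymentsApp`, the metric `FailedPayments`, and the dimension `Service` are hypothetical names chosen for the example. The helper `emf_log_line` is not part of any AWS SDK.

```python
import json
import time


def emf_log_line(namespace, metric_name, value, unit="Count",
                 dimensions=None, high_resolution=False):
    """Build one Embedded Metric Format (EMF) log line as a JSON string."""
    dimensions = dimensions or {}
    metric_def = {"Name": metric_name, "Unit": unit}
    if high_resolution:
        # StorageResolution=1 requests 1-second granularity (default is 60).
        metric_def["StorageResolution"] = 1
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions.keys())],
                "Metrics": [metric_def],
            }],
        },
        # The metric value sits at the top level under the metric's name.
        metric_name: value,
    }
    # Dimension values also sit at the top level, next to the metric value.
    record.update(dimensions)
    return json.dumps(record)


# In Lambda, printing the line to stdout sends it to CloudWatch Logs.
print(emf_log_line("PaymentsApp", "FailedPayments", 3,
                   dimensions={"Service": "checkout"}, high_resolution=True))
```

Because the metric travels inside an ordinary log line, the function makes no synchronous API call, which is the main reason EMF is often preferred over PutMetricData inside Lambda.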
CloudWatch Alarms: States: OK / ALARM / INSUFFICIENT_DATA. Composite Alarms: AND/OR of multiple alarms.
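A composite alarm is defined by a rule expression over the states of its child alarms. The sketch below only assembles that expression as a string. The helper `composite_rule` and the alarm names are hypothetical, and nothing here calls AWS.

```python
def composite_rule(alarm_names, operator="AND"):
    """Join child alarms into a composite AlarmRule expression."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not alarm_names:
        raise ValueError("at least one child alarm is required")
    # Each term is true while the named child alarm is in the ALARM state.
    terms = [f'ALARM("{name}")' for name in alarm_names]
    return f" {operator} ".join(terms)


rule = composite_rule(["high-error-rate", "high-latency"], "AND")
print(rule)  # ALARM("high-error-rate") AND ALARM("high-latency")
# The string would be passed as the AlarmRule parameter of PutCompositeAlarm,
# for example through boto3's cloudwatch client put_composite_alarm call.
```

With AND, the composite alarm fires only when both children are in ALARM, which helps reduce noise from a single flapping metric.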
CloudWatch Logs:
- Log Groups → Log Streams → Log Events.
- Metric Filters: Extract metrics from log patterns.
- Subscriptions: Stream logs to Lambda, Kinesis, Firehose in near real-time.
- Logs Insights: Purpose-built query language built from pipe-separated commands (fields, filter, stats, sort, limit). Example:
fields @timestamp | filter @message like /ERROR/
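To show how such a query might be extended and submitted, the following sketch builds a query that counts matching events per minute and prepares the arguments for the StartQuery API. The helper names and the log group `/aws/lambda/my-function` are assumptions for illustration. The code builds the request only and does not contact AWS.

```python
import time


def error_count_query(pattern="ERROR", bin_size="1m"):
    """Return a Logs Insights query counting matching events per time bin."""
    return (
        "fields @timestamp, @message"
        f" | filter @message like /{pattern}/"
        f" | stats count(*) as matches by bin({bin_size})"
    )


def start_query_kwargs(log_group, minutes_back=60, now=None):
    """Build keyword arguments for the CloudWatch Logs StartQuery API."""
    end = int(now if now is not None else time.time())  # epoch seconds
    return {
        "logGroupName": log_group,
        "startTime": end - minutes_back * 60,
        "endTime": end,
        "queryString": error_count_query(),
    }


kwargs = start_query_kwargs("/aws/lambda/my-function", minutes_back=15)
print(kwargs["queryString"])
# With boto3 installed and credentials configured, this could be run as:
#   boto3.client("logs").start_query(**kwargs)
```

Logs Insights suits ad hoc investigation. For a metric that must appear continuously on a dashboard, a metric filter on the log group is usually the better fit.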
AWS X-Ray:
Core Concepts:
- Trace: End-to-end path across services.
- Segment: Single service contribution.
- Subsegment: Granular work within a segment (DB call, HTTP call).
- Annotations: Indexed key-value pairs (max 50). Searchable.
- Metadata: Non-indexed key-value pairs.
- Service Map: Visual dependency graph.
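The concepts above map onto the JSON segment document that X-Ray stores for each service. The sketch below hand-builds one such document to show where annotations, metadata, and subsegments live. In practice the X-Ray SDK generates it for you. The service name `orders-api` and the keys `customer_tier` and `debug` are hypothetical.

```python
import json
import secrets
import time


def new_trace_id(now=None):
    """Trace ID: version 1, 8 hex digits of epoch seconds, 24 random hex digits."""
    epoch = int(now if now is not None else time.time())
    return f"1-{epoch:08x}-{secrets.token_hex(12)}"


def new_segment(name, trace_id, start, end):
    """A segment: one service's share of the work for a traced request."""
    return {
        "name": name,
        "id": secrets.token_hex(8),  # 16 hex digits
        "trace_id": trace_id,
        "start_time": start,
        "end_time": end,
        "annotations": {},   # indexed, usable in filter expressions
        "metadata": {},      # stored with the trace but not indexed
        "subsegments": [],
    }


start = time.time()
segment = new_segment("orders-api", new_trace_id(start), start, start + 0.120)
segment["annotations"]["customer_tier"] = "gold"
segment["metadata"]["debug"] = {"cart_items": 3}
# A subsegment records granular work, here a hypothetical DynamoDB call.
segment["subsegments"].append({
    "name": "DynamoDB",
    "id": secrets.token_hex(8),
    "start_time": start + 0.010,
    "end_time": start + 0.045,
    "namespace": "aws",
})
print(json.dumps(segment, indent=2))
```

Put values you need to search by in annotations and put everything else in metadata, since only annotations are indexed.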
Sampling: By default, the X-Ray SDK records the first request each second plus 5% of any additional requests. Custom sampling rules are configurable.
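The default rule can be modeled as a reservoir of one request per second plus a fixed 5% rate for the remainder. The class below is a local simulation written for these notes, not the SDK's actual implementation. The name `DefaultRuleSampler` is hypothetical.

```python
import random


class DefaultRuleSampler:
    """Local model of the default rule: a 1-per-second reservoir plus a fixed rate."""

    def __init__(self, reservoir=1, rate=0.05, rng=None):
        self.reservoir = reservoir
        self.rate = rate
        self.rng = rng or random.Random()
        self._second = None
        self._used = 0

    def should_sample(self, now):
        second = int(now)
        if second != self._second:
            # A new wall-clock second refills the reservoir.
            self._second = second
            self._used = 0
        if self._used < self.reservoir:
            self._used += 1
            return True
        # Reservoir exhausted: sample the remainder at the fixed rate.
        return self.rng.random() < self.rate


sampler = DefaultRuleSampler(rng=random.Random(42))
# 1,000 requests spread evenly over 10 seconds (100 per second).
decisions = [sampler.should_sample(i / 100) for i in range(1000)]
print(sum(decisions), "of", len(decisions), "requests sampled")
```

In this scenario the reservoir guarantees 10 traces (one per second), and the fixed rate adds roughly 5% of the other 990 requests. Low-traffic services therefore still get traced, while high-traffic services stay within a bounded cost.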
Integration: Lambda (enable Active Tracing), EC2/ECS (install daemon), API Gateway (enable X-Ray on stage).
SDK (Node.js): captureAWS() wraps the AWS SDK so that every AWS service call is recorded as a subsegment. captureHTTPs() does the same for outbound HTTP/HTTPS calls.
CloudTrail vs CloudWatch vs X-Ray:
| | CloudTrail | CloudWatch | X-Ray |
|---|---|---|---|
| Purpose | API audit | Metrics/logs | Distributed tracing |
Practice Linked Questions
Q1. A developer notices that a Lambda function's error rate is increasing but the default CloudWatch metrics do not show which specific code path is failing. Which AWS service provides request-level tracing to visualize the full call graph including downstream API calls?
Q2. A developer creates a custom CloudWatch metric to track the number of failed payment transactions per minute. The application should trigger an auto-scaling action when the metric exceeds 100 failures in 5 minutes. What should the developer create?
Q3. A developer uses X-Ray to trace a Lambda → DynamoDB call. The DynamoDB calls do not appear as subsegments in the trace. What is required to capture downstream DynamoDB calls?
Q4. A Lambda function logs thousands of lines per invocation. The developer wants to extract the count of "ERROR" occurrences per minute and display it on a CloudWatch dashboard. What is the most efficient approach?
Q5. A developer deploys a new Lambda version and immediately receives CloudWatch alarms on error rate. They want to instantly revert to the previous version with zero downtime. Which Lambda feature should the developer use?