
CloudWatch, X-Ray & Observability

Difficulty: hard

Overview


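CloudWatch Metrics: Publish custom metrics with PutMetricData. High-resolution metrics are stored at 1-second granularity (StorageResolution=1); standard resolution is 60 seconds. EMF (Embedded Metric Format): write structured JSON log lines (for example from Lambda) and CloudWatch extracts the metrics from them automatically.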
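As a rough illustration of EMF, the sketch below builds one Embedded Metric Format log line using only the standard library. The namespace "PaymentsApp", the metric name "FailedPayments", and the helper name emf_record are hypothetical and chosen only for this example; in Lambda, printing the line to stdout is what sends it to CloudWatch Logs.

```python
import json
import time


def emf_record(namespace, metric_name, value, unit="Count",
               dimensions=None, high_resolution=False, timestamp_ms=None):
    """Return one EMF log line (a JSON string) describing a single metric."""
    dimensions = dimensions or {}
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                # One dimension set containing every supplied dimension key.
                "Dimensions": [list(dimensions.keys())],
                "Metrics": [{
                    "Name": metric_name,
                    "Unit": unit,
                    # 1 = high resolution (1 second), 60 = standard.
                    "StorageResolution": 1 if high_resolution else 60,
                }],
            }],
        },
        # The metric value and dimension values live at the top level.
        metric_name: value,
    }
    record.update(dimensions)
    return json.dumps(record)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(emf_record("PaymentsApp", "FailedPayments", 3,
                     dimensions={"Service": "checkout"}))
```

Compared with calling PutMetricData directly, this keeps metric publishing off the request path, since the function only writes a log line.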

CloudWatch Alarms: An alarm is always in one of three states: OK, ALARM, or INSUFFICIENT_DATA. Composite Alarms combine the states of other alarms with AND/OR rules.
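A minimal sketch of how these two alarm types could be described in code. The helper names, alarm names, and namespace are hypothetical; the dictionaries mirror the keyword arguments that boto3's put_metric_alarm and put_composite_alarm accept, but nothing here calls AWS.

```python
def metric_alarm_params(alarm_name, namespace, metric_name,
                        threshold, period_seconds, action_arns=()):
    """Keyword arguments for a single-metric alarm on the Sum statistic."""
    return {
        "AlarmName": alarm_name,
        "Namespace": namespace,
        "MetricName": metric_name,
        "Statistic": "Sum",
        "Period": period_seconds,          # length of one evaluation window
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": list(action_arns),  # e.g. a scaling policy or SNS topic
    }


def composite_rule(operator, alarm_names):
    """Build a composite alarm rule such as: ALARM("a") AND ALARM("b")."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not alarm_names:
        raise ValueError("at least one alarm name is required")
    return f" {operator} ".join(f'ALARM("{name}")' for name in alarm_names)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(metric_alarm_params("payment-failures", "PaymentsApp",
                              "FailedPayments", 100, 300))
    print(composite_rule("AND", ["payment-failures", "high-latency"]))
    # With boto3 available, these would be passed roughly as:
    #   cloudwatch.put_metric_alarm(**params)
    #   cloudwatch.put_composite_alarm(AlarmName="...", AlarmRule=rule)
```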

CloudWatch Logs:

  • Log Groups → Log Streams → Log Events.
  • Metric Filters: Extract metrics from log patterns.
  • Subscriptions: Stream logs to Lambda, Kinesis, Firehose in near real-time.
  • Logs Insights: Purpose-built query language with piped commands for searching and aggregating logs, e.g. fields @timestamp, @message | filter @message like /ERROR/
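The sketch below shows what a metric filter that counts "ERROR" lines might look like, together with a Logs Insights query that aggregates the same events per minute. The filter name, namespace, and log group are hypothetical; the dictionary follows the parameter names of boto3's put_metric_filter, and no AWS call is made.

```python
def error_metric_filter(log_group):
    """Keyword arguments for a metric filter that emits 1 per matching event."""
    return {
        "logGroupName": log_group,
        "filterName": "ErrorCount",
        "filterPattern": "ERROR",          # matches events containing this term
        "metricTransformations": [{
            "metricName": "ErrorCount",
            "metricNamespace": "MyApp/Logs",
            "metricValue": "1",            # each match adds 1 to the metric
            "defaultValue": 0,             # reported when nothing matches
        }],
    }


# Logs Insights query: count ERROR events in 1-minute buckets.
ERRORS_PER_MINUTE_QUERY = (
    "fields @timestamp, @message"
    " | filter @message like /ERROR/"
    " | stats count() as errors by bin(1m)"
)


if __name__ == "__main__":
    # Hypothetical log group name, for illustration only.
    print(error_metric_filter("/aws/lambda/my-function"))
    print(ERRORS_PER_MINUTE_QUERY)
    # With boto3 available, roughly:
    #   logs.put_metric_filter(**error_metric_filter("/aws/lambda/my-function"))
```

The metric filter suits dashboards and alarms because the count becomes a regular CloudWatch metric; the Insights query is better for ad hoc investigation.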

AWS X-Ray:

Core Concepts:

  • Trace: End-to-end path across services.
  • Segment: Single service contribution.
  • Subsegment: Granular work within a segment (DB call, HTTP call).
  • Annotations: Indexed key-value pairs (max 50 per trace). Searchable with filter expressions.
  • Metadata: Non-indexed key-value pairs.
  • Service Map: Visual dependency graph.
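To make the relationship between these concepts concrete, here is a sketch that assembles a simplified X-Ray segment document with one subsegment, annotations, and metadata. The service name, keys, and helper names are hypothetical, and only a subset of the segment document fields is shown.

```python
import json
import secrets
import time


def new_trace_id(epoch_seconds=None):
    """Trace ID format: 1-<8 hex digits of epoch time>-<24 random hex digits>."""
    if epoch_seconds is None:
        epoch_seconds = time.time()
    return f"1-{int(epoch_seconds):08x}-{secrets.token_hex(12)}"


def new_id():
    """Segment and subsegment IDs are 16 hex digits."""
    return secrets.token_hex(8)


def build_subsegment(name, start, end, namespace="aws"):
    """Granular unit of work, e.g. a DynamoDB call (namespace 'aws')."""
    return {"name": name, "id": new_id(), "namespace": namespace,
            "start_time": start, "end_time": end}


def build_segment(name, start, end, annotations=None,
                  metadata=None, subsegments=None):
    """One service's contribution to a trace."""
    annotations = annotations or {}
    if len(annotations) > 50:
        raise ValueError("at most 50 annotations are allowed")
    return {
        "name": name,
        "id": new_id(),
        "trace_id": new_trace_id(start),
        "start_time": start,
        "end_time": end,
        "annotations": annotations,        # indexed, searchable
        "metadata": metadata or {},        # stored but not indexed
        "subsegments": subsegments or [],
    }


if __name__ == "__main__":
    now = time.time()
    call = build_subsegment("DynamoDB", now, now + 0.02)
    segment = build_segment("orders-service", now, now + 0.05,
                            annotations={"order_type": "express"},
                            metadata={"debug": {"retries": 0}},
                            subsegments=[call])
    print(json.dumps(segment, indent=2))
```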

Sampling: The default rule records the first request each second plus 5% of any additional requests. Custom sampling rules are configurable.
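The default rule can be pictured as a per-second reservoir plus a fixed rate. The class below is a simplified local simulation written for this guide, not the X-Ray SDK's implementation; the class name and the injectable random source are assumptions made so the behavior is easy to test.

```python
import random


class DefaultRuleSampler:
    """Simulates 'first request each second, then a fixed rate of the rest'."""

    def __init__(self, reservoir=1, rate=0.05, rng=random.random):
        self.reservoir = reservoir   # guaranteed traces per second
        self.rate = rate             # fraction of additional requests to trace
        self.rng = rng               # injectable for deterministic tests
        self._second = None
        self._taken = 0

    def should_sample(self, now):
        """Return True if the request arriving at time `now` is traced."""
        second = int(now)
        if second != self._second:
            # A new second has started: refill the reservoir.
            self._second = second
            self._taken = 0
        if self._taken < self.reservoir:
            self._taken += 1
            return True
        # Reservoir exhausted: fall back to the fixed percentage.
        return self.rng() < self.rate


if __name__ == "__main__":
    sampler = DefaultRuleSampler()
    # 1,000 requests spread evenly over 10 seconds.
    traced = sum(sampler.should_sample(i / 100) for i in range(1000))
    print(f"traced {traced} of 1000 requests")
```

The reservoir guarantees at least some traces for low-traffic services, while the percentage caps cost under heavy load.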

Integration: Lambda (enable Active Tracing), EC2/ECS (run the X-Ray daemon), API Gateway (enable X-Ray tracing on the stage).

SDK (Node.js): captureAWS() wraps the AWS SDK so that every AWS SDK call is recorded as a subsegment. captureHTTPs() instruments outbound HTTP/HTTPS calls.
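The functions named above belong to the Node.js SDK. For comparison, the Python X-Ray SDK offers patch_all(), which instruments supported libraries such as boto3 in a similar way. The sketch below assumes the aws-xray-sdk package may or may not be installed and degrades gracefully; the wrapper name instrument_sdk_calls is hypothetical.

```python
def instrument_sdk_calls():
    """Patch supported libraries so their calls show up as X-Ray subsegments.

    Returns True if the X-Ray SDK was found and patching ran, else False.
    """
    try:
        # Provided by the aws-xray-sdk package (an assumption: it must be
        # bundled with the function or supplied through a layer).
        from aws_xray_sdk.core import patch_all
    except ImportError:
        return False
    patch_all()  # instruments boto3/botocore, requests, and other supported libs
    return True


if __name__ == "__main__":
    # Run once at import time, outside the handler, so every invocation
    # reuses the already-patched clients.
    print("X-Ray instrumentation enabled:", instrument_sdk_calls())
```

Without this kind of instrumentation, enabling Active Tracing alone records the Lambda segment but not the downstream calls made from the function code.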

CloudTrail vs CloudWatch vs X-Ray:

          | CloudTrail | CloudWatch   | X-Ray
Purpose   | API audit  | Metrics/logs | Distributed tracing

Practice Linked Questions


easy

Q1. A developer notices that a Lambda function's error rate is increasing but the default CloudWatch metrics do not show which specific code path is failing. Which AWS service provides request-level tracing to visualize the full call graph including downstream API calls?



medium

Q2. A developer creates a custom CloudWatch metric to track the number of failed payment transactions per minute. The application should trigger an auto-scaling action when the metric exceeds 100 failures in 5 minutes. What should the developer create?



medium

Q3. A developer uses X-Ray to trace a Lambda → DynamoDB call. The DynamoDB calls do not appear as subsegments in the trace. What is required to capture downstream DynamoDB calls?



medium

Q4. A Lambda function logs thousands of lines per invocation. The developer wants to extract the count of "ERROR" occurrences per minute and display it on a CloudWatch dashboard. What is the most efficient approach?



medium

Q5. A developer deploys a new Lambda version and immediately receives CloudWatch alarms on error rate. They want to instantly revert to the previous version with zero downtime. Which Lambda feature should the developer use?

