AWS CloudWatch

AWS CloudWatch is Amazon’s native monitoring and observability service that collects metrics, logs, and events from over 70 AWS services in real time. It enables engineering teams to set alarms, build dashboards, trace application performance, and automate responses to changes in their AWS infrastructure, all without managing any additional monitoring servers.

How AWS CloudWatch Works

CloudWatch operates as a centralized data collection and analysis layer sitting between your AWS resources and your team. AWS services such as EC2, RDS, Lambda, and S3 continuously emit data points to CloudWatch. These come in three forms:

  • Metrics: Numerical time-series data (e.g., CPU utilization, request latency, error rates). Basic EC2 monitoring publishes metrics every 5 minutes by default; detailed monitoring reduces this to 1-minute intervals.
  • Logs: Text output from applications, Lambda functions, VPC flow logs, Route 53 queries, and more. CloudWatch Logs Insights lets you run SQL-like queries across log groups.
  • Events: State changes in your environment (e.g., an EC2 instance stopping, an Auto Scaling action triggering). This functionality now lives largely in Amazon EventBridge, the successor to CloudWatch Events.

Once data is in CloudWatch, you can visualize it on dashboards, configure alarms that trigger SNS notifications or Lambda functions, and use Contributor Insights to identify which resources are creating the most load.

Core Features

Metrics & Dashboards

CloudWatch automatically collects performance metrics from all major AWS services. You can create custom dashboards that combine metrics across services and accounts, giving you a unified view of your infrastructure health. Custom metrics from your own applications or on-premises servers can also be pushed via the CloudWatch API or the CloudWatch Agent.
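
As a rough illustration of pushing a custom metric through the CloudWatch API, the Python sketch below builds the request body for the PutMetricData operation. The namespace, metric name, and dimension are hypothetical placeholders, and the commented-out call assumes the boto3 SDK is installed with credentials configured.

```python
from datetime import datetime, timezone


def build_put_metric_data(namespace, name, value, unit="Milliseconds",
                          dimensions=None, high_resolution=False):
    """Build keyword arguments for the CloudWatch PutMetricData operation."""
    datum = {
        "MetricName": name,
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
        # 1 = high resolution (1-second), 60 = standard resolution (1-minute)
        "StorageResolution": 1 if high_resolution else 60,
    }
    if dimensions:
        datum["Dimensions"] = [
            {"Name": k, "Value": v} for k, v in sorted(dimensions.items())
        ]
    return {"Namespace": namespace, "MetricData": [datum]}


# Hypothetical application metric; all names are placeholders.
params = build_put_metric_data(
    "MyApp/Checkout", "OrderLatency", 182.5,
    dimensions={"Environment": "prod"},
)
print(params["Namespace"], params["MetricData"][0]["MetricName"])

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(**params)
```

Keeping the payload construction separate from the network call makes it easy to inspect or unit-test what would be sent before any data reaches CloudWatch.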

Retention depends on the resolution of the data: data points at sub-minute (down to 1-second) resolution are kept for 3 hours, 1-minute data points for 15 days, 5-minute data points for 63 days, and 1-hour data points for 15 months.
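
The retention tiers above can be expressed as a small lookup. This is only a sketch of the schedule described in this section; the function name is made up for illustration.

```python
from datetime import timedelta


def retention_for_period(period_seconds):
    """Return how long data points of a given period are kept, per the tiers above."""
    if period_seconds < 60:
        return timedelta(hours=3)    # high-resolution (sub-minute) data
    if period_seconds < 300:
        return timedelta(days=15)    # 1-minute data
    if period_seconds < 3600:
        return timedelta(days=63)    # 5-minute data
    return timedelta(days=455)       # 1-hour data, roughly 15 months


for period in (1, 60, 300, 3600):
    print(f"{period:>5}s period -> kept for {retention_for_period(period)}")
```

In practice this means fine-grained data is only available for recent time ranges; older data can only be viewed at coarser periods.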

CloudWatch Alarms

A metric alarm watches a single metric (or a metric math expression) and executes one or more actions when a threshold is breached for a defined number of evaluation periods. Common uses include alerting on high CPU, triggering Auto Scaling, or stopping an idle EC2 instance. Alarms have three states: OK, ALARM, and INSUFFICIENT_DATA.
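
To make the "threshold breached for a number of evaluation periods" idea concrete, here is a deliberately simplified model of alarm evaluation. It is not how CloudWatch is implemented: real alarms also support "M out of N" data points and configurable handling of missing data. The function name and sample values are hypothetical.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Simplified alarm evaluation: ALARM only if the most recent
    `evaluation_periods` data points all exceed `threshold`."""
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods or any(v is None for v in recent):
        return "INSUFFICIENT_DATA"
    if all(v > threshold for v in recent):
        return "ALARM"
    return "OK"


cpu = [42.0, 55.0, 91.0, 93.5, 97.2]  # hypothetical CPU % per period
print(alarm_state(cpu, threshold=90, evaluation_periods=3))      # ALARM
print(alarm_state(cpu[:4], threshold=90, evaluation_periods=3))  # OK
print(alarm_state(cpu[:2], threshold=90, evaluation_periods=3))  # INSUFFICIENT_DATA
```

Requiring several consecutive breaches is what keeps a single noisy data point from paging someone.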

CloudWatch Logs

Log groups store log streams from your applications and AWS services. CloudWatch Logs Insights lets you interactively query and analyze log data, which is useful for root cause analysis, security auditing, and performance troubleshooting. Subscription filters can route log data in real time to Lambda, Kinesis, or OpenSearch.
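
For a sense of what a Logs Insights query looks like, the sketch below defines one in the service's pipe-delimited query syntax and builds the arguments for the StartQuery operation. The log group name is a placeholder, and the commented-out call assumes boto3 is available.

```python
import time

# Logs Insights query: the 20 most recent messages containing "ERROR".
QUERY = (
    "fields @timestamp, @message "
    "| filter @message like /ERROR/ "
    "| sort @timestamp desc "
    "| limit 20"
)


def build_start_query(log_group, query, lookback_seconds=3600, now=None):
    """Build keyword arguments for the CloudWatch Logs StartQuery operation."""
    end = int(now if now is not None else time.time())
    return {
        "logGroupName": log_group,
        "startTime": end - lookback_seconds,  # epoch seconds
        "endTime": end,
        "queryString": query,
    }


# Hypothetical log group; a fixed `now` keeps the example deterministic.
params = build_start_query("/aws/lambda/my-function", QUERY, now=1_700_000_000)
print(params["startTime"], params["endTime"])

# With boto3: boto3.client("logs").start_query(**params),
# then poll get_query_results with the returned query ID.
```

Because Logs Insights bills by data scanned, narrowing the time window and the log groups queried is the simplest way to keep queries fast and cheap.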

Container & Application Insights

Container Insights collects metrics and logs from ECS, EKS, and Kubernetes clusters. Application Insights automatically detects application components and their dependencies, reducing the time to identify anomalies.

CloudWatch vs. CloudTrail

This is one of the most common points of confusion in AWS. The short answer: CloudWatch monitors performance; CloudTrail monitors activity.

Dimension | AWS CloudWatch | AWS CloudTrail
Primary purpose | Operational monitoring & observability | Governance, compliance & audit logging
What it tracks | Resource metrics, logs, application performance | API calls, user activity, account events
Typical question | “Is my EC2 running hot?” | “Who deleted that S3 bucket?”
Data type | Time-series metrics, log streams | Structured event records (JSON)
Retention | Metrics up to 15 months; log retention configurable per log group | 90 days in Event history; indefinite when a trail delivers to S3
Primary users | DevOps, SREs, developers | Security, compliance, auditors

Most teams use both. CloudTrail tells you what happened; CloudWatch tells you how things are performing right now. The two also work together: if a trail delivers its events to CloudWatch Logs, you can configure CloudWatch alarms based on CloudTrail events, for example to alert on root account logins.
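
As a sketch of the root-account example, the snippet below builds the arguments for a CloudWatch Logs metric filter that counts root-user activity in CloudTrail events. It assumes a trail is already delivering events to a CloudWatch Logs log group; the log group, filter, and metric names are hypothetical.

```python
# Filter pattern matching root-user activity in CloudTrail event records.
ROOT_USAGE_PATTERN = (
    '{ $.userIdentity.type = "Root" '
    '&& $.userIdentity.invokedBy NOT EXISTS '
    '&& $.eventType != "AwsServiceEvent" }'
)


def build_root_usage_filter(log_group, namespace="Security/CloudTrail"):
    """Build keyword arguments for the CloudWatch Logs PutMetricFilter operation."""
    return {
        "logGroupName": log_group,
        "filterName": "RootAccountUsage",
        "filterPattern": ROOT_USAGE_PATTERN,
        "metricTransformations": [{
            "metricName": "RootAccountUsageCount",
            "metricNamespace": namespace,
            "metricValue": "1",  # emit 1 for every matching event
        }],
    }


# Hypothetical log group that receives the trail's events.
params = build_root_usage_filter("cloudtrail/management-events")
print(params["filterName"], "->", params["metricTransformations"][0]["metricName"])

# With boto3: boto3.client("logs").put_metric_filter(**params)
```

An alarm on the resulting metric (for example, a sum of 1 or more within a period) can then notify an SNS topic whenever the root user is active.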

AWS CloudWatch Pricing

CloudWatch pricing is usage-based, and costs add up quickly in production environments with high-cardinality metrics or large log volumes. Here’s a simplified breakdown of the main cost drivers (list prices vary by Region and change over time):

  • Custom metrics: $0.30 per metric per month for the first 10,000 metrics, with lower rates at higher tiers. High-resolution alarms are billed at a higher rate than standard-resolution alarms.
  • Logs: $0.50 per GB ingested, $0.03 per GB per month stored, and $0.005 per GB scanned by Logs Insights queries.
  • Dashboards: $3.00 per dashboard per month after the first 3 free dashboards (up to 50 metrics each).

The free tier includes basic metrics at 5-minute frequency, 10 custom metrics, 5 GB of log ingestion, and 3 dashboards. Most production workloads quickly exceed the free tier, particularly on log ingestion, which is often the largest CloudWatch line item.
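
To see how these line items combine, here is a back-of-the-envelope estimator that uses only the rates and free-tier allowances quoted in this section. It is a simplification (flat rates, no volume tiers, no alarms or API charges), and the workload figures are hypothetical.

```python
# Rates and free-tier allowances as quoted in this section (simplified;
# actual prices vary by Region and usage tier).
RATES = {
    "custom_metrics": 0.30,     # USD per metric per month
    "log_ingest_gb": 0.50,      # USD per GB ingested
    "log_storage_gb": 0.03,     # USD per GB-month stored
    "insights_scan_gb": 0.005,  # USD per GB scanned
    "dashboards": 3.00,         # USD per dashboard per month
}
FREE_TIER = {"custom_metrics": 10, "log_ingest_gb": 5, "dashboards": 3}


def estimate_monthly_cost(**usage):
    """Return per-item and total monthly cost (USD) for the given usage."""
    costs = {}
    for item, rate in RATES.items():
        billable = max(0, usage.get(item, 0) - FREE_TIER.get(item, 0))
        costs[item] = round(billable * rate, 2)
    costs["total"] = round(sum(costs.values()), 2)
    return costs


# Hypothetical mid-sized workload.
bill = estimate_monthly_cost(custom_metrics=110, log_ingest_gb=505,
                             log_storage_gb=500, insights_scan_gb=1000,
                             dashboards=5)
for item, cost in bill.items():
    print(f"{item:18s} ${cost:8.2f}")
```

Even in this small example, log ingestion accounts for most of the bill, which matches the pattern described above.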

CloudWatch Limitations: What Teams Commonly Miss

CloudWatch is powerful within its scope, but there are several gaps that engineering and FinOps teams frequently run into:

No cross-cloud visibility

CloudWatch has no native integration with other cloud providers. If your infrastructure spans AWS, GCP, and Azure, which is increasingly common, you’ll need a third-party tool like Datadog, Grafana, or Prometheus to get a unified view.

Metric resolution limits

Basic EC2 monitoring publishes metrics every 5 minutes. For latency-sensitive applications, this granularity may be insufficient to catch transient spikes before they cause user-facing issues. Detailed monitoring (1-minute) and high-resolution custom metrics (1-second) cost extra.
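
The effect of granularity is easy to demonstrate with synthetic data. The sketch below uses made-up per-second latency samples containing a 10-second spike and compares what the same data looks like when averaged over 5-minute and 1-minute periods.

```python
from statistics import mean

# Hypothetical per-second latencies (ms) across one 5-minute period:
# steady at 100 ms with a 10-second spike to 2000 ms starting at t=120s.
samples = [100.0] * 120 + [2000.0] * 10 + [100.0] * 170


def averages(values, period):
    """Average `values` in consecutive windows of `period` samples."""
    return [mean(values[i:i + period]) for i in range(0, len(values), period)]


five_minute_avg = averages(samples, 300)[0]
one_minute_avgs = averages(samples, 60)
one_second_max = max(samples)

print(f"5-minute average: {five_minute_avg:.1f} ms")
print("1-minute averages:", [round(v, 1) for v in one_minute_avgs])
print(f"1-second maximum: {one_second_max:.0f} ms")
```

A 20x latency spike barely moves the 5-minute average, is clearly visible in one of the 1-minute averages, and is unmistakable at 1-second resolution.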

Log querying at scale is slow and expensive

CloudWatch Logs Insights is useful for ad hoc queries but becomes expensive and slow on large log volumes. Many teams find themselves routing logs to OpenSearch or a dedicated log management platform for anything beyond basic troubleshooting.

Cost recommendations are delayed

AWS Compute Optimizer and Cost Explorer use CloudWatch utilization data to generate rightsizing and commitment recommendations, but those recommendations are based on data that’s typically 72+ hours old. By the time a recommendation surfaces, your usage patterns may have already shifted. This lag means teams either act on stale data or delay optimization decisions entirely.

Common Questions

1. Is AWS CloudWatch free?

CloudWatch has a free tier that includes basic metrics at 5-minute resolution, 10 custom metrics, 5 GB of log ingestion per month, and 3 dashboards. Most production workloads exceed these limits, particularly on log ingestion and custom metrics.

2. What are Amazon CloudWatch Logs?

CloudWatch Logs is the log management component of CloudWatch. It ingests log streams from AWS services (Lambda, ECS, API Gateway, VPC flow logs, etc.) and your own applications. Logs are organized into log groups and can be queried with CloudWatch Logs Insights or routed to other services via subscription filters.

3. How does CloudWatch monitoring work?

AWS services automatically publish metrics to CloudWatch at regular intervals (5 minutes by default for basic EC2 monitoring; many other services publish every minute). You can then create alarms on those metrics, visualize them on dashboards, or use them to trigger automated actions via Auto Scaling policies or Lambda functions.

4. What’s the difference between CloudWatch and CloudTrail?

CloudWatch monitors resource performance and application health (metrics, logs, alarms). CloudTrail records API activity and user actions for governance and compliance auditing. The two are complementary: CloudWatch tells you how your infrastructure is performing, and CloudTrail tells you who did what.

5. Can CloudWatch reduce my AWS costs?

CloudWatch gives you the visibility to identify underutilized resources, which is the first step toward cost optimization. However, acting on that visibility (purchasing the right Savings Plans, Reserved Instances, or rightsizing commitments) is a separate process. Many teams use CloudWatch data alongside dedicated cost optimization platforms to translate visibility into actual savings.