# CHAPTER 17
AWS CloudWatch Monitoring
1. Introduction
When you run a server in your bedroom, you can look at the physical box to see if the fans are spinning or if the hard drive light is blinking. When your server is in a data center thousands of miles away, you are blind. If an EC2 instance crashes, or a Lambda function throws a Python syntax error, how do you know? AWS provides Amazon CloudWatch: the central nervous system for monitoring, logging, and observing your entire cloud environment.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Define the core pillars of observability (Metrics and Logs).
- Understand how CloudWatch Metrics track resource performance.
- Create CloudWatch Alarms to trigger automated actions.
- Access and query application logs in CloudWatch Logs.
- Build a unified CloudWatch Dashboard.
3. Beginner-Friendly Explanation
Imagine running a massive factory.
- Metrics: You install pressure gauges on every pipe. These gauges report a number every 60 seconds (e.g., "Pressure is at 50%"). This gives you a high-level view of the factory's health.
- Alarms: You wire a siren to the gauge. If the pressure hits 90%, the siren screams, automatically shutting down the machine and sending you a text message.
- Logs: A worker on the factory floor keeps a detailed journal, writing down every single action they take. "At 10:04 AM, I opened valve B. At 10:05 AM, the valve got stuck."
CloudWatch acts as the pressure gauges (Metrics), the sirens (Alarms), and the massive digital library holding every worker's journal (Logs).
4. CloudWatch Metrics
Most AWS services automatically publish metric data to CloudWatch at no extra charge, typically every one to five minutes (EC2 basic monitoring reports every 5 minutes).
- EC2: Sends CPU Utilization, Network Traffic IN/OUT, and Disk Read/Write activity.
- S3: Sends total Bucket Size (in bytes) and number of objects, reported once per day.
- API Gateway: Sends total API request counts and 4XX/5XX error counts.
- Billing: Sends your Estimated Monthly Charges (once billing alerts are enabled for your account)!
*(Note: EC2 does NOT send RAM/Memory usage by default. Because AWS cannot see inside your virtual machine's operating system, you must install the CloudWatch Agent onto the EC2 instance to push RAM metrics to CloudWatch).*
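To make the idea of reading a metric concrete, here is a minimal sketch using boto3, the AWS SDK for Python. It assumes boto3 is installed and credentials are configured. The helper names (`build_cpu_query`, `summarize`, `fetch_cpu`), the one-hour window, and the 300-second period are illustrative choices, not anything CloudWatch requires.

```python
from datetime import datetime, timedelta, timezone


def build_cpu_query(instance_id, hours=1, period_seconds=300):
    """Build keyword arguments for CloudWatch's GetMetricStatistics call."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,  # seconds covered by each datapoint
        "Statistics": ["Average"],
    }


def summarize(datapoints):
    """Return (min, max, mean) of the 'Average' values, or None if empty."""
    values = [point["Average"] for point in datapoints]
    if not values:
        return None
    return min(values), max(values), sum(values) / len(values)


def fetch_cpu(instance_id):
    """Query AWS for real data; needs boto3 and valid credentials."""
    import boto3  # imported lazily so the helpers above work without it

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**build_cpu_query(instance_id))
    return summarize(response["Datapoints"])
```

Calling `fetch_cpu` with the ID of one of your own instances would return the lowest, highest, and mean CPU reading over the last hour. Splitting the query-building from the network call keeps the logic easy to check without touching AWS.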
5. CloudWatch Alarms
Metrics are useless if no one is looking at them. You use Alarms to automate responses. If you set an Alarm: *"IF EC2 CPU > 80% for 5 Minutes"*, you can configure it to:
- 1. Send an email to the IT Team via SNS (Simple Notification Service).
- 2. Trigger an Auto Scaling Group to launch more servers!
- 3. Automatically reboot the EC2 instance.
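The rule "CPU > 80% for 5 minutes" can be pictured as a check over the most recent readings. The toy function below is a simplified model, not CloudWatch's actual evaluator: it assumes one reading per minute and requires every one of the last five readings to exceed the threshold. The state names `OK`, `ALARM`, and `INSUFFICIENT_DATA` match the three states a CloudWatch alarm can be in; the function name and defaults are made up for illustration.

```python
def alarm_state(datapoints, threshold=80.0, evaluation_periods=5):
    """Simplified alarm check over a list of per-minute CPU readings.

    Returns 'ALARM' only when each of the most recent `evaluation_periods`
    readings is above `threshold`.
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    if all(value > threshold for value in recent):
        return "ALARM"
    return "OK"


# Five consecutive readings above 80 trip the alarm.
print(alarm_state([40, 85, 90, 95, 88, 91]))  # ALARM
# A single dip to 70 inside the window keeps it quiet.
print(alarm_state([85, 90, 70, 95, 88]))      # OK
# Two readings are not enough to judge a five-minute window.
print(alarm_state([90, 95]))                  # INSUFFICIENT_DATA
```

Requiring a sustained breach rather than a single spike is what stops a brief burst of CPU from paging the whole team.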
6. CloudWatch Logs
When you write print("Hello World") in a Python Lambda function, or console.log("Error") in Node.js, where does that text go? It goes directly into CloudWatch Logs.
CloudWatch organizes logs into Log Groups (usually one per Lambda function or EC2 application) and Log Streams (individual log files).
If your API crashes, you immediately go to CloudWatch Logs, open the Log Stream for that exact minute, and read the raw error message to debug your code.
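As a rough sketch of both halves of this workflow, the code below shows a minimal Lambda handler that prints a line, plus a helper that searches that function's Log Group for recent errors with boto3. Lambda writes to a Log Group named `/aws/lambda/<function-name>`. The helper names, the 15-minute window, and the `ERROR` search term are assumptions made for illustration, and only the first page of results is read.

```python
import time


def handler(event, context):
    """A minimal Lambda handler; printed text is captured by CloudWatch Logs."""
    print("Hello World")
    return {"statusCode": 200}


def build_error_query(function_name, minutes=15, pattern="ERROR"):
    """Build keyword arguments for the CloudWatch Logs FilterLogEvents call."""
    now_ms = int(time.time() * 1000)  # the Logs API uses epoch milliseconds
    return {
        "logGroupName": f"/aws/lambda/{function_name}",
        "filterPattern": pattern,
        "startTime": now_ms - minutes * 60 * 1000,
        "endTime": now_ms,
    }


def fetch_errors(function_name):
    """Query AWS for matching log lines; needs boto3 and valid credentials."""
    import boto3  # imported lazily so the helpers above work without it

    logs = boto3.client("logs")
    response = logs.filter_log_events(**build_error_query(function_name))
    # Each event records which Log Stream it came from and the raw message.
    return [(e["logStreamName"], e["message"]) for e in response["events"]]
```

Searching the whole Log Group this way saves you from opening Log Streams one at a time when you do not yet know which one holds the error.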
7. Mini Project: Monitor EC2 Server Health
Let's create an alarm to notify us if a server's CPU usage gets too high.
Step-by-Step Tutorial:
- 1. Open the AWS Console and search for CloudWatch.
- 2. On the left menu, click Alarms -> All alarms, then click Create alarm.
- 3. Click Select metric.
- 4. Choose EC2 -> Per-Instance Metrics.
- 5. Find your running EC2 instance and check the box for the CPUUtilization metric. Click Select metric.
- 6. Conditions: Set it to trigger if CPU is Greater/Equal to 80.
- 7. Click Next.
- 8. Notification: Under "Send a notification to the following SNS topic", select "Create new topic". Name it MyAlertTopic, enter your email address, and click "Create topic". (You must go to your email inbox and click the verification link AWS sends you!).
- 9. Click Next, name the alarm High-CPU-Alert, and click Create alarm.
*Result: If your EC2 server is ever overwhelmed by traffic, AWS will automatically send you an email alert!*
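The same setup can be scripted instead of clicked through. The sketch below mirrors the console steps with boto3, reusing the names MyAlertTopic and High-CPU-Alert from the tutorial. The 300-second period and the single evaluation period are assumptions, since the tutorial does not state them, and the function names are illustrative.

```python
def build_alarm(instance_id, topic_arn):
    """Build keyword arguments for CloudWatch's PutMetricAlarm call."""
    return {
        "AlarmName": "High-CPU-Alert",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,           # assumed: evaluate 5-minute averages
        "EvaluationPeriods": 1,  # assumed: one breaching period is enough
        "Threshold": 80.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],  # notify this SNS topic on ALARM
    }


def create_alarm(instance_id, email):
    """Create the topic, subscription, and alarm; needs boto3 and credentials."""
    import boto3  # imported lazily so build_alarm works without it

    sns = boto3.client("sns")
    topic_arn = sns.create_topic(Name="MyAlertTopic")["TopicArn"]
    # The subscription stays pending until the emailed link is clicked.
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**build_alarm(instance_id, topic_arn))
    return topic_arn
```

Running `create_alarm` with your own instance ID and email address would leave you in the same place as step 9, including the confirmation email you still have to click.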
8. Best Practices
- Custom Dashboards: Stop clicking through 10 different AWS services to check your health. Use CloudWatch Dashboards to create a single, unified TV screen holding graphs for your EC2 CPU, your Database connections, your Lambda errors, and your estimated monthly Bill.
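A dashboard is defined by a JSON document listing its widgets, so it can be generated in code. The sketch below builds a two-widget body (EC2 CPU and Lambda errors) and publishes it with boto3. The widget titles, sizes, default region, and helper names are illustrative assumptions; database and billing graphs could be added as further widgets in the same style.

```python
import json


def build_dashboard_body(instance_id, function_name, region="us-east-1"):
    """Return the JSON string for a dashboard with two side-by-side graphs."""
    widgets = [
        {
            "type": "metric",
            "x": 0, "y": 0, "width": 12, "height": 6,
            "properties": {
                "title": "EC2 CPU",
                "metrics": [
                    ["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]
                ],
                "stat": "Average",
                "period": 300,
                "region": region,
            },
        },
        {
            "type": "metric",
            "x": 12, "y": 0, "width": 12, "height": 6,  # right half of the grid
            "properties": {
                "title": "Lambda Errors",
                "metrics": [
                    ["AWS/Lambda", "Errors", "FunctionName", function_name]
                ],
                "stat": "Sum",
                "period": 300,
                "region": region,
            },
        },
    ]
    return json.dumps({"widgets": widgets})


def publish_dashboard(name, body):
    """Create or overwrite the dashboard; needs boto3 and valid credentials."""
    import boto3  # imported lazily so the builder works without it

    boto3.client("cloudwatch").put_dashboard(
        DashboardName=name, DashboardBody=body
    )
```

Keeping the dashboard definition in code means it can live in version control and be recreated in another account or region.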
9. Common Mistakes
- Log Retention Costs: By default, CloudWatch Logs are kept Forever. If your application generates 50GB of logs a day, you will eventually pay massive storage fees for logs from 3 years ago that you will never read. Always go into your Log Groups and change the Retention Setting from "Never Expire" to "30 Days" or "60 Days".
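Fixing retention by hand gets tedious once you have dozens of Log Groups. As a minimal sketch, the code below uses boto3 to find every Log Group that has no retention setting and apply one. It relies on the fact that a group set to never expire has no `retentionInDays` field in the API response. The helper names and the 30-day default are assumptions for illustration.

```python
def groups_without_retention(log_groups):
    """Return names of log groups that have no retention period set."""
    return [
        group["logGroupName"]
        for group in log_groups
        if "retentionInDays" not in group
    ]


def apply_retention(days=30):
    """Set a retention period on every never-expiring log group.

    Needs boto3 and valid credentials. `days` must be one of the values
    CloudWatch Logs accepts, such as 30 or 60.
    """
    import boto3  # imported lazily so the helper above works without it

    logs = boto3.client("logs")
    paginator = logs.get_paginator("describe_log_groups")
    changed = []
    for page in paginator.paginate():
        for name in groups_without_retention(page["logGroups"]):
            logs.put_retention_policy(logGroupName=name, retentionInDays=days)
            changed.append(name)
    return changed
```

Calling `apply_retention()` returns the names of the groups it changed, which gives you a record of what was updated.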
10. Exercises
- 1. What is the fundamental difference between a CloudWatch Metric and a CloudWatch Log?
- 2. Why is an SNS (Simple Notification Service) Topic required when setting up a CloudWatch Alarm for email alerts?
11. MCQs with Answers
- 1. A developer wants to monitor the exact amount of RAM (Memory) being utilized by their Linux EC2 instance. They open CloudWatch but cannot find a Memory metric. Why?
- 2. Which CloudWatch feature allows you to automatically trigger an Auto Scaling Group to launch more servers when the average CPU utilization exceeds 75%?
12. Interview Questions
- Q: Explain the architecture of centralized logging in AWS. How do serverless Lambda functions output their error messages, and how would a developer query those errors?
- Q: Discuss the interplay between CloudWatch Alarms, SNS Topics, and Auto Scaling Groups in a highly elastic cloud architecture.