AWS Fundamentals Tutorial
CHAPTER 17 Beginner

AWS CloudWatch Monitoring

Updated: May 15, 2026
20 min read

# CHAPTER 17

1. Introduction

When you run a server in your bedroom, you can look at the physical box to see if the fans are spinning or if the hard drive light is blinking. When your server is in a data center thousands of miles away, you are blind. If an EC2 instance crashes, or a Lambda function raises a Python exception, how do you know? AWS provides Amazon CloudWatch, the central nervous system for monitoring, logging, and observing your entire cloud environment.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Define the core pillars of observability (Metrics and Logs).
  • Understand how CloudWatch Metrics track resource performance.
  • Create CloudWatch Alarms to trigger automated actions.
  • Access and query application logs in CloudWatch Logs.
  • Build a unified CloudWatch Dashboard.

3. Beginner-Friendly Explanation

Imagine running a massive factory.
  • Metrics: You install pressure gauges on every pipe. These gauges report a number every 60 seconds (e.g., "Pressure is at 50%"). This gives you a high-level view of the factory's health.
  • Alarms: You wire a siren to the gauge. If the pressure hits 90%, the siren screams, automatically shutting down the machine and sending you a text message.
  • Logs: A worker on the factory floor keeps a detailed journal, writing down every single action they take. "At 10:04 AM, I opened valve B. At 10:05 AM, the valve got stuck."

CloudWatch acts as the pressure gauges (Metrics), the sirens (Alarms), and the massive digital library holding every worker's journal (Logs).

4. CloudWatch Metrics

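Most AWS services automatically publish metrics to CloudWatch at no extra charge. The reporting interval depends on the service: EC2 basic monitoring reports every 5 minutes, for example, while S3 storage metrics are reported once a day.
  • EC2: Sends CPU utilization, network traffic in/out, and disk read/write activity for instance store volumes.
  • S3: Sends total bucket size (in bytes) and number of objects.
  • API Gateway: Sends the total request count and 4XX/5XX error counts.
  • Billing: Sends your estimated charges, once billing alerts are enabled for the account.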
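To make the idea of a metric concrete, here is a minimal sketch of how you might ask CloudWatch for an instance's recent CPU numbers. It only builds the request parameters for the `GetMetricStatistics` API, so it runs without an AWS account. The helper names (`cpu_query_params`, `peak_cpu`) and the instance ID are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def cpu_query_params(instance_id, hours=1, period_seconds=300):
    """Build keyword arguments for CloudWatch's GetMetricStatistics call."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",              # where EC2 publishes its metrics
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,            # one datapoint per 5 minutes
        "Statistics": ["Average", "Maximum"],
    }


def peak_cpu(datapoints):
    """Return the highest 'Maximum' value in a list of datapoints, or None."""
    return max((p["Maximum"] for p in datapoints), default=None)


if __name__ == "__main__":
    params = cpu_query_params("i-0123456789abcdef0")  # hypothetical instance ID
    print(params["Namespace"], params["MetricName"], params["Period"])
    # With boto3 installed and credentials configured, the real call would be:
    #   import boto3
    #   response = boto3.client("cloudwatch").get_metric_statistics(**params)
    #   print(peak_cpu(response["Datapoints"]))
```

Keeping the parameter building separate from the network call makes the request easy to inspect before you send anything to AWS.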

*(Note: EC2 does NOT send RAM/Memory usage by default. Because AWS cannot see inside your virtual machine's operating system, you must install the CloudWatch Agent onto the EC2 instance to push RAM metrics to CloudWatch).*
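The CloudWatch Agent reads a JSON configuration file that tells it what to collect. As a rough sketch, the snippet below generates a minimal configuration asking for the `mem_used_percent` measurement. The function name and the 60-second interval are illustrative choices, and a real configuration usually contains more sections (logs, disk metrics, and so on).

```python
import json


def memory_agent_config(interval_seconds=60):
    """Return a minimal CloudWatch Agent config that collects memory usage."""
    return {
        "metrics": {
            "metrics_collected": {
                "mem": {
                    # Percentage of RAM in use, measured inside the OS
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON you would save as the agent's configuration file.
    print(json.dumps(memory_agent_config(), indent=2))
```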

5. CloudWatch Alarms

Metrics are useless if no one is looking at them. You use Alarms to automate responses. If you set an Alarm: *"IF EC2 CPU > 80% for 5 Minutes"*, you can configure it to:
  1. Send an email to the IT Team via SNS (Simple Notification Service).
  2. Trigger an Auto Scaling Group to launch more servers!
  3. Automatically reboot the EC2 instance.
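The "CPU > 80% for 5 Minutes" rule above maps onto a handful of alarm settings. The sketch below builds the parameters for the `PutMetricAlarm` API without calling AWS. The function names, the alarm name, and the SNS topic ARN are hypothetical; the reboot ARN follows the `arn:aws:automate:<region>:ec2:reboot` pattern that CloudWatch uses for EC2 actions.

```python
def reboot_action_arn(region):
    """ARN of the built-in 'reboot this instance' alarm action."""
    return f"arn:aws:automate:{region}:ec2:reboot"


def high_cpu_alarm_params(instance_id, action_arns, threshold=80.0):
    """Build keyword arguments for CloudWatch's PutMetricAlarm call."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",   # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                 # evaluate 5-minute averages...
        "EvaluationPeriods": 1,        # ...and alarm after one breaching period
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": list(action_arns),  # what runs when the alarm fires
    }


if __name__ == "__main__":
    actions = [
        "arn:aws:sns:us-east-1:123456789012:it-team-alerts",  # hypothetical topic
        reboot_action_arn("us-east-1"),
    ]
    params = high_cpu_alarm_params("i-0123456789abcdef0", actions)
    print(params["AlarmName"], params["AlarmActions"])
    # Real call, assuming boto3 and credentials:
    #   boto3.client("cloudwatch").put_metric_alarm(**params)
```

An Auto Scaling response works the same way: the ARN of a scaling policy goes into `AlarmActions` alongside, or instead of, the SNS topic.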

6. CloudWatch Logs

When you write print("Hello World") in a Python Lambda function, or console.log("Error") in a Node.js one, where does that text go? Lambda sends it to CloudWatch Logs automatically. (An application running on EC2 needs the CloudWatch Agent to ship its log files.) CloudWatch organizes logs into Log Groups (usually one per Lambda function or EC2 application) and Log Streams (the sequence of log events coming from a single source, such as one instance). If your API crashes, you go to CloudWatch Logs, open the Log Stream covering that exact minute, and read the raw error message to debug your code.
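Here is a small sketch of both halves of that story: a Lambda handler that writes to the log, and the parameters you might pass to the `FilterLogEvents` API to search for errors afterwards. The function name `my-api-handler` and the helper `error_search_params` are assumptions for illustration. Lambda log groups follow the `/aws/lambda/<function-name>` naming pattern.

```python
import logging
import time

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    """Inside Lambda, print and logging output are captured by CloudWatch Logs."""
    print("Hello World")
    logger.info("Received event with %d keys", len(event))
    return {"statusCode": 200}


def error_search_params(function_name, minutes=15):
    """Build keyword arguments for the CloudWatch Logs FilterLogEvents call."""
    now_ms = int(time.time() * 1000)  # the API expects epoch milliseconds
    return {
        "logGroupName": f"/aws/lambda/{function_name}",
        "filterPattern": "ERROR",     # match log events containing ERROR
        "startTime": now_ms - minutes * 60 * 1000,
        "endTime": now_ms,
    }


if __name__ == "__main__":
    print(lambda_handler({"path": "/hello"}, None))
    print(error_search_params("my-api-handler"))  # hypothetical function name
    # Real call, assuming boto3 and credentials:
    #   boto3.client("logs").filter_log_events(**error_search_params("my-api-handler"))
```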

7. Mini Project: Monitor EC2 Server Health

Let's create an alarm to notify us when a server's CPU is overloaded.

Step-by-Step Tutorial:

  1. Open the AWS Console and search for CloudWatch.
  2. On the left menu, click Alarms -> All alarms, then click Create alarm.
  3. Click Select metric.
  4. Choose EC2 -> Per-Instance Metrics.
  5. Find your running EC2 instance and check the box for the CPUUtilization metric. Click Select metric.
  6. Conditions: Set the alarm to trigger when CPUUtilization is Greater/Equal to a threshold of 80.
  7. Click Next.
  8. Notification: Under "Send a notification to the following SNS topic", select "Create new topic". Name it MyAlertTopic, enter your email address, and click "Create topic". (You must go to your email inbox and click the confirmation link AWS sends you, or no alerts will be delivered!)
  9. Click Next, name the alarm High-CPU-Alert, and click Create alarm.

*Result: If your EC2 server is ever overwhelmed by traffic, AWS will automatically email you!*
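The same steps can be scripted. The sketch below mirrors the console walkthrough using the boto3 operation names `create_topic`, `subscribe`, and `put_metric_alarm`. So that it runs without an AWS account, it is demonstrated with `DryRunClient`, a made-up stand-in that only records calls; the instance ID, email address, and returned topic ARN are placeholders.

```python
class DryRunClient:
    """Hypothetical stand-in for a boto3 client: records calls, never contacts AWS."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, operation):
        def record(**kwargs):
            self.calls.append((operation, kwargs))
            # Fake response shaped like the real create_topic result.
            return {"TopicArn": "arn:aws:sns:us-east-1:123456789012:MyAlertTopic"}
        return record


def create_high_cpu_alert(sns, cloudwatch, instance_id, email):
    """Mirror the console steps: topic, email subscription, then the alarm."""
    topic_arn = sns.create_topic(Name="MyAlertTopic")["TopicArn"]
    # The recipient still has to click the confirmation link in their inbox.
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    cloudwatch.put_metric_alarm(
        AlarmName="High-CPU-Alert",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        Statistic="Average",
        Period=300,
        EvaluationPeriods=1,
        Threshold=80.0,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        AlarmActions=[topic_arn],
    )
    return topic_arn


if __name__ == "__main__":
    sns, cloudwatch = DryRunClient(), DryRunClient()
    create_high_cpu_alert(sns, cloudwatch, "i-0123456789abcdef0", "you@example.com")
    for operation, _ in sns.calls + cloudwatch.calls:
        print(operation)
    # For a real run, pass boto3.client("sns") and boto3.client("cloudwatch") instead.
```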

8. Best Practices

  • Custom Dashboards: Stop clicking through 10 different AWS services to check your health. Use CloudWatch Dashboards to create a single, unified TV screen holding graphs for your EC2 CPU, your Database connections, your Lambda errors, and your estimated monthly Bill.
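A dashboard is defined by a JSON document listing its widgets. As a minimal sketch, the code below builds a two-widget body (EC2 CPU and Lambda errors) of the kind you could pass to the `PutDashboard` API. The helper names, the instance ID, the function name, and the region are illustrative assumptions.

```python
import json


def metric_widget(title, metric, x, y, region="us-east-1", stat="Average"):
    """Describe one graph widget on the dashboard's 24-column grid."""
    return {
        "type": "metric",
        "x": x, "y": y, "width": 12, "height": 6,
        "properties": {
            "title": title,
            "metrics": [metric],  # [namespace, metric name, dimension name, dimension value]
            "period": 300,
            "stat": stat,
            "region": region,
        },
    }


def dashboard_body(instance_id, function_name):
    """Return the dashboard definition as a JSON string."""
    widgets = [
        metric_widget("EC2 CPU",
                      ["AWS/EC2", "CPUUtilization", "InstanceId", instance_id], 0, 0),
        metric_widget("Lambda errors",
                      ["AWS/Lambda", "Errors", "FunctionName", function_name], 12, 0,
                      stat="Sum"),
    ]
    return json.dumps({"widgets": widgets})


if __name__ == "__main__":
    body = dashboard_body("i-0123456789abcdef0", "my-api-handler")  # hypothetical names
    print(body)
    # Real call, assuming boto3 and credentials:
    #   boto3.client("cloudwatch").put_dashboard(
    #       DashboardName="health-overview", DashboardBody=body)
```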

9. Common Mistakes

  • Log Retention Costs: By default, CloudWatch Logs are kept forever. If your application generates 50GB of logs a day, you will eventually pay massive storage fees for logs from 3 years ago that you will never read. Always go into your Log Groups and change the Retention Setting from "Never Expire" to a fixed period such as "30 Days" or "60 Days".
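Checking retention by hand gets tedious once you have many Log Groups. The sketch below shows one way to spot the offenders: in the `DescribeLogGroups` response, a group that never expires simply has no `retentionInDays` field. The helper names and sample group names are invented for illustration, and the sample data stands in for a real API response.

```python
def groups_without_retention(log_groups):
    """Return names of log groups that are still set to never expire."""
    return [g["logGroupName"] for g in log_groups if "retentionInDays" not in g]


def retention_params(log_group_name, days=30):
    """Build keyword arguments for the CloudWatch Logs PutRetentionPolicy call.

    The API accepts only specific values for days; 30 and 60 are among them.
    """
    return {"logGroupName": log_group_name, "retentionInDays": days}


if __name__ == "__main__":
    # Sample data shaped like response["logGroups"] from describe_log_groups.
    sample = [
        {"logGroupName": "/aws/lambda/my-api-handler"},                # never expires
        {"logGroupName": "/aws/lambda/nightly-job", "retentionInDays": 30},
    ]
    for name in groups_without_retention(sample):
        print(retention_params(name))
        # Real call, assuming boto3 and credentials:
        #   boto3.client("logs").put_retention_policy(**retention_params(name))
```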

10. Exercises

  1. What is the fundamental difference between a CloudWatch Metric and a CloudWatch Log?
  2. Why is an SNS (Simple Notification Service) Topic required when setting up a CloudWatch Alarm for email alerts?

11. MCQs with Answers

Question 1

A developer wants to monitor the exact amount of RAM (Memory) being utilized by their Linux EC2 instance. They open CloudWatch but cannot find a Memory metric. Why?

Question 2

Which CloudWatch feature allows you to automatically trigger an Auto Scaling Group to launch more servers when the average CPU utilization exceeds 75%?

12. Interview Questions

  • Q: Explain the architecture of centralized logging in AWS. How do serverless Lambda functions output their error messages, and how would a developer query those errors?
  • Q: Discuss the interplay between CloudWatch Alarms, SNS Topics, and Auto Scaling Groups in a highly elastic cloud architecture.

13. FAQs

Q: Is CloudWatch the only way to monitor AWS?
A: No. Many enterprises use third-party tools like Datadog, New Relic, or Splunk. However, those tools usually integrate directly with CloudWatch under the hood to pull the raw metric data out of AWS!

14. Summary

In Chapter 17, we illuminated the invisible mechanics of our cloud architecture. We introduced Amazon CloudWatch as the premier observability tool. We differentiated between high-level Metrics (numerical performance data) and granular Logs (raw application output). We learned to automate incident response by configuring CloudWatch Alarms to monitor thresholds and trigger email alerts via SNS, ensuring we are always the first to know when our infrastructure requires attention.

15. Next Chapter Recommendation

CloudWatch monitors *performance* (e.g., "The server crashed"). But what service monitors *security and governance* (e.g., "Who deleted the database?")? Proceed to Chapter 18: AWS CloudTrail and Auditing.
