CHAPTER 12
Beginner
AWS Auto Scaling
Updated: May 15, 2026
25 min read
1. Introduction
A Load Balancer distributes traffic across your existing servers. However, traffic on the internet is rarely consistent. An e-commerce site might have 100 users at 3 AM but 100,000 users at 9 AM during a massive Black Friday sale. If you only have 3 servers, the 9 AM spike will overwhelm them. If you permanently run 100 servers to survive the spike, you will waste thousands of dollars during the quiet nights. The magic of the cloud is Elasticity: adding capacity when demand rises and releasing it when demand falls. In this chapter, we will master Amazon EC2 Auto Scaling, which lets our infrastructure grow and shrink automatically based on real-time demand.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Define the concept of Cloud Elasticity.
- Differentiate between Scaling Up (Vertical) and Scaling Out (Horizontal).
- Configure an Auto Scaling Group (ASG) with Min, Max, and Desired capacities.
- Understand Launch Templates.
- Design Dynamic Scaling Policies based on CloudWatch metrics (like CPU usage).
3. Beginner-Friendly Explanation
Imagine managing a call center. You have 5 employees on shift. Suddenly, a new commercial airs, and 500 people call at once. The 5 employees are overwhelmed, and customers hang up.
- Without Auto Scaling: You panic, quickly call 10 off-duty employees, and wait 30 minutes for them to drive to work. By the time they arrive, the callers have already left.
- With Auto Scaling: You install a robot manager. You program a rule: *"If the hold time exceeds 5 minutes, instantly teleport 10 more employees into the room. When the hold time drops below 1 minute, send 10 employees home so we don't have to pay them."*
Auto Scaling is the robot manager for your EC2 instances.
4. Launch Templates
Before AWS can automatically launch a new server, it needs to know *what* to launch. You define this in a Launch Template. A Launch Template is a saved, versioned configuration that specifies:
- The exact AMI (Operating System / pre-installed software)
- The Instance Type (e.g., t2.micro)
- The Security Group
- The SSH Key Pair
When the Auto Scaling Group needs more servers, it launches new instances from the Launch Template, so every new server is an identical copy of the same configuration.
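The same four settings can also be expressed in code. The sketch below is a minimal example, assuming you use boto3 (the AWS SDK for Python): it only builds the keyword arguments for the EC2 `create_launch_template` call, and the AMI ID, security group ID, and key pair name are hypothetical placeholders. The real API call is left commented out so the snippet runs without AWS credentials.

```python
# Sketch: the Launch Template settings as kwargs for boto3's EC2
# create_launch_template call. All IDs and names are hypothetical placeholders.


def build_launch_template_request(
    name: str,
    ami_id: str,
    instance_type: str,
    security_group_ids: list,
    key_name: str,
) -> dict:
    """Return kwargs for boto3.client("ec2").create_launch_template(**kwargs)."""
    return {
        "LaunchTemplateName": name,
        "LaunchTemplateData": {
            "ImageId": ami_id,                       # the AMI (OS + pre-installed software)
            "InstanceType": instance_type,           # e.g. t2.micro
            "SecurityGroupIds": security_group_ids,  # the Security Group(s)
            "KeyName": key_name,                     # the SSH Key Pair
        },
    }


request = build_launch_template_request(
    name="Web-Template",
    ami_id="ami-0123456789abcdef0",               # hypothetical AMI ID
    instance_type="t2.micro",
    security_group_ids=["sg-0123456789abcdef0"],  # hypothetical security group ID
    key_name="my-key-pair",                       # hypothetical key pair name
)

# To create it for real (needs boto3 installed and AWS credentials configured):
# import boto3
# boto3.client("ec2").create_launch_template(**request)
print(request)
```

Keeping the request as plain data makes it easy to review or test before anything is created in your account.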
5. The Auto Scaling Group (ASG)
The Auto Scaling Group (ASG) is the logical container that manages your fleet of servers. You define strict capacity rules for the ASG:
- Minimum Capacity: The lowest number of servers the ASG will run at any time (e.g., 2). *If a server crashes, the ASG notices the count drop to 1 and automatically launches a replacement!*
- Maximum Capacity: A budget safeguard. The highest number of servers the ASG is allowed to run, even during a massive spike (e.g., 10).
- Desired Capacity: The number of servers the ASG is trying to run right now (e.g., 2). It always stays between the Minimum and Maximum, and scaling policies work by raising or lowering this number.
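The relationship between the three numbers can be modeled in a few lines of Python. This is a simplified, hypothetical model (`CapacityRules` and `instances_to_launch` are illustrative names, not part of any AWS SDK) that shows self-healing back to the Desired Capacity and clamping of requested sizes into the [Min, Max] range.

```python
from dataclasses import dataclass


@dataclass
class CapacityRules:
    """Simplified model of an ASG's Min / Max / Desired capacity (not AWS code)."""

    minimum: int
    maximum: int
    desired: int

    def __post_init__(self) -> None:
        # Refuse a starting configuration where desired is outside [min, max].
        if not (self.minimum <= self.desired <= self.maximum):
            raise ValueError("desired capacity must be between min and max")

    def set_desired(self, requested: int) -> int:
        """Clamp a requested capacity into the allowed [min, max] range."""
        self.desired = max(self.minimum, min(self.maximum, requested))
        return self.desired


def instances_to_launch(rules: CapacityRules, healthy_running: int) -> int:
    """How many replacements are needed to get back up to desired capacity."""
    return max(0, rules.desired - healthy_running)


rules = CapacityRules(minimum=2, maximum=10, desired=2)
print(instances_to_launch(rules, healthy_running=1))  # 1: one server crashed, replace it
print(rules.set_desired(25))  # 10: a request for 25 is capped at Max
print(rules.set_desired(0))   # 2: a request for 0 is floored at Min
```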
6. Scaling Policies (The Triggers)
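How does the ASG know when to change the Desired Capacity from 2 to 10? You attach a Scaling Policy driven by CloudWatch metrics.
- 1. Target Tracking Scaling (Most Common): You set a specific goal. *"Keep the average CPU utilization of all my servers at around 50%."* If traffic spikes and average CPU hits 80%, the ASG automatically launches more servers until the average drops back toward 50%, and it removes servers again when the average falls well below the target.
- 2. Step Scaling: You define steps tied to CloudWatch alarm thresholds. *"If CPU > 70%, add 2 servers. If CPU > 90%, add 5 servers."*
- 3. Scheduled Scaling: Planned scaling for load you can see coming. *"Every Friday at 5:00 PM, increase minimum capacity to 20 servers to prepare for the weekend rush."* (The Maximum Capacity must also be at least 20 for this to work. Scheduled Scaling is different from AWS's separate Predictive Scaling feature, which forecasts load from historical metrics.)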
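The three policy types can be made concrete with a short Python sketch. The target tracking function is a simplified approximation that assumes load spreads evenly across instances; it is not AWS's actual algorithm. The step function hard-codes the example thresholds above. The scheduled action is written as the keyword arguments for boto3's autoscaling `put_scheduled_update_group_action`, with a hypothetical group name and maximum size.

```python
import math


def target_tracking_desired(current_instances: int, avg_cpu: float, target_cpu: float) -> int:
    """Simplified target tracking: resize so average CPU lands near the target.

    Assumes load is spread evenly, so total work (instances * avg_cpu) stays the
    same as instances are added or removed. Rounds up to stay at or below target.
    """
    return math.ceil(current_instances * avg_cpu / target_cpu)


def step_scaling_adjustment(avg_cpu: float) -> int:
    """Step scaling with this chapter's example steps: >70% adds 2, >90% adds 5."""
    if avg_cpu > 90:
        return 5
    if avg_cpu > 70:
        return 2
    return 0


# Scheduled Scaling: kwargs for boto3.client("autoscaling").put_scheduled_update_group_action.
# "0 17 * * 5" is Unix cron for 17:00 every Friday (UTC unless TimeZone is set).
weekend_rush_action = {
    "AutoScalingGroupName": "Web-ASG",  # hypothetical group name
    "ScheduledActionName": "friday-weekend-rush",
    "Recurrence": "0 17 * * 5",
    "MinSize": 20,
    "MaxSize": 30,  # hypothetical; must be >= MinSize
}

print(target_tracking_desired(current_instances=4, avg_cpu=80, target_cpu=50))  # 7
print(step_scaling_adjustment(75))  # 2
print(step_scaling_adjustment(95))  # 5
```

In the target tracking example, 4 servers at 80% CPU represent the same total work as about 6.4 servers at 50%, so the sketch rounds up to 7.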
7. Integrating ASG with Load Balancers
Auto Scaling and Load Balancers are best friends. When you configure an ASG, you attach it to the Target Group of an Application Load Balancer.
- 1. The ASG launches a brand new EC2 instance.
- 2. The ASG automatically registers the new instance with the Load Balancer's Target Group.
- 3. The Load Balancer performs a Health Check on the new instance.
- 4. Once healthy, the Load Balancer instantly begins sending user traffic to the newly created server.
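The four steps can be illustrated with a toy model of a target group. Everything here is hypothetical: `TargetGroup` is not an AWS class, and the threshold of 3 consecutive passing health checks is an assumed value (real target groups let you configure this). The point is the ordering: a newly registered instance receives no traffic until it passes enough health checks.

```python
from dataclasses import dataclass, field


@dataclass
class TargetGroup:
    """Toy model of an ALB target group (not the real AWS API)."""

    healthy_threshold: int = 3  # hypothetical: consecutive passing checks required
    passes: dict = field(default_factory=dict)  # instance ID -> consecutive passes

    def register(self, instance_id: str) -> None:
        # Step 2: the ASG registers the new instance with zero passing checks.
        self.passes[instance_id] = 0

    def record_health_check(self, instance_id: str, passed: bool) -> None:
        # Step 3: the load balancer probes the instance repeatedly.
        # Toy simplification: a single failed check resets the count.
        self.passes[instance_id] = self.passes[instance_id] + 1 if passed else 0

    def routable(self) -> list:
        # Step 4: only instances that reached the threshold receive traffic.
        return [i for i, n in self.passes.items() if n >= self.healthy_threshold]


tg = TargetGroup()
tg.register("i-existing")
for _ in range(3):
    tg.record_health_check("i-existing", passed=True)

tg.register("i-new")  # Steps 1 and 2: the ASG launches and registers a new instance
tg.record_health_check("i-new", passed=True)
print(tg.routable())  # ['i-existing']  (the new instance is not healthy yet)
tg.record_health_check("i-new", passed=True)
tg.record_health_check("i-new", passed=True)
print(tg.routable())  # ['i-existing', 'i-new']
```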
8. Mini Project: Create an Auto Scaling Architecture
Let's conceptualize building an elastic fleet.
Step-by-Step Tutorial:
- 1. Create Launch Template: Go to EC2 -> Launch Templates. Define an Amazon Linux 2023 AMI, t2.micro, and your Web Security Group. Save it as Web-Template.
- 2. Create ASG: Go to EC2 -> Auto Scaling Groups. Click Create. Select Web-Template.
- 3. Network: Select your VPC and choose subnets in multiple Availability Zones (e.g., us-east-1a and us-east-1b).
- 4. Load Balancing: Choose "Attach to an existing load balancer" and select your ALB Target Group.
- 5. Group Size: Set Min = 2, Desired = 2, Max = 6.
- 6. Scaling Policies: Select "Target tracking scaling policy". Metric type: Average CPU utilization. Target value: 60.
- 7. Create!
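For reference, the console steps map onto two boto3 calls on the `autoscaling` client: `create_auto_scaling_group` (steps 2 to 5) and `put_scaling_policy` (step 6). The sketch below only builds their keyword arguments, assuming the Web-Template launch template from step 1 already exists; the group name, subnet IDs, and target group ARN are hypothetical placeholders.

```python
def build_asg_request(
    group_name: str,
    template_name: str,
    subnet_ids: list,
    target_group_arns: list,
    min_size: int,
    desired: int,
    max_size: int,
) -> dict:
    """kwargs for boto3.client("autoscaling").create_auto_scaling_group (steps 2-5)."""
    return {
        "AutoScalingGroupName": group_name,
        # Step 2: launch from the template, always using its newest version.
        "LaunchTemplate": {"LaunchTemplateName": template_name, "Version": "$Latest"},
        # Step 3: one subnet per Availability Zone, as a comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        # Step 4: register new instances with the ALB target group.
        "TargetGroupARNs": target_group_arns,
        # Step 5: group size.
        "MinSize": min_size,
        "DesiredCapacity": desired,
        "MaxSize": max_size,
    }


def build_cpu_target_tracking_request(group_name: str, target_value: float) -> dict:
    """kwargs for boto3.client("autoscaling").put_scaling_policy (step 6)."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": f"{group_name}-cpu-{int(target_value)}",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": target_value,
        },
    }


asg = build_asg_request(
    group_name="Web-ASG",  # hypothetical
    template_name="Web-Template",
    # hypothetical subnets: one in us-east-1a, one in us-east-1b
    subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],
    # hypothetical target group ARN
    target_group_arns=[
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web-tg/0123456789abcdef"
    ],
    min_size=2,
    desired=2,
    max_size=6,
)
policy = build_cpu_target_tracking_request("Web-ASG", 60.0)

# With boto3 installed and AWS credentials configured:
# import boto3
# client = boto3.client("autoscaling")
# client.create_auto_scaling_group(**asg)
# client.put_scaling_policy(**policy)
print(asg["VPCZoneIdentifier"], policy["PolicyName"])
```

`ASGAverageCPUUtilization` is the predefined metric behind the console's "Average CPU utilization" option.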
9. Best Practices
- Bake Your AMIs (Golden Images): If your Auto Scaling Group launches a blank Linux server that then runs a script to download Node.js, clone your Git repository, and run npm install, each new server might take 5 minutes before it can serve traffic. During a traffic spike, 5 minutes is too slow. Instead, build a fully configured server, create a custom "Golden" AMI from it, and use that AMI in your Launch Template. New servers then skip the installation work and are ready much sooner.
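A golden-image workflow can be scripted the same way. This sketch assumes boto3 and a hypothetical "builder" instance you have already configured; it builds the arguments for EC2's `create_image` (to turn that instance into an AMI) and `create_launch_template_version` (to point the launch template at the new AMI). The instance ID, AMI ID, source version number, and naming scheme are all illustrative.

```python
from datetime import date


def build_golden_image_request(builder_instance_id: str, app_version: str, built_on: date) -> dict:
    """kwargs for boto3.client("ec2").create_image on a fully configured builder instance."""
    return {
        "InstanceId": builder_instance_id,
        # AMI names must be unique within an account and Region, so include version and date.
        "Name": f"web-golden-{app_version}-{built_on.isoformat()}",
        "Description": "Node.js, application code, and node_modules pre-installed",
    }


def build_template_version_request(template_name: str, source_version: str, golden_ami_id: str) -> dict:
    """kwargs for create_launch_template_version: copy an existing version, swap only the AMI."""
    return {
        "LaunchTemplateName": template_name,
        "SourceVersion": source_version,  # existing version number to copy
        "LaunchTemplateData": {"ImageId": golden_ami_id},
    }


# All IDs and the version strings below are hypothetical.
image_request = build_golden_image_request("i-0123456789abcdef0", "1.4.0", date(2026, 5, 15))
version_request = build_template_version_request("Web-Template", "1", "ami-0fedcba9876543210")
print(image_request["Name"])  # web-golden-1.4.0-2026-05-15
```

Instances launched afterwards use the new AMI only if the Auto Scaling Group references the new launch template version (for example via `$Latest`); instances already running keep the AMI they booted from.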
10. Common Mistakes
- Scaling on Stateful Applications: If your EC2 instances store user session data or uploaded photos on their local disks, Auto Scaling will destroy your application. When traffic drops, the ASG terminates instances to save money, and it chooses which ones according to its termination policy, not according to what data they hold. Any photos stored only on a terminated instance are permanently deleted! As discussed in Chapter 6, all uploaded files must go to S3, and all sessions to a database, making the EC2 instances truly Stateless and safe to terminate.
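The difference between stateful and stateless servers can be shown with a small in-memory simulation. `SharedSessionStore` and `WebServer` are hypothetical classes: the shared store stands in for an external service such as a database, and deleting a server object stands in for the ASG terminating an instance during scale-in.

```python
from typing import Optional


class SharedSessionStore:
    """Stand-in for an external session store (e.g., a database) outside every instance."""

    def __init__(self) -> None:
        self._sessions: dict = {}

    def put(self, session_id: str, data: dict) -> None:
        self._sessions[session_id] = data

    def get(self, session_id: str) -> Optional[dict]:
        return self._sessions.get(session_id)


class WebServer:
    """Toy EC2 instance. Without a shared store, sessions live only in this object."""

    def __init__(self, name: str, store: Optional[SharedSessionStore] = None) -> None:
        self.name = name
        self.store = store
        self.local_sessions: dict = {}  # disappears when the instance is terminated

    def login(self, session_id: str, user: str) -> None:
        if self.store is None:
            self.local_sessions[session_id] = {"user": user}
        else:
            self.store.put(session_id, {"user": user})

    def lookup(self, session_id: str) -> Optional[dict]:
        if self.store is None:
            return self.local_sessions.get(session_id)
        return self.store.get(session_id)


# Stateful: the user logs in on server A, the ASG terminates A, the next request hits B.
a, b = WebServer("a"), WebServer("b")
a.login("s1", "alice")
del a  # scale-in terminates instance A
print(b.lookup("s1"))  # None: the session died with instance A

# Stateless: both servers share one external store, so terminating A loses nothing.
store = SharedSessionStore()
a, b = WebServer("a", store), WebServer("b", store)
a.login("s1", "alice")
del a
print(b.lookup("s1"))  # {'user': 'alice'}
```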
11. Exercises
- 1. Define the difference between the Minimum, Maximum, and Desired capacities of an Auto Scaling Group.
- 2. Why is it highly recommended to configure an Auto Scaling Group to span multiple Availability Zones?
12. MCQs with Answers
Question 1
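An application experiences highly predictable traffic spikes every Monday morning at 9:00 AM. Which type of Auto Scaling policy is the most efficient choice to ensure sufficient EC2 instances are running before the traffic arrives?
Answer: Scheduled Scaling. Because the spike is predictable, a scheduled action can raise capacity shortly before 9:00 AM every Monday instead of waiting for CPU to climb and then reacting.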
Question 2
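When an Auto Scaling Group detects that average CPU utilization has dropped significantly and triggers a scale-in event, what action does it take?
Answer: It lowers the Desired Capacity and terminates one or more instances, but it never goes below the Minimum Capacity.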
13. Interview Questions
- Q: Explain the mechanical relationship between an Auto Scaling Group (ASG) and an Application Load Balancer (ALB). When the ASG provisions a new instance, how does that instance begin receiving public web traffic?
- Q: A developer complains that when their Auto Scaling Group scales out during a traffic spike, the new instances take 10 minutes to finish installing software dependencies, rendering them useless for sudden bursts. How would you architect a solution utilizing Custom AMIs to solve this?