
Understanding Scalability Fundamentals

Updated: May 18, 2026

# CHAPTER 2


1. Chapter Introduction

Scalability is the core theme of every System Design interview. It is the ability of a system to handle a growing amount of work by adding resources. When a startup grows from 100 users to 10 million users almost overnight, a system that does not scale will slow down or crash, losing data and revenue. This chapter defines the fundamental vocabulary of scale, compares the two primary methods of scaling (Vertical vs. Horizontal), and dissects critical performance metrics such as Throughput and Latency.

2. Vertical Scaling (Scale-Up)

Vertical scaling means adding more power (CPU, RAM, Storage) to your existing machine. *Analogy:* You have a Honda Civic, but you need to carry more weight. Vertical scaling is putting a massive V8 truck engine into the Honda Civic.

Pros of Vertical Scaling:

  • Easy to implement (often just changing the instance type of an EC2 instance in AWS, typically with a brief restart).
  • Requires zero architectural changes or complex code refactoring.
  • No network overhead or distributed data consistency issues.

Cons of Vertical Scaling:

  • Hardware Limits: There is a hard limit to how much RAM or CPU one machine can hold.
  • Single Point of Failure (SPOF): If that one massive server crashes, your entire application goes offline.
  • Cost: High-end enterprise servers become disproportionately expensive. At the top of the range, price rises much faster than the capacity you gain.

3. Horizontal Scaling (Scale-Out)

Horizontal scaling means adding more machines (nodes) to your pool of resources. *Analogy:* You need to carry more weight, so instead of modifying your Honda Civic, you buy 10 more Honda Civics and distribute the load among them.

Pros of Horizontal Scaling:

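  • Near-Unlimited Scalability: You can keep adding cheap commodity servers as load grows, until a shared component (such as the database) becomes the limit.
  • High Availability: If one server crashes, the load balancer simply redirects traffic to the remaining healthy servers. The web tier has no single point of failure, provided the load balancer itself is redundant.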
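
The failover behavior described above can be sketched as a toy round-robin load balancer. This is a minimal illustration, not how any particular product works: the class name `RoundRobinBalancer`, its methods, and the server names are all hypothetical, and health is set by hand instead of by real health checks.

```python
class RoundRobinBalancer:
    """Toy load balancer: rotates through servers, skipping unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        """Record that a server failed (a real balancer would detect this itself)."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Record that a known server has recovered."""
        if server in self.servers:
            self.healthy.add(server)

    def route(self):
        """Return the next healthy server, or raise if none are left."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
first_pass = [lb.route() for _ in range(3)]    # each server gets one request
lb.mark_down("web-2")                           # simulate a crash
after_crash = [lb.route() for _ in range(4)]   # traffic flows to the survivors
print(first_pass)   # ['web-1', 'web-2', 'web-3']
print(after_crash)  # ['web-1', 'web-3', 'web-1', 'web-3']
```

Note that the balancer object itself is a single instance here, which is why production setups usually run more than one.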

Cons of Horizontal Scaling:

  • High Complexity: You now have to manage a distributed system.
  • Requires Load Balancers, complex deployment strategies (Kubernetes), and stateless application code.
  • Data Inconsistency: If Server A updates data, Server B must see that update quickly, or users will read stale values.

*Interview Rule:* In modern distributed systems, Horizontal Scaling is the expected answer in the vast majority of cases.

4. Bottlenecks

A bottleneck is a point of congestion in a system that slows down overall performance. A system is only as fast as its slowest component.

*Common Bottlenecks:*
  • CPU Bound: Complex computations (e.g., video rendering, machine learning) max out processor limits.
  • Memory Bound: The server runs out of RAM, forcing the system to read/write to the much slower hard drive (swapping).
  • Network Bound: The bandwidth limit of the network link (the network card or the connection behind it) is reached.
  • Database Bound: The database cannot handle the volume of read/write queries (The most common bottleneck in web apps).
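
One simple way to spot which of these resources is the bottleneck is to compare how close each one is to its limit. The sketch below assumes you already have utilization figures as fractions of capacity. The function name `find_bottleneck` and the numbers in `snapshot` are made up for illustration and do not come from a real monitoring tool.

```python
def find_bottleneck(utilization):
    """Return (resource, utilization) for the most saturated resource."""
    resource = max(utilization, key=utilization.get)
    return resource, utilization[resource]


# Hypothetical snapshot: fraction of each resource's capacity currently in use.
snapshot = {"cpu": 0.55, "memory": 0.62, "network": 0.30, "database": 0.97}

resource, used = find_bottleneck(snapshot)
print(f"bottleneck: {resource} at {used:.0%}")  # bottleneck: database at 97%
```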

5. Latency vs. Throughput

These two terms are constantly confused, but understanding the difference is critical for FAANG interviews.

Latency: The time it takes for a single request to travel from the client, be processed by the server, and return a response. (Measured in milliseconds - ms). *Analogy:* How long it takes a single car to travel from New York to Boston.

Throughput: The amount of data or number of requests the system can process in a given amount of time. (Measured in Requests Per Second - RPS). *Analogy:* How many cars arrive in Boston per hour.

*Tradeoff:* You can have high throughput with high latency (A massive cargo ship carrying 10,000 containers slowly across the ocean). You can have low latency with low throughput (A fighter jet carrying one package instantly). In system design, we generally strive to Maximize Throughput while Minimizing Latency.
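
The cargo ship and fighter jet analogy can be put into numbers. This is a rough sketch with invented figures (a 240-hour crossing, a 2-hour flight), and it assumes one trip at a time with no overlap. The `Carrier` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Carrier:
    name: str
    items_per_trip: int
    trip_hours: float  # latency: how long one item takes to arrive

    @property
    def throughput_per_hour(self) -> float:
        """Items delivered per hour, assuming back-to-back single trips."""
        return self.items_per_trip / self.trip_hours


# Hypothetical numbers chosen only to illustrate the tradeoff.
ship = Carrier("cargo ship", items_per_trip=10_000, trip_hours=240.0)
jet = Carrier("fighter jet", items_per_trip=1, trip_hours=2.0)

for c in (ship, jet):
    print(f"{c.name}: latency {c.trip_hours} h, "
          f"throughput {c.throughput_per_hour:.2f} items/h")
```

Under these assumptions the ship has the higher latency and also the higher throughput, which is exactly why the two metrics must be reported separately.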

6. Performance vs. Scalability

  • Performance Problem: Your system is slow for a single user. (Fix: Optimize database queries, reduce payload size).
  • Scalability Problem: Your system is fast for 1 user, but becomes incredibly slow (or crashes) for 10,000 concurrent users. (Fix: Add more servers, implement caching).
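
The distinction above can be expressed as a small decision rule. The sketch assumes you have measured response time once with a single user and once under load, and that you have a latency target in mind. The function `diagnose` and the sample measurements are hypothetical.

```python
def diagnose(latency_single_ms, latency_loaded_ms, target_ms):
    """Classify a system from two latency measurements against a target."""
    if latency_single_ms > target_ms:
        # Slow even with one user: adding servers will not help.
        return "performance problem"
    if latency_loaded_ms > target_ms:
        # Fast alone, slow under concurrency: capacity is the issue.
        return "scalability problem"
    return "healthy"


# Hypothetical measurements, in milliseconds, against a 500 ms target.
print(diagnose(3000, 3200, target_ms=500))  # performance problem
print(diagnose(120, 9000, target_ms=500))   # scalability problem
print(diagnose(120, 300, target_ms=500))    # healthy
```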

7. Real-World Scenario: The Viral Crash

*Scenario:* A small e-commerce website is hosted on a single server (Vertical Scale). A celebrity tweets a link to the site. Traffic spikes from 10 RPS to 10,000 RPS.

*The Bottleneck:* The server's CPU hits 100% trying to process HTTP requests.

*The Crash:* The server runs out of memory, drops connections, and the site goes offline.

*The Fix:* The engineers migrate to a Horizontal Scaling architecture. They put 10 smaller web servers behind an Elastic Load Balancer. Now, each server only handles 1,000 RPS, keeping CPU usage at a safe 40%. The site stays online.
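
The arithmetic behind the fix can be checked with a short capacity calculation. The sketch uses the numbers from the scenario (10,000 RPS, 10 servers, 40% CPU) and adds one assumption that the scenario does not state: CPU usage grows linearly with request rate. The function names are hypothetical.

```python
import math


def per_server_rps(total_rps, servers):
    """Load on each server, assuming the balancer spreads traffic evenly."""
    return total_rps / servers


def servers_needed(total_rps, capacity_rps, target_cpu_pct):
    """Servers required to keep each one at or below the target CPU percentage."""
    return math.ceil(total_rps * 100 / (capacity_rps * target_cpu_pct))


total_rps = 10_000
load = per_server_rps(total_rps, servers=10)
print(load)  # 1000.0

# Assumption: CPU scales linearly, so 1,000 RPS at 40% CPU implies
# roughly 2,500 RPS per server at 100% CPU.
implied_capacity = load * 100 / 40
print(implied_capacity)  # 2500.0

print(servers_needed(total_rps, implied_capacity, target_cpu_pct=40))  # 10
print(servers_needed(25_000, implied_capacity, target_cpu_pct=40))     # 25
```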

8. Visual Explanation: Scaling Up vs Scaling Out

VERTICAL SCALING (Scale Up)
[ Server 1 (2GB RAM) ]  ---> Upgrade ---> [ SERVER 1 (64GB RAM, 32 Cores) ]
(SPOF exists. Hardware limit exists.)

HORIZONTAL SCALING (Scale Out)
[ Server 1 ] ---> Add Nodes ---> [ Server 1 ] [ Server 2 ] [ Server 3 ] [ Server 4 ]
(Requires a Load Balancer to distribute traffic to the 4 nodes.)

9. Mini Project: Identify the Bottleneck

Analyze this architecture:
  • 10 Web Servers (Stateless, running Node.js)
  • 1 Load Balancer
  • 1 Relational Database (MySQL) running on a standard EC2 instance.

*Question:* If traffic increases 100x, what fails first?

*Answer:* The MySQL database. The 10 web servers can easily scale horizontally to handle HTTP requests, but they all funnel their database queries into a single MySQL instance. The database will become CPU/Disk bound and crash. (We will learn how to fix this in Chapter 12).
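
A quick model shows why adding web servers does not help here. The sketch assumes every request triggers one database query and uses invented capacities (1,000 RPS per web server, 5,000 queries per second for the database). The function name `system_capacity_rps` is hypothetical.

```python
def system_capacity_rps(web_servers, rps_per_web_server, db_capacity_rps):
    """Overall capacity is capped by the slower of the web tier and the database."""
    web_tier_capacity = web_servers * rps_per_web_server
    return min(web_tier_capacity, db_capacity_rps)


# Hypothetical capacities; one database query per request is assumed.
RPS_PER_WEB_SERVER = 1_000
DB_CAPACITY_RPS = 5_000

for web_servers in (10, 100, 1_000):
    capacity = system_capacity_rps(web_servers, RPS_PER_WEB_SERVER, DB_CAPACITY_RPS)
    print(f"{web_servers} web servers -> {capacity} RPS")
# Every line prints 5000 RPS: the single database sets the ceiling.
```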

10. Common Mistakes

  • Assuming Horizontal Scaling is Free: While commodity hardware is cheap, the engineering cost of refactoring a monolithic application into stateless microservices to support horizontal scaling is massive.
  • Confusing Latency and Bandwidth: Latency is delay (how long one round trip takes, as measured by ping). Bandwidth is capacity (the width of the pipe, i.e., how much data can move per second).
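
The difference between latency and bandwidth shows up clearly in a simplified transfer-time estimate: total time is roughly the latency plus the payload size divided by the bandwidth. This model ignores connection setup, congestion, and protocol overhead. The function name and the link figures (100 ms latency, 1 Gbps) are assumptions for illustration.

```python
def transfer_time_s(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Simplified estimate: fixed delay plus time to push the bytes through."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


LATENCY_S = 0.1                    # hypothetical 100 ms delay
BANDWIDTH_BPS = 125_000_000        # 1 Gbps expressed in bytes per second

small = transfer_time_s(1_000, LATENCY_S, BANDWIDTH_BPS)          # 1 KB API call
large = transfer_time_s(1_000_000_000, LATENCY_S, BANDWIDTH_BPS)  # 1 GB file

print(f"1 KB: {small:.6f} s")  # dominated by latency
print(f"1 GB: {large:.1f} s")  # dominated by bandwidth
```

For the small request, a fatter pipe changes almost nothing. For the large file, a lower ping changes almost nothing.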

11. Best Practices

  • Stateless Web Tier: For horizontal scaling to work, web servers must be stateless. They should not store user session data locally (in RAM). If Server 1 dies, Server 2 must be able to seamlessly handle the user's next request. Store session data in a centralized cache like Redis.
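
A minimal sketch of the stateless pattern follows. The `SessionStore` class is an in-memory stand-in for a shared cache such as Redis, not a real Redis client, and `WebServer`, the session ID, and the cart items are all hypothetical. The point is only that no server keeps session data of its own.

```python
class SessionStore:
    """In-memory stand-in for a centralized cache shared by all web servers."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def set(self, session_id, session):
        self._data[session_id] = session


class WebServer:
    """Stateless server: all session data lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        session = self.store.get(session_id)
        cart = session.get("cart", []) + [item]
        self.store.set(session_id, {**session, "cart": cart})
        return cart


store = SessionStore()
server_1 = WebServer("web-1", store)
server_2 = WebServer("web-2", store)

server_1.add_to_cart("user-42", "book")
del server_1                                    # simulate Server 1 dying
cart = server_2.add_to_cart("user-42", "pen")   # Server 2 picks up the session
print(cart)  # ['book', 'pen']
```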

12. Exercises

  1. If your system takes 5 seconds to process a payment, is that a Latency problem or a Throughput problem?
  2. List two reasons why relying solely on Vertical Scaling is dangerous for an enterprise application.

13. MCQs

Question 1

What is Vertical Scaling (Scaling Up)?

Question 2

What is the primary disadvantage of Vertical Scaling?

Question 3

What is Horizontal Scaling (Scaling Out)?

Question 4

What critical component is required to make Horizontal Scaling work?

Question 5

How is Latency defined in system design?

Question 6

How is Throughput defined in system design?

Question 7

If a system handles 10,000 concurrent users but takes 10 seconds to load a page for each user, what is its status?

Question 8

What must be true about application servers for horizontal scaling to work effectively?

Question 9

What is a "Bottleneck" in system design?

Question 10

In a typical modern web architecture that scales from 1,000 to 1,000,000 users, which component is usually the hardest to scale horizontally and becomes the primary bottleneck?

14. Interview Questions

  • Q: "Your startup is experiencing rapid growth. Your manager suggests buying a $50,000 server with 1TB of RAM instead of rewriting the app for horizontal scaling. What are the tradeoffs?"

15. FAQs

  • Q: Do companies still use Vertical Scaling?
A: Yes! Stack Overflow is a well-known example: for many years it famously ran its massive platform on just a handful of extremely powerful on-premise, vertically scaled servers. This drastically simplified the architecture, though it required strong engineering discipline to maintain.

16. Summary

Scalability is what lets a system grow from a single machine to a global data center. Vertical scaling (adding hardware) is simple but limited and risky. Horizontal scaling (adding machines) provides far greater scale and high availability, but introduces distributed complexity. To ace interviews, you must identify system bottlenecks and continuously optimize for high throughput (volume) and low latency (responsiveness).

17. Next Chapter Recommendation

We know we need multiple servers, but how do clients actually talk to them? In Chapter 3: Client-Server Architecture, we will dissect the lifecycle of an HTTP request, explain DNS, and separate the Web Tier from the Application Tier.
