System Design Interview
CHAPTER 06 Beginner

Load Balancing and Reverse Proxies

Updated: May 18, 2026

1. Chapter Introduction

If you are scaling horizontally, you have multiple servers. But how does a user's browser know which server to talk to? Enter the Load Balancer—the traffic cop of the internet. Load balancers distribute incoming network traffic across a group of backend servers to ensure no single server bears too much demand. This chapter explains how load balancers achieve high availability, the difference between a Load Balancer and a Reverse Proxy, and the algorithms they use to route traffic.

2. What is a Load Balancer?

A Load Balancer is a hardware device or software program that sits between the client and the server cluster. Its primary goals are:
  • Distribute Traffic: Prevent any single server from becoming a bottleneck.
  • High Availability: Route traffic only to healthy servers (using Health Checks).
  • Flexibility: Allow engineers to add or remove servers from the cluster without disrupting the user experience.

*Hardware vs. Software:* In the 2000s, companies bought $100k hardware load balancers. Today, almost everyone uses software load balancers (e.g., Nginx, HAProxy) or managed cloud services (e.g., AWS Elastic Load Balancer).

3. Load Balancing Algorithms

How does the load balancer decide which server gets the next request? Interviewers will ask you to justify your routing algorithm.
  1. Round Robin: Routes requests sequentially (Server 1, then Server 2, then Server 3, repeat). *Pros:* Simple. *Cons:* If Server 1 is a slow machine and Server 2 is fast, they still receive the same number of requests.
  2. Weighted Round Robin: Assigns more traffic to more powerful servers based on a defined weight.
  3. Least Connections: Routes the request to the server with the fewest active connections. *Best for:* Long-lived connections like WebSockets or heavy computation.
  4. IP Hashing: Uses a hash of the client's IP address so that a given client is routed to the same server for as long as the server pool does not change. *Best for:* Stateful applications where a user's session data is stuck on one server.
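
The four algorithms above can be sketched in a few lines of Python. This is an illustrative sketch, not how Nginx or HAProxy implement them: the server names, the function names, and the naive "expand by weight" approach to Weighted Round Robin are all assumptions made for the example.

```python
import hashlib
import itertools

SERVERS = ["server-1", "server-2", "server-3"]  # hypothetical pool


def round_robin(servers):
    """Return an iterator that yields servers in a fixed repeating order."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Return an iterator that yields servers in proportion to integer weights.

    Naive expansion: a server with weight 2 appears twice per cycle.
    """
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active):
    """Pick the server with the fewest active connections (ties: first listed)."""
    return min(active, key=active.get)


def ip_hash(client_ip, servers):
    """Map a client IP to a server with a stable hash.

    hashlib is used because Python's built-in hash() for strings is
    randomized per process, so it would not be stable across restarts.
    """
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


if __name__ == "__main__":
    rr = round_robin(SERVERS)
    print([next(rr) for _ in range(4)])
    print(least_connections({"server-1": 12, "server-2": 3, "server-3": 7}))
    print(ip_hash("203.0.113.7", SERVERS))
```

Note that the modulo in `ip_hash` depends on the pool size, which is why the mapping only holds while the pool is unchanged: adding or removing a server reshuffles most clients.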

4. Health Checks (The Lifeline)

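A load balancer is useless if it sends traffic to a dead server. Load balancers therefore send periodic Health Checks (e.g., an HTTP GET request to /health) to every server in the pool, commonly every few seconds (5 seconds in this example).
  • If Server 2 returns a 200 OK, it remains in the pool.
  • If Server 2 times out or returns a 500 Error, the load balancer removes it from the routing table, usually after a configured number of consecutive failures, and can alert the DevOps team. Requests already in flight on the crashed server may fail, but new requests go only to healthy servers, so most users never notice the crash.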
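
The health-check loop can be sketched as follows. This is a minimal sketch under stated assumptions: the `HealthChecker` class, the injected `probe` function, and the `unhealthy_threshold` parameter are hypothetical names for illustration, and no real network call is made. A server is dropped only after several consecutive failures, and a single success resets its counter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HealthChecker:
    # probe(server) returns an HTTP status code, or raises OSError
    # (which includes TimeoutError) when the server does not answer.
    probe: Callable[[str], int]
    unhealthy_threshold: int = 3  # assumed: consecutive failures before removal
    failures: Dict[str, int] = field(default_factory=dict)

    def check(self, servers: List[str]) -> List[str]:
        """Probe every server once and return those still considered healthy."""
        healthy = []
        for server in servers:
            try:
                ok = self.probe(server) == 200
            except OSError:
                ok = False
            # A success resets the counter; a failure increments it.
            self.failures[server] = 0 if ok else self.failures.get(server, 0) + 1
            if self.failures[server] < self.unhealthy_threshold:
                healthy.append(server)
        return healthy
```

In a real deployment `check` would run on a timer (every 5 seconds in the example above) and `probe` would issue the GET request to /health.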

5. Layer 4 vs. Layer 7 Load Balancing

In senior interviews, you must understand the OSI Model layers.
  • Layer 4 (Transport Layer): Routes traffic based purely on IP addresses and TCP ports. It does not look at the content of the message. It is incredibly fast but blind.
  • Layer 7 (Application Layer): Inspects the actual HTTP payload. It can route traffic based on the URL (e.g., route domain.com/images to Server A and domain.com/api to Server B). It is slightly slower but highly intelligent.
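
The difference comes down to what information the routing decision can see. The sketch below is illustrative only: `Connection`, `layer4_route`, and `layer7_route` are hypothetical names, and the Layer 7 matcher is a plain string-prefix match rather than a full URL router.

```python
from typing import Dict, NamedTuple


class Connection(NamedTuple):
    client_ip: str
    client_port: int
    dest_port: int


def layer4_route(conn: Connection, backends_by_port: Dict[int, str]) -> str:
    """Layer 4: only addresses and ports are visible, never the HTTP content."""
    return backends_by_port[conn.dest_port]


def layer7_route(path: str, routes: Dict[str, str], default: str) -> str:
    """Layer 7: choose a backend by the longest matching URL path prefix."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        return default
    return routes[max(matches, key=len)]


if __name__ == "__main__":
    routes = {"/images": "server-a", "/api": "server-b"}
    print(layer7_route("/images/cat.png", routes, "server-default"))
    print(layer7_route("/api/v1/users", routes, "server-default"))
    print(layer4_route(Connection("198.51.100.4", 51000, 443), {443: "tls-pool"}))
```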

6. Reverse Proxy vs. Forward Proxy

A Forward Proxy sits in front of *clients* and protects them from the internet (e.g., a corporate web proxy that blocks employees from visiting certain websites). A Reverse Proxy sits in front of *servers* and protects them from the internet. *Are Load Balancers and Reverse Proxies the same?* Often, yes. Nginx acts as both. However, a reverse proxy's primary jobs are security (hiding internal server IP addresses), SSL termination (decrypting HTTPS so the app servers don't have to), and caching, while a load balancer's primary job is traffic distribution.

7. Real-World Scenario: The Sticky Session Trap

*Candidate Design:* A candidate designs an architecture where users upload large video files. They use an "IP Hashing" algorithm (Sticky Sessions) so the user's upload continues on the same server. *The Problem:* The 5,000 employees of a large office building all share the same public IP address. The load balancer hashes that IP and sends all 5,000 users to a single server (say, Server 1), overloading it while Servers 2-10 sit idle. *The Fix:* Do not use Sticky Sessions. Store session data in a centralized Redis cache and use "Least Connections" routing so any server can handle any video upload fragment and the load spreads evenly.
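
The skew is easy to reproduce in a small simulation. The sketch below assumes a SHA-256 based IP hash and a simplified Least Connections model in which every connection stays open for the whole run; the function names and the example IP address are made up for illustration.

```python
import hashlib
from collections import Counter

SERVERS = [f"server-{i}" for i in range(1, 11)]  # Servers 1-10, as in the scenario


def ip_hash(client_ip, servers):
    """Stable hash of the client IP, reduced modulo the pool size."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def simulate_ip_hash(client_ips, servers):
    """Count how many clients land on each server under IP hashing."""
    return Counter(ip_hash(ip, servers) for ip in client_ips)


def simulate_least_connections(request_count, servers):
    """Count connections per server when each request goes to the least loaded one."""
    active = {server: 0 for server in servers}
    for _ in range(request_count):
        target = min(active, key=active.get)
        active[target] += 1  # connection stays open for the whole simulation
    return Counter(active)


if __name__ == "__main__":
    office = ["203.0.113.10"] * 5000  # 5,000 employees behind one public IP
    print(simulate_ip_hash(office, SERVERS))       # one server holds all 5,000
    print(simulate_least_connections(5000, SERVERS))  # 500 per server
```

Which server receives the office traffic depends on the hash function, but with a single source IP it is always exactly one server.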

8. The Redundancy Problem (Who balances the Load Balancer?)

If all traffic goes through the Load Balancer, isn't the Load Balancer a Single Point of Failure (SPOF)? Yes. A common fix is Active-Passive Redundancy. You deploy *two* Load Balancers. The Active one handles all traffic. The Passive one monitors the Active one via a "Heartbeat" connection. If the Active LB stops sending heartbeats, the Passive LB takes over its IP address (typically a shared virtual or floating IP) and begins routing traffic, usually within seconds.
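
The passive side of this arrangement is essentially a timeout on the heartbeat. The following is a minimal sketch, assuming a hypothetical `PassiveLoadBalancer` class and a 3-second timeout; time is passed in explicitly so the logic can be followed without real clocks or networking, and the actual IP takeover is left as a comment.

```python
from dataclasses import dataclass


@dataclass
class PassiveLoadBalancer:
    heartbeat_timeout: float = 3.0  # assumed: seconds of silence before failover
    last_heartbeat: float = 0.0
    is_active: bool = False

    def on_heartbeat(self, now: float) -> None:
        """Record that the active load balancer reported in at time `now`."""
        self.last_heartbeat = now

    def tick(self, now: float) -> bool:
        """Promote this node if the heartbeat is overdue; return whether it is active."""
        if not self.is_active and now - self.last_heartbeat > self.heartbeat_timeout:
            # A real system would claim the shared IP address at this point.
            self.is_active = True
        return self.is_active


if __name__ == "__main__":
    standby = PassiveLoadBalancer()
    standby.on_heartbeat(10.0)
    print(standby.tick(12.0))  # heartbeat is recent, stays passive
    print(standby.tick(14.0))  # 4 seconds of silence, takes over
```

This sketch never fails back automatically once promoted; whether the original active node should reclaim traffic after it recovers is a separate design decision.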

9. Mini Project: Whiteboard the Flow

Draw the complete flow of an HTTP request prioritizing High Availability:
  1. User -> DNS -> IP Address.
  2. IP Address -> Active Load Balancer (Layer 7).
  3. LB reads URL path /checkout.
  4. LB uses Least Connections algorithm to select healthy Server 3.
  5. Server 3 processes and responds through the LB back to the user.

10. Common Mistakes

  • Assuming Load Balancers fix bad code: Throwing a load balancer in front of an application with memory leaks will just result in 10 servers crashing instead of 1.
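  • SSL Overhead: Forgetting to mention "SSL Termination" at the load balancer. Decrypting HTTPS costs CPU on every connection. Let the LB do it so servers on a trusted internal network can communicate via plain HTTP and certificates are managed in one place. If the internal network is not trusted, re-encrypt traffic between the LB and the servers.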

11. Best Practices

  • Auto-Scaling Groups: Load Balancers should be tied to cloud Auto-Scaling Groups. If monitoring shows that average CPU usage across the cluster is above a threshold (e.g., 80%), the Auto-Scaling Group launches new servers (e.g., 5) and registers them with the LB, which starts routing to them once they pass health checks.
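
The scale-out rule described above can be sketched as a pure function. This is an illustrative step-scaling sketch, not any cloud provider's API: `scale_out_decision`, the `max_size` cap, and the default values are assumptions chosen to match the 80% and 5-server figures in the text.

```python
from statistics import mean
from typing import Dict


def scale_out_decision(
    cpu_by_server: Dict[str, float],
    threshold: float = 80.0,  # average CPU percentage that triggers scaling
    step: int = 5,            # servers to add per scaling event
    max_size: int = 20,       # assumed upper bound on the pool size
) -> int:
    """Return how many servers to add to the pool right now."""
    if not cpu_by_server:
        return 0
    if mean(cpu_by_server.values()) <= threshold:
        return 0
    # Never grow past max_size, and never return a negative number.
    return max(0, min(step, max_size - len(cpu_by_server)))


if __name__ == "__main__":
    print(scale_out_decision({"s1": 90.0, "s2": 85.0}))  # above threshold: add 5
    print(scale_out_decision({"s1": 50.0, "s2": 60.0}))  # below threshold: add 0
```

Real auto-scaling policies usually add a cooldown period and a scale-in rule as well, so the pool does not oscillate.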

12. Exercises

  1. If you are building an API that handles quick, identical requests (like checking the weather), which routing algorithm should you use?
  2. Explain how a Health Check prevents downtime during a server crash.

13. MCQs

Question 1

What is the primary purpose of a Load Balancer in system design?

Question 2

Which routing algorithm simply distributes requests sequentially (Server 1, then Server 2, then Server 3) regardless of the server's current load?

Question 3

If you are running an application with long-lived WebSocket connections, which routing algorithm is best to prevent overloading a single server?

Question 4

What is a "Health Check" in the context of Load Balancing?

Question 5

What is the difference between Layer 4 and Layer 7 Load Balancing?

Question 6

What is the primary difference between a Forward Proxy and a Reverse Proxy?

Question 7

What is the danger of using "IP Hashing" (Sticky Sessions) to route traffic?

Question 8

How do you solve the problem of the Load Balancer itself becoming a Single Point of Failure (SPOF)?

Question 9

What is "SSL Termination" and why is it a best practice to do it at the Load Balancer?

Question 10

What does an Auto-Scaling Group do in conjunction with a Load Balancer?

14. Interview Questions

  • Q: "We have a monolithic application. We want to route API traffic to a new Node.js microservice, but keep the web traffic hitting the old monolith. How do we configure the Load Balancer to do this?" (Hint: Layer 7).

15. FAQs

  • Q: Should I put a load balancer in front of my databases?
A: You can, but it is a specialized internal load balancer (often a database proxy). It routes "Read" queries to Read-Replicas and "Write" queries to the Master (primary) database.
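
A read/write split of this kind can be sketched as follows. This is a deliberately simplified sketch: `ReadWriteRouter` is a hypothetical class, it treats any statement starting with SELECT as a read, and it ignores transactions, locking reads, and replication lag, all of which a real database proxy has to handle.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Send SELECT statements to replicas (round robin) and the rest to the primary."""

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        words = sql.split()
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        # Writes, unknown statements, and reads with no replica go to the primary.
        return self.primary


if __name__ == "__main__":
    router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
    print(router.route("SELECT * FROM users"))
    print(router.route("INSERT INTO users (name) VALUES ('a')"))
```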

16. Summary

Load Balancers are the cornerstone of horizontal scaling, distributing traffic via algorithms like Round Robin or Least Connections. They ensure High Availability through continuous Health Checks. Modern architectures utilize Layer 7 routing and SSL Termination at the edge, while ensuring the Load Balancer itself is not a single point of failure by deploying Active-Passive redundancy.

17. Next Chapter Recommendation

Our servers are balanced, but querying the database for every single request is still too slow. In Chapter 7: Caching Strategies and CDN, we will learn how to introduce memory-based caching to achieve sub-millisecond response times and survive viral traffic spikes.
