Skip to main content
System Design Interview
CHAPTER 12 Beginner

Database Scaling and Sharding

Updated: May 18, 2026
5 min read

# CHAPTER 12

Database Scaling and Sharding

1. Chapter Introduction

In almost every System Design interview, the Web Tier is trivial to scale—you just add more stateless Node.js servers. The true architectural challenge lies in the Database Tier. How do you scale a Relational Database that holds state and must guarantee ACID compliance? This chapter covers the progression of database scaling: starting with Vertical Scaling, moving to Master-Slave Replication, and finally crossing the threshold into Horizontal Partitioning and Sharding.

2. Step 1: Vertical Scaling (The Easy Button)

Before you introduce the nightmare of distributed databases, try Vertical Scaling. Upgrade the Amazon RDS instance to have 64 cores and 512GB of RAM. *Pros:* Zero code changes. Maintains perfect ACID consistency. *Cons:* Hardware has a physical limit. Extremely expensive. Creates a Single Point of Failure.

3. Step 2: Master-Slave Replication (Read Replicas)

Most web applications are heavily Read-Biased (e.g., 90% Reads, 10% Writes on Twitter). To relieve the database bottleneck, we separate the read and write traffic.

The Architecture:

  • Master Node: The primary database. It handles ALL INSERT, UPDATE, and DELETE queries.
  • Slave Nodes (Read Replicas): 1 to 5 duplicate databases. They handle ALL SELECT queries.
  • *The Sync:* The Master asynchronously replicates its data to the Slaves via an operations log (binlog).

*Interview Benefit:* This solves High Availability. If the Master dies, the system automatically promotes a Slave to become the new Master.

*The Tradeoff (Replication Lag):* Because the replication is asynchronous, if a user updates their profile picture (Write to Master) and immediately refreshes the page (Read from Slave), they might see their old picture for ~500 milliseconds until the Slave catches up.

4. Step 3: Database Federation (Functional Partitioning)

If your Master database is still overwhelmed by Write traffic, you can split the database by function. Instead of one massive E-Commerce database, you create:
  • Database 1: Handles only Users and Auth tables.
  • Database 2: Handles only Products and Inventory tables.
  • Database 3: Handles only Billing and Orders tables.
*Note: This naturally aligns with transitioning to a Microservices architecture.*

5. Step 4: Sharding (Horizontal Partitioning)

When a single table (e.g., Tweets) becomes so massive (billions of rows) that it cannot physically fit on one server, or the Write traffic is too high for a single Master, you must Shard. Sharding splits a single logical table across multiple separate database servers.

*Example:* You have 100 million Users.

  • Shard 1 (Server 1): Holds Users 1 to 33,000,000
  • Shard 2 (Server 2): Holds Users 33,000,001 to 66,000,000
  • Shard 3 (Server 3): Holds Users 66,000,001 to 100,000,000

6. Sharding Strategies (The Routing Algorithm)

How does the application know which Shard holds User 45,000,000? You must define a Sharding Key (Partition Key).
  1. 1. Range-Based Sharding: Shard by alphabetical order or ID ranges (e.g., Users A-M on Shard 1, N-Z on Shard 2). *Flaw:* Can lead to hotspots. If 80% of your users have names starting with 'A', Shard 1 crashes while Shard 2 sits empty.
  1. 2. Hash-Based Sharding: Pass the User ID through a hash function (e.g., Hash(UserID) % 3). This distributes data perfectly evenly across the 3 shards. *Flaw:* If you need to add a 4th shard later, the modulo math changes (% 4), requiring you to move millions of records to new servers.
  1. 3. Consistent Hashing: The industry-standard solution to the Hash flaw. It places servers and data on a virtual "Hash Ring," allowing you to add or remove database shards while only moving a minimal fraction of the data.

7. The Nightmares of Sharding

Do not suggest Sharding in an interview unless it is absolutely necessary. It introduces massive complexity:
  • Complex JOINs: You can no longer perform a SQL JOIN between a User (on Shard 1) and their Order (on Shard 3) at the database level. You must pull the data from both shards and join it in the Application code (slow).
  • Celebrity Hotspots: In a social network, if Justin Bieber is on Shard 2, his massive traffic will overwhelm Shard 2, requiring custom routing logic just for his account.

8. Real-World Scenario: The Sharding Decision

*Candidate:* "I will design Instagram. I will put all the photos in a MySQL database and Shard it by User
ID." *Interviewer:* "How does a user view the global 'Discover' feed containing photos from users on 50 different shards?" *Candidate:* "Oh... I would have to query all 50 shards simultaneously and sort them in memory." *Result:* Fail. *The Fix:* Use a NoSQL database designed for this scale (like Cassandra), or denormalize the SQL data so that feeds are pre-computed and stored in Redis.

9. Mini Project: Design the Replication Flow

Draw the architecture for a Read-Heavy blog (like Medium).
  1. 1. Writers publish articles -> Request hits Load Balancer -> Hits App Server -> INSERT into Master DB.
  1. 2. Master DB asynchronously replicates the article to 3 Read Replicas.
  1. 3. Readers view articles -> Request hits Load Balancer -> Hits App Server -> SELECT from Read Replica 1, 2, or 3 (Load balanced).

10. Common Mistakes

  • Sharding immediately: Do not jump straight to Sharding. The interview progression should always be: Vertical Scale -> Add Caching (Redis) -> Master-Slave Read Replicas -> Sharding (only if Write traffic demands it).

11. Best Practices

  • Denormalization: In a sharded environment, JOINs are expensive. It is often better to duplicate data across tables (Denormalization) to avoid network-level JOINs entirely.

12. Exercises

  1. 1. Explain "Replication Lag" in a Master-Slave setup and give an example of how it might confuse a user.
  1. 2. Why is Hash-Based sharding better than Range-Based sharding for preventing "Hotspots"?

13. MCQs

Question 1

In a Master-Slave database architecture, how is traffic separated?

Question 2

What is "Replication Lag"?

Question 3

How does Master-Slave replication improve High Availability?

Question 4

What is Database Federation (Functional Partitioning)?

Question 5

What is Sharding (Horizontal Partitioning)?

Question 6

What is a "Sharding Key" (or Partition Key)?

Question 7

What is a "Hotspot" in database sharding?

Question 8

Why does basic "Hash-Based Sharding" (e.g., Hash(ID) % 3) create a nightmare when you need to add a 4th server?

Question 9

What is a major disadvantage of Sharding a relational database?

Question 10

Should you suggest Sharding immediately in a system design interview?

14. Interview Questions

  • Q: "We have a Multi-Player Game Leaderboard. The Master DB handles writes (scores) and Slaves handle reads (viewing the board). A user scores a point, but the UI doesn't update immediately. Why? How would you fix it without abandoning the Slaves?"

15. FAQs

  • Q: Does NoSQL need to be manually sharded?
A: Usually, no. Databases like Cassandra and MongoDB handle sharding (partitioning) natively under the hood. Relational databases like MySQL usually require complex manual application-level routing to shard effectively.

16. Summary

Scaling the Database Tier is a sequential process. Begin with Vertical Scaling. As Read traffic grows, implement Master-Slave replication. If Write traffic becomes the bottleneck, use Database Federation. When a single table outgrows a physical server, implement Sharding. Understand that Sharding destroys the ability to easily perform SQL JOINs and requires careful selection of a Sharding Key (often using Consistent Hashing) to prevent server Hotspots.

17. Next Chapter Recommendation

Our data is scaled and available. But is it secure? In Chapter 13: Authentication and Security in System Design, we will cover JWTs, OAuth, Rate Limiting, and how to protect your architecture from malicious attacks.

Finish this Chapter

Save your progress on your learning path and prepare for coding interview challenges.

Discussion

Join the discussion

Log in or create a free account to participate.

Sort: ·