Database Scaling and Sharding
# CHAPTER 12
Database Scaling and Sharding
1. Chapter Introduction
In almost every System Design interview, the Web Tier is trivial to scale—you just add more stateless Node.js servers. The true architectural challenge lies in the Database Tier. How do you scale a Relational Database that holds state and must guarantee ACID compliance? This chapter covers the progression of database scaling: starting with Vertical Scaling, moving to Master-Slave Replication, and finally crossing the threshold into Horizontal Partitioning and Sharding.2. Step 1: Vertical Scaling (The Easy Button)
Before you introduce the nightmare of distributed databases, try Vertical Scaling. Upgrade the Amazon RDS instance to have 64 cores and 512GB of RAM. *Pros:* Zero code changes. Maintains perfect ACID consistency. *Cons:* Hardware has a physical limit. Extremely expensive. Creates a Single Point of Failure.3. Step 2: Master-Slave Replication (Read Replicas)
Most web applications are heavily Read-Biased (e.g., 90% Reads, 10% Writes on Twitter). To relieve the database bottleneck, we separate the read and write traffic.The Architecture:
-
Master Node: The primary database. It handles ALL
INSERT,UPDATE, andDELETEqueries.
-
Slave Nodes (Read Replicas): 1 to 5 duplicate databases. They handle ALL
SELECTqueries.
- *The Sync:* The Master asynchronously replicates its data to the Slaves via an operations log (binlog).
*Interview Benefit:* This solves High Availability. If the Master dies, the system automatically promotes a Slave to become the new Master.
*The Tradeoff (Replication Lag):* Because the replication is asynchronous, if a user updates their profile picture (Write to Master) and immediately refreshes the page (Read from Slave), they might see their old picture for ~500 milliseconds until the Slave catches up.
4. Step 3: Database Federation (Functional Partitioning)
If your Master database is still overwhelmed by Write traffic, you can split the database by function. Instead of one massive E-Commerce database, you create:-
Database 1: Handles only
UsersandAuthtables.
-
Database 2: Handles only
ProductsandInventorytables.
-
Database 3: Handles only
BillingandOrderstables.
5. Step 4: Sharding (Horizontal Partitioning)
When a single table (e.g.,Tweets) becomes so massive (billions of rows) that it cannot physically fit on one server, or the Write traffic is too high for a single Master, you must Shard.
Sharding splits a single logical table across multiple separate database servers.
*Example:* You have 100 million Users.
- Shard 1 (Server 1): Holds Users 1 to 33,000,000
- Shard 2 (Server 2): Holds Users 33,000,001 to 66,000,000
- Shard 3 (Server 3): Holds Users 66,000,001 to 100,000,000
6. Sharding Strategies (The Routing Algorithm)
How does the application know which Shard holds User 45,000,000? You must define a Sharding Key (Partition Key).- 1. Range-Based Sharding: Shard by alphabetical order or ID ranges (e.g., Users A-M on Shard 1, N-Z on Shard 2). *Flaw:* Can lead to hotspots. If 80% of your users have names starting with 'A', Shard 1 crashes while Shard 2 sits empty.
-
2.
Hash-Based Sharding: Pass the User ID through a hash function (e.g.,
Hash(UserID) % 3). This distributes data perfectly evenly across the 3 shards. *Flaw:* If you need to add a 4th shard later, the modulo math changes (% 4), requiring you to move millions of records to new servers.
- 3. Consistent Hashing: The industry-standard solution to the Hash flaw. It places servers and data on a virtual "Hash Ring," allowing you to add or remove database shards while only moving a minimal fraction of the data.
7. The Nightmares of Sharding
Do not suggest Sharding in an interview unless it is absolutely necessary. It introduces massive complexity:-
Complex JOINs: You can no longer perform a SQL
JOINbetween a User (on Shard 1) and their Order (on Shard 3) at the database level. You must pull the data from both shards and join it in the Application code (slow).
- Celebrity Hotspots: In a social network, if Justin Bieber is on Shard 2, his massive traffic will overwhelm Shard 2, requiring custom routing logic just for his account.
8. Real-World Scenario: The Sharding Decision
*Candidate:* "I will design Instagram. I will put all the photos in a MySQL database and Shard it byUserID."
*Interviewer:* "How does a user view the global 'Discover' feed containing photos from users on 50 different shards?"
*Candidate:* "Oh... I would have to query all 50 shards simultaneously and sort them in memory."
*Result:* Fail.
*The Fix:* Use a NoSQL database designed for this scale (like Cassandra), or denormalize the SQL data so that feeds are pre-computed and stored in Redis.
9. Mini Project: Design the Replication Flow
Draw the architecture for a Read-Heavy blog (like Medium).-
1.
Writers publish articles -> Request hits Load Balancer -> Hits App Server ->
INSERTinto Master DB.
- 2. Master DB asynchronously replicates the article to 3 Read Replicas.
-
3.
Readers view articles -> Request hits Load Balancer -> Hits App Server ->
SELECTfrom Read Replica 1, 2, or 3 (Load balanced).
10. Common Mistakes
- Sharding immediately: Do not jump straight to Sharding. The interview progression should always be: Vertical Scale -> Add Caching (Redis) -> Master-Slave Read Replicas -> Sharding (only if Write traffic demands it).
11. Best Practices
- Denormalization: In a sharded environment, JOINs are expensive. It is often better to duplicate data across tables (Denormalization) to avoid network-level JOINs entirely.
12. Exercises
- 1. Explain "Replication Lag" in a Master-Slave setup and give an example of how it might confuse a user.
- 2. Why is Hash-Based sharding better than Range-Based sharding for preventing "Hotspots"?
13. MCQs
In a Master-Slave database architecture, how is traffic separated?
What is "Replication Lag"?
How does Master-Slave replication improve High Availability?
What is Database Federation (Functional Partitioning)?
What is Sharding (Horizontal Partitioning)?
What is a "Sharding Key" (or Partition Key)?
What is a "Hotspot" in database sharding?
Why does basic "Hash-Based Sharding" (e.g., Hash(ID) % 3) create a nightmare when you need to add a 4th server?
What is a major disadvantage of Sharding a relational database?
Should you suggest Sharding immediately in a system design interview?
14. Interview Questions
- Q: "We have a Multi-Player Game Leaderboard. The Master DB handles writes (scores) and Slaves handle reads (viewing the board). A user scores a point, but the UI doesn't update immediately. Why? How would you fix it without abandoning the Slaves?"
15. FAQs
- Q: Does NoSQL need to be manually sharded?