CHAPTER 16
Beginner
Common System Design Problems
Updated: May 18, 2026
1. Chapter Introduction
You have learned the individual puzzle pieces: Load Balancers, Redis, NoSQL, Kafka, and WebSockets. In an interview, your job is to assemble these pieces into a working architecture under pressure. This chapter provides high-level architectural blueprints for the three most common System Design interview questions: The URL Shortener, The Chat Application, and The Social Media Feed.
2. Design Problem 1: The URL Shortener (e.g., bit.ly)
Requirements: Take a long URL, return a short URL. When users click the short URL, redirect them. Must be highly available.
Estimation: 100M URLs generated per month. Read-heavy (100 Reads for every 1 Write).
The Architecture:
- 1. API Gateway: Handles Rate Limiting (preventing a single user from generating 1M links a second).
- 2. Web Tier: Stateless Node.js servers.
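- 3. The Hash Generator (The Core Logic):
- How do we generate a unique 7-character string? We can use Base62 encoding (A-Z, a-z, 0-9), which gives 62^7 (about 3.5 trillion) possible 7-character strings.
- We use a central Database Sequence (auto-increment ID) or a dedicated service (e.g., Twitter Snowflake, whose 64-bit IDs need up to 11 Base62 characters rather than 7) to generate a unique numeric ID.
- We convert the ID (e.g., 10000) into a Base62 string (e.g., ClS when the digits are ordered A-Z, a-z, 0-9).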
- 4. Data Storage: A Relational DB (PostgreSQL) is fine, but a Key-Value Store (DynamoDB) is better because we only ever query by the short hash.
- 5. Caching (Crucial): 90% of clicks hit the same 10% of viral links. Place a Redis Cache in front of the DB.
- 6. Redirection: When the GET request hits the Web Server, it queries Redis (falling back to the database on a cache miss), gets the Long URL, and returns an HTTP 301 Permanent Redirect to the client.
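The hash generator and redirection steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production design: the `Shortener` class and its method names are hypothetical, a plain counter stands in for the database sequence, and two dicts stand in for the key-value store and the Redis cache.

```python
import string
from typing import Optional

# Digit order follows the chapter's listing (A-Z, a-z, 0-9); any fixed order works.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits
BASE = len(ALPHABET)  # 62


def encode_base62(num: int) -> str:
    """Convert a non-negative integer ID into a Base62 string."""
    if num < 0:
        raise ValueError("ID must be non-negative")
    if num == 0:
        return ALPHABET[0]
    chars = []
    while num > 0:
        num, rem = divmod(num, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62."""
    num = 0
    for ch in code:
        num = num * BASE + ALPHABET.index(ch)
    return num


class Shortener:
    """In-memory stand-in: `db` plays the key-value store, `cache` plays Redis."""

    def __init__(self, first_id: int = 10000) -> None:
        self.next_id = first_id            # stands in for the database sequence
        self.db: dict[str, str] = {}
        self.cache: dict[str, str] = {}

    def shorten(self, long_url: str) -> str:
        code = encode_base62(self.next_id)
        self.next_id += 1
        self.db[code] = long_url
        return code

    def redirect(self, code: str) -> tuple[int, Optional[str]]:
        """Return (HTTP status, Location header value)."""
        long_url = self.cache.get(code)
        if long_url is None:               # cache miss: fall back to the store
            long_url = self.db.get(code)
            if long_url is None:
                return 404, None
            self.cache[code] = long_url    # populate the cache for later clicks
        return 301, long_url
```

With this alphabet, `encode_base62(10000)` returns `"ClS"`. IDs below 62^6 encode to fewer than 7 characters, so a real system that wants fixed-length codes would likely start the sequence at a high offset or pad the result.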
3. Design Problem 2: The Chat Application (e.g., WhatsApp)
Requirements: 1-on-1 real-time chat, group chat, message history, online presence.
Estimation: 500M Daily Active Users. High throughput, ultra-low latency required.
The Architecture:
- 1. Connection Tier: Clients connect to WebSocket Servers via a Load Balancer configured for long-lived connections.
- 2. State Management: When User A connects to WebSocket Server 1, we save their location in a Redis Session Cache (UserA -> Server_1).
- 3. Message Routing (Pub/Sub):
- User A sends a message to User B.
- Server 1 queries Redis: "Where is User B?"
- Redis says "User B is on Server 2."
- Server 1 publishes the message to a Redis Pub/Sub channel that Server 2 is listening to.
- Server 2 pushes the message down User B's open WebSocket.
- 4. Data Storage: Chat history is massive and unstructured. Use a Wide-Column Store (Cassandra or HBase) optimized for rapid time-series ingestion.
- 5. Presence Service: A separate microservice that receives "Heartbeats" (pings) from clients every 5 seconds to update their Online/Offline status in Redis.
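The routing and presence steps above can be sketched with in-memory stand-ins. This is a minimal illustration, not a Redis client: `PubSub`, `ChatServer`, and `Presence` are hypothetical names, a shared dict stands in for the Redis session cache, Python lists stand in for open WebSockets, and the 15-second presence timeout (three missed 5-second heartbeats) is an assumption.

```python
from collections import defaultdict
from typing import Callable


class PubSub:
    """In-memory stand-in for Redis Pub/Sub: channel name -> subscriber callbacks."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[dict], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: dict) -> int:
        """Deliver to every subscriber; return how many received the message."""
        for callback in self._subscribers[channel]:
            callback(message)
        return len(self._subscribers[channel])


class ChatServer:
    """One WebSocket server. `sockets` maps user -> list standing in for an open socket."""

    def __init__(self, name: str, sessions: dict[str, str], bus: PubSub) -> None:
        self.name = name
        self.sessions = sessions            # shared session cache: user -> server name
        self.bus = bus
        self.sockets: dict[str, list[dict]] = {}
        bus.subscribe(name, self._on_message)   # each server listens on its own channel

    def connect(self, user: str) -> None:
        self.sockets[user] = []
        self.sessions[user] = self.name     # e.g. UserA -> Server_1

    def send(self, sender: str, recipient: str, text: str) -> bool:
        target = self.sessions.get(recipient)   # "Where is User B?"
        if target is None:
            return False                    # offline; a real system would store the message
        self.bus.publish(target, {"from": sender, "to": recipient, "text": text})
        return True

    def _on_message(self, message: dict) -> None:
        inbox = self.sockets.get(message["to"])
        if inbox is not None:
            inbox.append(message)           # push down the recipient's open socket


class Presence:
    """Heartbeat tracker: a user is online if seen within `timeout` seconds."""

    def __init__(self, timeout: float = 15.0) -> None:   # assumed: 3 missed heartbeats
        self.timeout = timeout
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, user: str, now: float) -> None:
        self.last_seen[user] = now

    def is_online(self, user: str, now: float) -> bool:
        seen = self.last_seen.get(user)
        return seen is not None and now - seen <= self.timeout


sessions: dict[str, str] = {}
bus = PubSub()
server_1 = ChatServer("Server_1", sessions, bus)
server_2 = ChatServer("Server_2", sessions, bus)
server_1.connect("UserA")
server_2.connect("UserB")
server_1.send("UserA", "UserB", "hello")
print(server_2.sockets["UserB"])   # [{'from': 'UserA', 'to': 'UserB', 'text': 'hello'}]
```

Server 1 never holds a reference to Server 2. It only needs the channel name it reads from the session cache, which is what lets the WebSocket tier scale horizontally.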
4. Design Problem 3: The Social Media Feed (e.g., Twitter/Instagram)
Requirements: Users can post text/images. Users can view a chronological timeline of posts from people they follow.
Estimation: 300M Users. Extremely Read-heavy.
The Architecture (The Fan-out Problem): The hardest part of designing Twitter is generating the timeline.
*Approach 1: Fan-out on Read (Pull Model)*
- When I open my app, the server queries the DB: "Find all users I follow. Fetch their last 20 posts. Sort by time."
- *Flaw:* Devastatingly slow at scale. Every timeline load runs expensive JOINs across the follow graph and the posts table.
*Approach 2: Fan-out on Write (Push Model)*
- We pre-compute the timeline.
- When User A posts a tweet, a background Kafka worker looks up all of User A's followers.
- The worker *pushes* the new tweet into the individual Redis Timeline Caches of every single follower.
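- When a follower opens their app, the server simply fetches their pre-computed timeline from Redis with a single key lookup, typically in about a millisecond, regardless of how many accounts they follow.
- *Flaw (the Celebrity Edge Case):* When an account with millions of followers posts, the worker must perform millions of cache writes. A common mitigation is a hybrid: push for ordinary accounts, and pull celebrity posts at read time.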
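The two approaches can be compared side by side in a small in-memory sketch. Everything here is illustrative: `FeedService` and its method names are hypothetical, dicts stand in for the database and the Redis timeline caches, the fan-out runs inline rather than in a Kafka worker, and a counter replaces real timestamps to keep the output deterministic.

```python
from collections import defaultdict, deque
from itertools import count

TIMELINE_SIZE = 20   # matches the "last 20 posts" figure above


class FeedService:
    def __init__(self) -> None:
        self.followers: dict[str, set[str]] = defaultdict(set)   # author -> followers
        self.following: dict[str, set[str]] = defaultdict(set)   # user -> authors
        self.posts: dict[str, list[tuple[int, str]]] = defaultdict(list)
        # One bounded timeline per user; the oldest entries fall off the far end.
        self.timelines: dict[str, deque] = defaultdict(
            lambda: deque(maxlen=TIMELINE_SIZE)
        )
        self._clock = count(1)   # logical timestamps

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def post(self, author: str, text: str) -> int:
        """Store the post, then fan out on write. Returns the number of timeline writes."""
        seq = next(self._clock)
        self.posts[author].append((seq, text))
        for follower in self.followers[author]:
            self.timelines[follower].appendleft((seq, author, text))
        return len(self.followers[author])

    def feed_pull(self, user: str) -> list[tuple[int, str, str]]:
        """Fan-out on read: gather recent posts from every followed author, then sort."""
        candidates = [
            (seq, author, text)
            for author in self.following[user]
            for seq, text in self.posts[author][-TIMELINE_SIZE:]
        ]
        return sorted(candidates, reverse=True)[:TIMELINE_SIZE]

    def feed_push(self, user: str) -> list[tuple[int, str, str]]:
        """Fan-out on write: the timeline was assembled at post time."""
        return list(self.timelines[user])
```

The read cost of `feed_pull` grows with the number of accounts followed, while `feed_push` is a single lookup. The price is paid in `post`, whose return value (timeline writes per post) grows linearly with the author's follower count.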
5. HR Perspective: The "Perfect" Answer Myth
Interviewers do not expect you to build a flawless system. They want to see you struggle with the tradeoffs. When designing the Social Media Feed, if you explicitly identify the "Celebrity Edge Case" on your own without being prompted, you immediately jump from Mid-Level to Senior in the interviewer's mind.
6. Mini Project: Design the Instagram Feed
Combine your knowledge:
- 1. Define the APIs: POST /api/v1/posts (Upload image), GET /api/v1/feed (View timeline).
- 2. Database: PostgreSQL for User Data, MongoDB for Post Metadata, AWS S3 for actual Image files.
- 3. Caching: Redis for pre-computed user feeds.
- 4. CDN: CloudFront to cache the heavy S3 images globally.
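One way the four components might meet on the read path of GET /api/v1/feed is sketched below. The function name, the `Post` fields, and the `cdn.example.com` domain are all hypothetical, and plain dicts stand in for the Redis feed cache and the post metadata store.

```python
from dataclasses import dataclass

CDN_BASE = "https://cdn.example.com"   # hypothetical CDN domain


@dataclass
class Post:
    post_id: str
    author: str
    image_key: str   # object key of the image in the storage bucket
    caption: str


def get_feed(
    user_id: str,
    feed_cache: dict[str, list[str]],
    post_store: dict[str, Post],
    limit: int = 20,
) -> list[dict]:
    """Sketch of GET /api/v1/feed: cached post IDs -> metadata -> CDN image URLs."""
    post_ids = feed_cache.get(user_id, [])[:limit]
    items = []
    for post_id in post_ids:
        post = post_store.get(post_id)
        if post is None:   # post was deleted after the feed was pre-computed
            continue
        items.append(
            {
                "post_id": post.post_id,
                "author": post.author,
                "caption": post.caption,
                # Clients fetch image bytes from the CDN, never from the API servers.
                "image_url": f"{CDN_BASE}/{post.image_key}",
            }
        )
    return items
```

The response carries only metadata and URLs, so the API tier stays lightweight while the CDN serves the heavy image traffic.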
7. Common Mistakes
- Ignoring Scale: Designing a system that works for 100 users but fails for 1 million. Always start your interview with the "Back-of-the-Envelope Estimation" step to define the scale.
- Forgetting the CDN: If the prompt involves photos or videos (Instagram, Netflix, YouTube), you MUST draw a CDN on the whiteboard.
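The estimation step can be made concrete with the URL Shortener figures from earlier in the chapter (100M new URLs per month, 100 reads per write). The 500-byte record size, the 5-year retention period, and the 30-day month are assumptions chosen for illustration, and `estimate` is a hypothetical helper name.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600   # 2,592,000, assuming a 30-day month


def estimate(
    writes_per_month: int,
    reads_per_write: int,
    bytes_per_record: int,
    years: int,
) -> dict:
    """Back-of-the-envelope throughput and storage figures."""
    writes_per_sec = writes_per_month / SECONDS_PER_MONTH
    reads_per_sec = writes_per_sec * reads_per_write
    total_records = writes_per_month * 12 * years
    storage_tb = total_records * bytes_per_record / 1e12   # decimal terabytes
    return {
        "writes_per_sec": writes_per_sec,
        "reads_per_sec": reads_per_sec,
        "total_records": total_records,
        "storage_tb": storage_tb,
    }


result = estimate(
    writes_per_month=100_000_000,
    reads_per_write=100,
    bytes_per_record=500,   # assumed average size of one stored mapping
    years=5,                # assumed retention period
)
print(result)
```

Under these assumptions the system handles roughly 39 writes and 3,858 reads per second, and stores 6 billion records (about 3 TB) over five years. Six billion is far below 62^7, so 7 Base62 characters leave ample headroom.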
8. Best Practices
- Asynchronous Workers: Any heavy lifting (like generating video thumbnails or pre-computing feeds) should be moved off the main HTTP thread and processed asynchronously using a Message Queue (SQS/Kafka).
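The pattern can be sketched with Python's standard library, where `queue.Queue` and a thread stand in for SQS/Kafka and a separate worker process. The function names and the job format are hypothetical, and the thumbnail work is a placeholder string.

```python
import queue
import threading


def handle_upload(jobs: queue.Queue, video_id: str) -> dict:
    """HTTP handler: enqueue the heavy job and respond immediately."""
    jobs.put({"type": "thumbnail", "video_id": video_id})
    return {"status": 202, "video_id": video_id}   # 202 Accepted: work is pending


def worker(jobs: queue.Queue, done: list) -> None:
    """Background consumer: processes jobs until it receives the None sentinel."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            done.append(f"thumbnail:{job['video_id']}")   # placeholder for real work
        finally:
            jobs.task_done()   # runs for real jobs and for the sentinel alike


jobs: queue.Queue = queue.Queue()
done: list = []
thread = threading.Thread(target=worker, args=(jobs, done))
thread.start()

response = handle_upload(jobs, "v42")   # returns before the thumbnail exists
jobs.put(None)                          # ask the worker to stop
jobs.join()                             # wait until every queued item is processed
thread.join()
print(response, done)
```

The handler's latency no longer depends on how long the thumbnail takes. With a real broker the same shape applies, and the queue also buffers traffic spikes and allows failed jobs to be retried.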
9. Exercises
- 1. In the URL Shortener design, why is a Key-Value store (DynamoDB) better than a Relational database (MySQL)?
- 2. In the Chat App design, explain how Redis Pub/Sub solves the problem of users being connected to different WebSocket servers.
10. MCQs
Question 1
In a URL Shortener design, what is the most efficient way to generate the 7-character short URL?
Question 2
Why is a Key-Value database (like DynamoDB) highly recommended for a URL Shortener?
Question 3
In the Chat Application design, which protocol MUST be used to connect the Client to the Chat Servers?
Question 4
In the Chat Application, how does Server 1 send a message to Server 2 if the two users are on different servers?
Question 5
What type of database is best suited for storing billions of chat history messages?
Question 6
What is the "Fan-out on Read" approach for generating a Social Media Timeline?
Question 7
What is the superior "Fan-out on Write" approach?
Question 8
What is the "Celebrity Edge Case" in the Fan-out on Write model?
Question 9
If a System Design problem requires storing and serving heavy image or video files, what component is absolutely mandatory?
Question 10
How does a URL Shortener handle 100,000 users clicking the exact same viral link in 1 second?
11. Interview Questions
- Q: "Design a global Ride-Sharing app (like Uber). How do you handle the real-time tracking of 100,000 drivers broadcasting their GPS coordinates every 3 seconds?" (Hint: Think about WebSockets, Geo-spatial databases, and heavy write throughput).
12. FAQs
- Q: Should I memorize these architectures?