CHAPTER 05
Beginner
SQL vs NoSQL Databases
Updated: May 18, 2026
5 min read
# CHAPTER 5
SQL vs NoSQL Databases
1. Chapter Introduction
One of the most critical decisions you will make in a System Design interview is choosing the data storage layer. Interviewers will explicitly ask, "Why did you choose PostgreSQL over MongoDB?" If you answer, "Because I like it," you will fail. This chapter dissects the architectural tradeoffs between Relational (SQL) and Non-Relational (NoSQL) databases, introduces the foundational CAP Theorem, and provides a framework for choosing the right tool for the job.2. The Core Differences
SQL (Relational):
- *Structure:* Data is stored in rigid tables (Rows/Columns).
- *Schema:* Strict Schema. You must define the columns before inserting data.
- *Scaling:* Scales Vertically (adding more RAM/CPU to one massive server). Very difficult to scale horizontally.
- *Examples:* MySQL, PostgreSQL, Oracle.
NoSQL (Non-Relational):
- *Structure:* Data is stored as JSON documents, Key-Value pairs, or Wide-Columns.
- *Schema:* Dynamic/Flexible Schema. You can insert unstructured data on the fly.
- *Scaling:* Scales Horizontally natively. Designed to be distributed across hundreds of cheap servers.
- *Examples:* MongoDB (Document), DynamoDB (Key-Value), Cassandra (Wide-Column).
3. The Problem with Scaling SQL
In Chapter 2, we learned that horizontal scaling is the ultimate goal. Why is horizontally scaling an SQL database hard? Because SQL relies on JOINs and strict ACID consistency. If Table A lives on Server 1, and Table B lives on Server 2, performing a JOIN across the network is incredibly slow. Furthermore, keeping the data perfectly consistent across multiple servers during a transaction requires complex locking mechanisms that destroy performance. *NoSQL abandons JOINs and strict consistency to achieve massive, distributed, horizontal scale.*4. The CAP Theorem
The CAP Theorem is a fundamental principle of distributed systems. It states that a distributed data store can only provide two of the following three guarantees simultaneously:- 1. Consistency (C): Every read receives the most recent write. (All nodes see the exact same data at the same time).
- 2. Availability (A): Every request receives a non-error response, even if some nodes fail. (The system is always up).
- 3. Partition Tolerance (P): The system continues to operate despite network failures (partitions) between nodes.
*The Reality:* Because network failures are inevitable on the internet, you MUST choose Partition Tolerance (P). Therefore, in a distributed system, you are forced to choose between Consistency (CP) or Availability (AP).
5. CP vs. AP Systems
- CP Systems (Consistency + Partition Tolerance): If a network node fails, the system returns an error rather than serving outdated data.
- AP Systems (Availability + Partition Tolerance): The system always responds, but it might return stale, outdated data.
6. Eventual Consistency
AP systems rely on a concept called Eventual Consistency. When a user updates their profile picture, it is saved to Node 1. The system immediately says "Success." In the background, Node 1 replicates the image to Node 2 and Node 3. For a few milliseconds, the database is "inconsistent." However, *eventually*, all nodes will hold the updated picture.7. Types of NoSQL Databases
You must know the four main categories:- 1. Document Stores (MongoDB): Stores data as JSON-like documents. Great for unstructured data and rapid prototyping (e.g., Content Management Systems).
- 2. Key-Value Stores (Redis, DynamoDB): Extremely fast. Stores data as a dictionary (Key -> Value). Great for caching, user sessions, and shopping carts.
- 3. Wide-Column Stores (Cassandra): Designed for massive amounts of write-heavy data. Great for time-series data, IoT logs, and messaging apps (Discord uses Cassandra).
- 4. Graph Databases (Neo4j): Designed to map relationships. Great for recommendation engines and social networks ("Friends of Friends").
8. Real-World Scenario: Designing an E-Commerce Platform
*Interview Question:* Design Amazon.com. Choose your databases. *The SQL vs NoSQL Hybrid Approach:*- Billing and Orders: Must use SQL (PostgreSQL). Strict ACID compliance is mandatory. We cannot lose money or double-charge a user.
- Product Catalog: Use NoSQL Document Store (MongoDB). Products have completely different attributes (a TV has 'screen size', a shirt has 'fabric type'). A flexible schema is perfect here.
- Shopping Cart: Use NoSQL Key-Value Store (Redis/DynamoDB). Needs to be lightning-fast, highly available, and ephemeral.
- Recommendations: Use Graph Database (Neo4j) to calculate "Users who bought X also bought Y."
9. Mini Project: The Database Decision Matrix
Create a cheat sheet for your interviews:- If Data is highly structured & requires ACID -> PostgreSQL/MySQL
- If Data is unstructured & rapid iteration is needed -> MongoDB
- If massive Write volume & high availability is needed -> Cassandra
- If sub-millisecond latency is needed -> Redis
10. Common Mistakes
- "I'll use MongoDB because it's faster." MongoDB is not inherently faster than PostgreSQL. It depends entirely on the query and the indexing. Don't use generic buzzwords.
- Trying to use JOINs in NoSQL: NoSQL databases are not designed for complex JOINs. If your data requires heavy relational queries, you chose the wrong database. Denormalize your data instead.
11. Best Practices
- Default to SQL: Unless the interview prompt explicitly describes massive unstructured data or scale that requires hundreds of nodes, start with a Relational Database. It is the safest, most robust choice.
12. Exercises
- 1. According to the CAP theorem, why is it impossible to have a system that is 100% Consistent, 100% Available, and Partition Tolerant?
- 2. If you are building a real-time leaderboard for a global video game, which NoSQL database type would you choose and why?
13. MCQs
Question 1
What is the primary structural difference between SQL and NoSQL databases?
Question 2
Why is Horizontal Scaling natively easier in NoSQL databases compared to SQL?
Question 3
What does the CAP Theorem stand for?
Question 4
According to the CAP Theorem, because network failures (Partitions) are inevitable on the internet, what choice must an architect make?
Question 5
What is "Eventual Consistency" in an AP (Available/Partition Tolerant) NoSQL system?
Question 6
If you are designing the "Likes" counter for a social media post, which CAP theorem configuration is best?
Question 7
Which NoSQL database type stores data as a dictionary and is optimized for sub-millisecond, highly available lookups (e.g., shopping carts, session data)?
Question 8
Why is a Document Store (like MongoDB) ideal for an E-commerce Product Catalog?
Question 9
Which database type is explicitly designed to handle heavy write volumes (e.g., logging millions of IoT sensor data points per minute)?
Question 10
In a FAANG system design interview, is it common to use only one type of database for the entire architecture?
14. Interview Questions
- Q: "We are building a real-time collaborative document editor (like Google Docs). Would you prioritize Consistency or Availability according to the CAP theorem? Justify your answer."
15. FAQs
- Q: Are modern SQL databases starting to support JSON?