Skip to main content
System Design
CHAPTER 04 Beginner

Databases and Data Storage

Updated: May 18, 2026
5 min read

# CHAPTER 4

Databases and Data Storage

1. Chapter Introduction

If servers are the brain of your application, databases are the memory. In a system design interview, the database architecture is often the most scrutinized component. Choosing the wrong data storage solution leads to data corruption, massive latency, and system failure under load. This chapter explores the foundational concepts of Relational Databases (RDBMS), explaining how they guarantee data integrity through ACID properties and how Indexing provides lightning-fast search capabilities.

2. What is a Relational Database?

A Relational Database Management System (RDBMS) stores data in tables consisting of rows and columns.
  • Tables: Collections of related data (e.g., Users, Orders).
  • Rows (Records): A single entity (e.g., John Doe's user profile).
  • Columns (Fields): Attributes of the entity (e.g., Email, CreatedAt).
  • Primary Key: A unique identifier for every row (e.g., UserID: 123).

*Popular RDBMS:* MySQL, PostgreSQL, Oracle, SQL Server. *The defining feature:* Data is "Relational." An Order row contains a UserID (Foreign Key) that links it directly to the Users table.

3. ACID Properties (The Guarantee)

Relational databases are famous because they guarantee data validity despite errors, power failures, or network crashes. They do this via ACID properties.
  1. 1. Atomicity: Transactions are "all or nothing." If you transfer $100 from Account A to Account B, and the server crashes mid-transfer, Atomicity ensures the $100 isn't permanently deducted from Account A without being added to Account B. It rolls back the transaction entirely.
  1. 2. Consistency: The database ensures that any transaction brings the database from one valid state to another. Constraints (like "Account Balance cannot be negative") are enforced.
  1. 3. Isolation: Concurrent transactions occur as if they are executed sequentially. If User 1 and User 2 try to buy the last concert ticket at the exact same millisecond, Isolation ensures they don't both get it.
  1. 4. Durability: Once a transaction is committed, it will remain in the system, even in the event of a total system failure (e.g., power loss). Data is written to non-volatile storage (disk).

*Interview Tip:* If an interviewer asks you to design a Financial, Banking, or E-commerce billing system, you MUST choose a database with strict ACID compliance (usually an RDBMS).

4. Database Transactions

A transaction is a sequence of database operations treated as a single logical unit of work. *Example Workflow:*
  1. 1. BEGIN TRANSACTION;
  1. 2. Update User A's balance (-$100).
  1. 3. Update User B's balance (+$100).
  1. 4. COMMIT; (If everything succeeded, write to disk permanently).
  1. 5. ROLLBACK; (If any step failed, undo all changes).

5. Indexing (The Secret to Speed)

Imagine a library with 1 million books thrown randomly on the floor. If you want to find a book by Stephen King, you have to look at every single book one by one. This is a Full Table Scan (O(N) time complexity). It is devastatingly slow.

An Index is like the card catalog in the library. It creates an organized, sorted data structure (usually a B-Tree) that holds a copy of specific columns (e.g., AuthorName) and a pointer to the original row on the disk. Searching a B-Tree Index takes O(log N) time. It is lightning fast.

6. The Tradeoffs of Indexing

If indexes make searching so fast, why don't we index every single column in the database?
  • Slower Writes: Every time you INSERT, UPDATE, or DELETE a row, the database must also update every associated index.
  • Storage Space: Indexes consume additional hard drive space.
*Rule of Thumb:* Index columns that are frequently used in WHERE, ORDER BY, or JOIN clauses. Do not index columns that are frequently updated but rarely searched.

7. HR/Interview Perspective: The Schema Design Test

In an interview, you will be asked to draw the Database Schema. If asked to design an Instagram clone, you must draw the tables and show relationships:
  • Users (id, username, email)
  • Posts (id, userid, imageurl, createdat)
  • Follows (followerid, followeeid)
Interviewers evaluate if you understand normalization (reducing data redundancy) and how to design schemas that support efficient querying.

8. Real-World Scenario: The Missing Index

*Scenario:* An application has a Users table with 50 million rows. Users log in using their email address. The login API suddenly starts timing out, taking 15 seconds to respond. *The Problem:* The engineer forgot to put an index on the Email column. Every time a user tried to log in, the database was performing a Full Table Scan of 50 million rows to find the matching email. *The Fix:* The DBA runs CREATE INDEX idx
email ON Users(Email);. The query time instantly drops from 15 seconds to 2 milliseconds.

9. Mini Project: Design a Banking Schema

On a piece of paper, design the basic schema for a bank.
  1. 1. Create a Users table.
  1. 2. Create an Accounts table (checking, savings).
  1. 3. Create a Transactions table to track the movement of money.
*Hint: How do you link a transaction to a specific account? (Use a Foreign Key).*

10. Common Mistakes

  • Using strings as Primary Keys: Storing a 50-character string (like an email) as a primary key makes indexing and joining incredibly slow. Use auto-incrementing Integers or UUIDs as primary keys.
  • Over-indexing: Creating 15 indexes on a single table. Writes will grind to a halt because the database has to update 15 B-Trees for every single row insertion.

11. Best Practices

  • Normalization (Up to 3NF): Design your tables to avoid data duplication. If you have a customer's address, it should exist in one place, not repeated across 50 of their orders.
  • Soft Deletes: Instead of permanently deleting user data (DELETE FROM Users), add an is_deleted boolean column. This prevents breaking foreign key constraints and preserves historical data.

12. Exercises

  1. 1. Explain the "A" in ACID using the analogy of buying a cup of coffee.
  1. 2. Write a SQL query to find all users in a Users table who signed up after '2023-01-01'. Which column should be indexed to make this fast?

13. MCQs

Question 1

What defines a "Relational" database?

Question 2

What does the acronym ACID stand for in database systems?

Question 3

Which ACID property ensures that a transaction is "all or nothing," preventing partial updates if a crash occurs mid-transaction?

Question 4

If you are asked to design a high-stakes financial ledger or billing system, what type of database property is mandatory?

Question 5

What happens when a database performs a "Full Table Scan"?

Question 6

How does a Database Index improve search performance?

Question 7

What is the major tradeoff (downside) of adding too many Indexes to a table?

Question 8

Why is using an auto-incrementing Integer or UUID better for a Primary Key than using an Email Address?

Question 9

What does the "Durability" property of ACID guarantee?

Question 10

What is "Database Normalization"?

14. Interview Questions

  • Q: "You have a table with 100 million rows representing IoT sensor data. Your SELECT queries are timing out. Walk me through exactly how you would debug and fix this."

15. FAQs

  • Q: Are relational databases obsolete now that NoSQL exists?
A: Absolutely not. Relational databases (PostgreSQL, MySQL) still power the vast majority of the internet. They are the default choice until your specific scale or data structure explicitly demands NoSQL.

16. Summary

Relational Databases (RDBMS) organize data into tables and rely on strict schemas. They guarantee absolute data integrity through ACID properties, making them the default choice for financial and transactional systems. To prevent system-crashing Full Table Scans, engineers utilize Indexes (B-Trees) to drastically speed up read operations, understanding the tradeoff that indexes simultaneously slow down write operations.

17. Next Chapter Recommendation

While relational databases are robust, they struggle to scale horizontally across hundreds of servers. In Chapter 5: SQL vs NoSQL Databases, we will explore the alternative: schema-less databases, the CAP Theorem, and when to abandon ACID for massive scale.

Finish this Chapter

Save your progress on your learning path and prepare for coding interview challenges.

Discussion

Join the discussion

Log in or create a free account to participate.

Sort: ·