Skip to main content
Database Design
CHAPTER 14 Beginner

Database Indexing Explained | B-Trees and Optimization

Updated: May 16, 2026
20 min read

# CHAPTER 14

Indexing and Query Performance

1. Introduction

When your database has 100 rows, every query executes instantly. But when your user table hits 10,000,000 rows, a simple query like SELECT * FROM users WHERE email = 'john@gmail.com'; might take 5 full seconds. In software engineering, a 5-second delay causes users to close the app. To solve this, Database Architects utilize Indexes. In this chapter, we will learn how to transform sluggish, crashing queries into lightning-fast operations capable of handling massive enterprise scale.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Understand the catastrophic mechanics of a "Full Table Scan".
  • Create a B-Tree Index to massively accelerate searches.
  • Create Composite Indexes for multi-column queries.
  • Understand the performance trade-offs (The Cost of Indexing).
  • Use the EXPLAIN keyword to diagnose slow queries.

3. The Problem: The Full Table Scan

Imagine a physical library with 10 million books, completely unorganized. If I ask you to find the book "Dune", you must start at book #1 and manually check every single book until you find it. This is a Full Table Scan. It is exactly how SQL searches an unindexed table by default. It is incredibly slow and maxes out the server's CPU.

4. The Solution: The B-Tree Index

A Database Index is exactly like the Index at the back of a textbook. It is an alphabetically (or numerically) sorted list that points directly to the exact physical hard drive sector the data lives on. (Behind the scenes, Relational databases use a complex mathematical structure called a B-Tree to organize this list).

If we frequently search our application by email, we must explicitly build an Index on that column!

sql
123
-- DDL Command: Create an Index on the 'email' column
CREATE INDEX idx_user_email 
ON users (email);

*(Now, when a user logs in, SQL bypasses the 10-million row table, instantly checks the B-Tree index, and finds the exact row in 2 milliseconds!)*

5. Composite Indexes (Multi-Column)

If your application frequently filters by two things simultaneously (e.g., "Find active users in New York"), a single index is not enough. You can create a Composite Index spanning multiple columns.
sql
1234567
-- Create an Index combining state and status!
CREATE INDEX idx_state_status 
ON users (state, status);

-- This query is now lightning fast!
SELECT * FROM users 
WHERE state = 'NY' AND status = 'Active';

6. The Cost of Indexing (Why not index everything?)

If Indexes make reading data instantly fast, why don't we just index every single column in the database? Because Indexes destroy WRITE speed.

Every time you execute an INSERT, UPDATE, or DELETE, the database not only has to write the row into the main table, but it must mathematically recalculate and re-sort *every single B-Tree Index* attached to that table. Furthermore, Indexes consume massive amounts of physical Hard Drive space and RAM. *The Golden Rule:* Only index columns that are frequently used in WHERE, JOIN, or ORDER BY clauses.

7. Diagnosing Slow Queries (EXPLAIN)

How do you know if your query is slow because of a missing index, or because of a poorly written JOIN? You use the EXPLAIN keyword. If you put EXPLAIN in front of any query, SQL does *not* run the query. Instead, it gives you the Execution Plan—it tells you exactly how it *plans* to run the query.
sql
1
EXPLAIN SELECT * FROM users WHERE email = 'test@test.com';

*(If the EXPLAIN output says "Type: ALL" or "Seq Scan", the database is performing a Full Table Scan! You need an index. If it says "Index Scan", your database is fully optimized!)*

8. Auto-Indexed Columns

You do not need to manually run CREATE INDEX on your id column. When you define a PRIMARY KEY or a UNIQUE constraint, the database engine automatically generates a blazing-fast index for it in the background!

9. Common Mistakes

  • Indexing Low Cardinality Columns: "Cardinality" means uniqueness. An email column has High Cardinality (every row is different). A gender column has Low Cardinality (mostly M/F). Creating a B-Tree index on a Low Cardinality column is functionally useless, because the index still has to return 50% of the entire table. The query optimizer will likely ignore the index entirely.

10. Best Practices

  • Left-to-Right Composite Rule: When using a Composite Index (e.g., (state, city)), the order of the columns matters! The index can optimize WHERE state = 'NY', or WHERE state = 'NY' AND city = 'Albany'. But it CANNOT optimize WHERE city = 'Albany' alone. A composite index only works left-to-right!

11. Exercises

  1. 1. What does the database engine perform when it searches a massive table that lacks an Index?
  1. 2. What DDL command is used to physically create an index on a specific column?

12. Database Design Challenges

You are the DBA for an E-Commerce platform. The query SELECT * FROM orders WHERE userid = 99 AND orderstatus = 'Shipped'; is crashing the server because the orders table has 50 million rows. Write the SQL command to create a Composite Index named idxuserstatus to fix this query instantly. *(Answer: CREATE INDEX idxuserstatus ON orders (userid, orderstatus);)*

13. MCQ Quiz with Answers

Question 1

Why is it considered a severe architectural anti-pattern to create an Index on every single column in a massive database table?

Question 2

When prefixing a query with the EXPLAIN keyword (e.g., EXPLAIN SELECT * FROM users), what output does the database engine provide to the architect?

14. Interview Questions

  • Q: Explain the mechanical structure of a B-Tree Index. Contrast the performance characteristics of a query executing an "Index Scan" versus a query executing a "Sequential Full Table Scan".
  • Q: A developer creates a composite index on (lastname, firstname). They then run the query SELECT * FROM users WHERE first_name = 'John';. Explain why the database engine is completely unable to utilize the index for this specific query.

15. FAQs

Q: Do I need to manually update the Index when I insert new data? A: No! This is the magic of Relational Databases. The database engine automatically recalculates and reorganizes the B-Tree Index invisibly in the background every single time a row is inserted, updated, or deleted.

16. Summary

You are now a Performance Optimization Architect. By diagnosing execution plans with EXPLAIN and strategically implementing B-Tree and Composite Indexes, you can transform an application that buckles under the weight of 10 million rows into a hyper-efficient system that processes data in milliseconds.

17. Next Chapter Recommendation

Indexes make searches fast, but what happens when the total size of your database hits 5 Terabytes and simply cannot fit on a single physical server anymore? In Chapter 15: Designing Scalable Database Architectures, we will move beyond single-server design and explore Sharding and distributed systems.

Finish this Chapter

Save your progress on your learning path and prepare for coding interview challenges.

Discussion

Join the discussion

Log in or create a free account to participate.

Sort: ·