Indexing and Query Optimization
# CHAPTER 18
Indexing and Query Optimization
1. Introduction
If you are looking for the word "PostgreSQL" in a 1,000-page textbook, you don't start at page 1 and read every single word. You go to the Index at the back of the book, find the "P", look up the page number, and instantly flip to it. Databases work exactly the same way. If an application is running slowly, 99% of the time it is because the database is missing an Index. In this chapter, we will master Query Optimization to transform queries that take 5 seconds into queries that take 5 milliseconds.2. Learning Objectives
By the end of this chapter, you will be able to:- Understand the catastrophic performance of a "Full Table Scan".
-
Create standard B-Tree Indexes using
CREATE INDEX.
- Identify the correct columns to index (Foreign Keys, search fields).
- Create Composite Indexes for multi-column queries.
-
Use
EXPLAIN ANALYZEto diagnose slow queries.
- Understand GIN indexes for JSONB data.
3. The Problem: Full Table Scans
Imagine ausers table with 5 million rows. A user tries to log in. The backend executes:
SELECT * FROM users WHERE email = 'john@example.com';
Because the database doesn't know where John is, it must start at row 1 and read every single row until it finds him. If he is row 4,999,999, this is called a Full Table Scan. It is incredibly slow, burns massive CPU power, and will cause your application to crash under heavy traffic.
4. Creating a B-Tree Index
To fix this, we instruct PostgreSQL to create an Index on theemail column.
How it works: PostgreSQL creates a highly optimized data structure called a B-Tree (Balanced Tree). Now, when you search for "john", PostgreSQL jumps straight to the "J" section of the tree, instantly grabs the exact hard-drive location of the row, and fetches it in 0.001 seconds.
5. What Columns Should You Index?
You do not index every column. Indexes consume hard drive space, and they slightly slow downINSERT commands (because the tree must be re-sorted every time new data is added).
Only Index:
-
1.
Columns frequently used in
WHEREclauses (like emails, usernames).
-
2.
Foreign Keys! (If
ordershas auser_id, index it! This makesJOINs lightning fast).
-
3.
Columns frequently used in
ORDER BY.
*(Note: PostgreSQL automatically creates an index on every PRIMARY KEY and UNIQUE column).*
6. Composite Indexes
If a query frequently searches using two columns together, you can create a Composite Index.
7. Diagnosing Queries: EXPLAIN ANALYZE
How do you know if your query is doing a Full Table Scan? You ask PostgreSQL to explain its mechanical thought process using EXPLAIN ANALYZE.
The output will show you the exact "Query Plan".
-
If it says
Seq Scan(Sequential Scan), it is doing a slow Full Table Scan.
-
If it says
Index Scan, congratulations! Your index is working perfectly.
8. Advanced: GIN Indexes (For JSONB and Text Search)
Standard B-Tree indexes are great for exact matches (like email or IDs). But what if you are using PostgreSQL's advancedJSONB data type, and you want to search for a specific key hidden deep inside a massive JSON document?
A B-Tree cannot look inside a JSON object. You must use a GIN Index (Generalized Inverted Index).
9. Common Mistakes
-
Over-Indexing: Creating an index on a boolean column like
isactive(which only has two possible values: True or False) is a massive waste. An index is only useful if it narrows down the results significantly (called "high cardinality").
-
Leading Wildcards destroy Indexes: As discussed in Chapter 8, running
WHERE email ILIKE '%john%'completely bypasses the B-Tree index and forces a Sequential Scan. An index can only help if the search is exact, or starts at the beginning of the string ('john%').
10. Best Practices
-
Drop Unused Indexes: Periodically review your indexes. If an index is no longer used by the application, drop it. It is consuming RAM, Disk Space, and slowing down
INSERToperations.
11. Exercises
-
1.
Write the SQL command to create an index on the
productnamecolumn in theproductstable.
-
2.
Why is it unnecessary to write
CREATE INDEXfor a column defined as aPRIMARY KEY?
12. SQL Challenges
You notice thatSELECT * FROM orders WHERE user_id = 45 AND status = 'Pending' is running very slowly. Write the DDL command to create the optimal Composite Index to fix this query.
13. MCQ Quiz with Answers
What is the primary negative side-effect of creating too many Indexes on a table?
What SQL command do you prepend to a SELECT query to view the underlying execution plan (e.g., to see if it is performing a Seq Scan or an Index Scan)?
14. Interview Questions
- Q: Explain the mechanical difference between a Sequential Scan (Full Table Scan) and an Index Scan in PostgreSQL.
-
Q: If you have an index on
(lastname, firstname), will that index be utilized by the database if a query only searchesWHERE firstname = 'John'? Why or why not?
15. FAQs
Q: How does a Composite Index work? A: Think of it like a phone book. A phone book is indexed by(lastname, first_name). It is incredibly fast to find all the "Smiths", and then find the "Johns" within the Smiths. However, if you just try to look up "John", the phone book is useless because it is sorted by last name first!
16. Summary
Indexing is the black magic of database engineering. By strategically applying B-Tree indexes to your Foreign Keys and heavily queried search columns, you prevent catastrophic Sequential Scans. UsingEXPLAIN ANALYZE, you can confidently verify that your application will remain blisteringly fast even as it scales to billions of rows.