Skip to main content
PostgreSQL
CHAPTER 16 Intermediate

Subqueries and Common Table Expressions (CTEs)

Updated: May 16, 2026
6 min read

# CHAPTER 16

Subqueries and Common Table Expressions (CTEs)

1. Introduction

Imagine the CEO asks: *"Give me a list of all employees who make more money than the company average."* You cannot type WHERE salary > AVG(salary) because aggregate functions are not allowed in the WHERE clause. You must first run a query to calculate the average, and then run a second query to find the employees. What if you could combine them? A Subquery is a query nested entirely inside another query. In this chapter, we will master nesting queries, and learn how to modernize them using Common Table Expressions (CTEs).

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Write basic Subqueries in the WHERE clause.
  • Understand the performance risks of Correlated Subqueries.
  • Use the EXISTS operator for lightning-fast checks.
  • Refactor messy Subqueries into clean WITH clauses (CTEs).
  • Understand how CTEs improve query readability.

3. The Basic Subquery

A subquery is wrapped in parentheses () and is executed *first* by PostgreSQL. The result is then fed into the outer query.

Let's solve the CEO's problem:

sql
123456
SELECT first_name, salary 
FROM employees 
WHERE salary > (
    -- The Subquery calculates the average first!
    SELECT AVG(salary) FROM employees
);

4. Subqueries with the IN Operator

Subqueries can return entire lists of data to be used with the IN operator.
sql
123456789
-- Find all users who have purchased a product in the 'Gaming' category
SELECT first_name, email 
FROM users 
WHERE id IN (
    SELECT user_id 
    FROM orders 
    JOIN products ON orders.product_id = products.id
    WHERE products.category = 'Gaming'
);

5. Correlated Subqueries (The Performance Killer)

In a standard subquery, the inner query runs exactly ONE time. A Correlated Subquery contains a reference to the outer query. This forces PostgreSQL to re-run the inner query *for every single row* in the outer table!
sql
12345678
-- For every single employee, calculate the average salary of THEIR specific department
SELECT first_name, salary, department 
FROM employees outer_emp
WHERE salary > (
    SELECT AVG(salary) 
    FROM employees inner_emp
    WHERE inner_emp.department = outer_emp.department -- The Correlation!
);

*(If you have 100,000 employees, this query executes the inner average calculation 100,000 times! Avoid these when possible).*

6. The EXISTS Operator

When you just need to know IF data exists (not what the data is), EXISTS is the fastest operator in PostgreSQL. It stops searching the millisecond it finds a single match.
sql
123456
-- Find users who have at least one order.
-- EXISTS is infinitely faster than IN for massive tables.
SELECT first_name FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id
);

7. Common Table Expressions (CTEs)

Subqueries get incredibly ugly and hard to read when deeply nested. Modern PostgreSQL developers use CTEs (the WITH clause). A CTE acts like a temporary, virtual table that exists only for the millisecond your query is running. It allows you to write the subquery at the very top, give it a name, and cleanly select from it later.
sql
1234567
-- Rewrite the CEO's query using a CTE!
WITH CompanyStats AS (
    SELECT AVG(salary) AS avg_sal FROM employees
)
SELECT first_name, salary 
FROM employees, CompanyStats -- We can select from the CTE!
WHERE salary > CompanyStats.avg_sal;

8. Mini Project: Complex Reporting with CTEs

A manager wants to see a list of Departments, their Total Revenue, and ONLY include departments that made over $100,000. CTEs make complex grouping highly readable.
sql
1234567891011
WITH DepartmentSales AS (
    -- Step 1: Calculate the gross sales for every department
    SELECT department_id, SUM(amount) AS total_sales
    FROM sales
    GROUP BY department_id
)
-- Step 2: Query our new virtual table and join the real names!
SELECT d.dept_name, ds.total_sales
FROM departments d
JOIN DepartmentSales ds ON d.dept_id = ds.department_id
WHERE ds.total_sales > 100000;

9. Common Mistakes

  • Returning Multiple Columns: A subquery used with an = or > operator MUST return exactly ONE column and ONE row. If your subquery returns (firstname, lastname), the outer query will crash because it cannot do math against two columns simultaneously.
  • Overusing Subqueries instead of JOINs: Many beginners use subqueries to fetch related data. In almost all cases, an INNER JOIN is significantly faster and better optimized by the PostgreSQL engine.

10. Best Practices

  • Embrace CTEs: In modern data engineering, deeply nested subqueries are considered bad practice due to poor readability. If your query requires nesting, extract it into a WITH CTE block at the top of the file.

11. Exercises

  1. 1. Write a query using a subquery to find all products that have a price greater than the absolute MAX(price) found in the 'Toys' category.
  1. 2. Explain why the EXISTS operator is faster than IN when dealing with massive datasets.

12. SQL Challenges

Convert the following nested subquery into a clean CTE using the WITH clause:
sql
1
SELECT * FROM users WHERE id IN (SELECT user_id FROM banned_accounts);

Solution:

sql
12345
WITH BannedUsers AS (
    SELECT user_id FROM banned_accounts
)
SELECT * FROM users 
WHERE id IN (SELECT user_id FROM BannedUsers);

13. MCQ Quiz with Answers

Question 1

What is the defining characteristic of a "Correlated Subquery"?

Question 2

What is the primary benefit of using a Common Table Expression (CTE) with the WITH clause instead of a traditional nested subquery?

14. Interview Questions

  • Q: Explain the performance differences between using IN versus EXISTS when filtering against a subquery that returns millions of rows.
  • Q: What is a Recursive CTE in PostgreSQL, and in what specific data structure (e.g., an organizational chart) is it absolutely necessary?

15. FAQs

Q: Can I have multiple CTEs in one query? A: Yes! You can chain them using commas: WITH CTE1 AS (...), CTE2 AS (...) SELECT * FROM CTE1 JOIN CTE2...

16. Summary

Query nesting unlocks ultimate flexibility. By utilizing Subqueries, you can compute dynamic thresholds on the fly. By understanding the performance impacts of Correlated queries and EXISTS, you write faster software. Most importantly, by adopting CTEs, you elevate your SQL from messy nested scripts into clean, professional data pipelines.

17. Next Chapter Recommendation

A CTE creates a virtual table that lasts for a millisecond. What if we want that virtual table to last forever so other developers can query it? In Chapter 17: Views and Materialized Views, we will learn how to permanently save our most complex SQL architectures into the database.

Finish this Chapter

Save your progress on your learning path and prepare for coding interview challenges.

Discussion

Join the discussion

Log in or create a free account to participate.

Sort: ·