Skip to main content
PostgreSQL
CHAPTER 11 Intermediate

GROUP BY and HAVING Clauses

Updated: May 16, 2026
6 min read

# CHAPTER 11

GROUP BY and HAVING Clauses

1. Introduction

In Chapter 10, we learned that you cannot mix a standard column (like department) with an aggregate function (like SUM(salary)). The database crashes because it doesn't know how to squash 1,000 names into a single sum. However, there is a magical workaround. What if we want the database to calculate the SUM(salary), but to calculate it *separately* for every unique department? This is where the GROUP BY clause comes in. It is the absolute backbone of all data science and business reporting.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Resolve aggregate syntax errors using the GROUP BY clause.
  • Group data by a single column to generate statistical reports.
  • Group data by multiple columns for deep analytics.
  • Understand why the WHERE clause cannot filter aggregate data.
  • Use the HAVING clause to filter GROUP BY results.

3. The GROUP BY Clause

The GROUP BY clause instructs PostgreSQL to group rows that have the exact same values into summary rows.

Let's solve the problem from the last chapter: We want a list of departments, and the total headcount of each department.

sql
123
SELECT department, COUNT(*) AS "Total Employees"
FROM employees
GROUP BY department;

How it works:

  1. 1. PostgreSQL grabs the 1,000 employees.
  1. 2. It groups them by the department string (e.g., it puts all the 'IT' rows in one bucket, all the 'HR' rows in another).
  1. 3. It runs COUNT(*) on each individual bucket.
  1. 4. *Output: 4 rows (IT: 500, HR: 50, Sales: 450).*

4. Grouping by Multiple Columns

You can group by more than one category! Suppose you want to know the average salary per department, but broken down further by job title.
sql
1234
SELECT department, job_title, AVG(salary) AS "Avg Salary"
FROM employees
GROUP BY department, job_title
ORDER BY department ASC;

5. The Limitation of the WHERE Clause

Let's say we run our department headcount query. We want to filter the results to only show departments that have more than 100 employees.

*Instinct says:* SELECT dept, COUNT(*) FROM employees WHERE COUNT(*) > 100 GROUP BY dept; THIS WILL CRASH!

Why? The WHERE clause runs *before* the GROUP BY happens. PostgreSQL cannot filter on a COUNT that hasn't been mathematically calculated yet! The WHERE clause strictly filters raw, individual rows.

6. The HAVING Clause

To solve this exact problem, SQL introduced the HAVING clause. HAVING is basically a WHERE clause that is designed specifically to filter *after* the GROUP BY math has been completed.
sql
12345
-- Find departments with massive payrolls
SELECT department, SUM(salary) AS "Total Payroll"
FROM employees
GROUP BY department
HAVING SUM(salary) > 1000000;

7. The Ultimate Query Structure

Now you must memorize the strict, absolute order of execution for a complex SQL query. If you mix these up, the query fails.
  1. 1. SELECT (What columns/math to return)
  1. 2. FROM (Which table)
  1. 3. WHERE (Filter raw rows before math)
  1. 4. GROUP BY (Sort rows into buckets)
  1. 5. HAVING (Filter the buckets after math)
  1. 6. ORDER BY (Sort the final output)
  1. 7. LIMIT (Restrict final output)

8. Mini Project: E-commerce Reporting

Generate a report showing the total revenue for every Product Category, but ONLY include categories that generated over $50,000. Finally, sort it to show the most profitable category first.
sql
123456
SELECT category, SUM(price * quantity) AS total_revenue
FROM orders
WHERE order_date >= '2023-01-01' -- Only look at this year (Filters Raw Rows)
GROUP BY category                -- Bucket by Category
HAVING SUM(price * quantity) > 50000 -- Filter out the poor performing buckets
ORDER BY total_revenue DESC;     -- Sort highest to lowest

9. Common Mistakes

  • Forgetting columns in the GROUP BY: If you SELECT columnA, columnB, SUM(price), you must have GROUP BY columnA, columnB. Any column in the SELECT that is NOT wrapped in an aggregate function MUST exist in the GROUP BY clause.
  • Using WHERE instead of HAVING: Trying to use WHERE SUM(x) > 10 is the most common beginner error. Always remember: WHERE is for rows, HAVING is for groups.

10. Best Practices

  • Filter Early with WHERE: If you want to find the total sales of 'Electronics', do not GROUP BY everything, calculate all the math, and then use HAVING category = 'Electronics'. It is a massive waste of CPU. Use WHERE category = 'Electronics' *before* the group to discard irrelevant data immediately.

11. Exercises

  1. 1. Write a query to find the total number of orders placed by each user_id.
  1. 2. Modify the previous query to only show users who have placed *more than 5* total orders.

12. SQL Challenges

Write a query to find the lowest price in each category, but only for categories where the absolute lowest price is still greater than $100.
sql
1234
SELECT category, MIN(price) AS lowest_price
FROM products
GROUP BY category
HAVING MIN(price) > 100;

13. MCQ Quiz with Answers

Question 1

What is the fundamental difference between the WHERE clause and the HAVING clause?

Question 2

If you execute: SELECT department, role, AVG(salary) FROM employees, what must the GROUP BY clause look like to prevent an error?

14. Interview Questions

  • Q: Explain the exact execution order of a SQL query containing SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY.
  • Q: If you want to calculate the total revenue of the 'Toys' category, would you filter for 'Toys' using the WHERE clause or the HAVING clause? Why (from a performance perspective)?

15. FAQs

Q: Can I group by a column I didn't SELECT? A: Yes. SELECT SUM(salary) FROM employees GROUP BY department; is valid. It will give you a list of numbers without the department names. However, it's terrible for reporting because you won't know which sum belongs to which department!

16. Summary

The GROUP BY and HAVING clauses are the final puzzle pieces of single-table data extraction. You now possess the ability to write robust, dynamic queries that can compress millions of raw data points into categorized, highly filtered business intelligence reports.

17. Next Chapter Recommendation

Up until now, we have only queried a single table at a time. But Relational Databases are built on *Relationships*. To extract data that spans across multiple tables, we must deeply understand how Tables are mathematically linked. In Chapter 12: Primary Keys, Unique Keys, and Sequences, we will harden our table structures before learning to join them.

Finish this Chapter

Save your progress on your learning path and prepare for coding interview challenges.

Discussion

Join the discussion

Log in or create a free account to participate.

Sort: ·