GROUP BY and HAVING Clauses
# CHAPTER 11
GROUP BY and HAVING Clauses
1. Introduction
In Chapter 10, we learned that you cannot mix a standard column (likedepartment) with an aggregate function (like SUM(salary)). The database crashes because it doesn't know how to squash 1,000 names into a single sum. However, there is a magical workaround. What if we want the database to calculate the SUM(salary), but to calculate it *separately* for every unique department? This is where the GROUP BY clause comes in. It is the absolute backbone of all data science and business reporting.
2. Learning Objectives
By the end of this chapter, you will be able to:-
Resolve aggregate syntax errors using the
GROUP BYclause.
- Group data by a single column to generate statistical reports.
- Group data by multiple columns for deep analytics.
-
Understand why the
WHEREclause cannot filter aggregate data.
-
Use the
HAVINGclause to filterGROUP BYresults.
3. The GROUP BY Clause
The GROUP BY clause instructs PostgreSQL to group rows that have the exact same values into summary rows.
Let's solve the problem from the last chapter: We want a list of departments, and the total headcount of each department.
How it works:
- 1. PostgreSQL grabs the 1,000 employees.
-
2.
It groups them by the
departmentstring (e.g., it puts all the 'IT' rows in one bucket, all the 'HR' rows in another).
-
3.
It runs
COUNT(*)on each individual bucket.
- 4. *Output: 4 rows (IT: 500, HR: 50, Sales: 450).*
4. Grouping by Multiple Columns
You can group by more than one category! Suppose you want to know the average salary per department, but broken down further by job title.
5. The Limitation of the WHERE Clause
Let's say we run our department headcount query. We want to filter the results to only show departments that have more than 100 employees.
*Instinct says:*
SELECT dept, COUNT(*) FROM employees WHERE COUNT(*) > 100 GROUP BY dept;
THIS WILL CRASH!
Why? The WHERE clause runs *before* the GROUP BY happens. PostgreSQL cannot filter on a COUNT that hasn't been mathematically calculated yet! The WHERE clause strictly filters raw, individual rows.
6. The HAVING Clause
To solve this exact problem, SQL introduced the HAVING clause.
HAVING is basically a WHERE clause that is designed specifically to filter *after* the GROUP BY math has been completed.
7. The Ultimate Query Structure
Now you must memorize the strict, absolute order of execution for a complex SQL query. If you mix these up, the query fails.-
1.
SELECT(What columns/math to return)
-
2.
FROM(Which table)
-
3.
WHERE(Filter raw rows before math)
-
4.
GROUP BY(Sort rows into buckets)
-
5.
HAVING(Filter the buckets after math)
-
6.
ORDER BY(Sort the final output)
-
7.
LIMIT(Restrict final output)
8. Mini Project: E-commerce Reporting
Generate a report showing the total revenue for every Product Category, but ONLY include categories that generated over $50,000. Finally, sort it to show the most profitable category first.9. Common Mistakes
-
Forgetting columns in the GROUP BY: If you
SELECT columnA, columnB, SUM(price), you must haveGROUP BY columnA, columnB. Any column in theSELECTthat is NOT wrapped in an aggregate function MUST exist in theGROUP BYclause.
-
Using WHERE instead of HAVING: Trying to use
WHERE SUM(x) > 10is the most common beginner error. Always remember:WHEREis for rows,HAVINGis for groups.
10. Best Practices
-
Filter Early with WHERE: If you want to find the total sales of 'Electronics', do not
GROUP BYeverything, calculate all the math, and then useHAVING category = 'Electronics'. It is a massive waste of CPU. UseWHERE category = 'Electronics'*before* the group to discard irrelevant data immediately.
11. Exercises
-
1.
Write a query to find the total number of orders placed by each
user_id.
- 2. Modify the previous query to only show users who have placed *more than 5* total orders.
12. SQL Challenges
Write a query to find the lowest price in eachcategory, but only for categories where the absolute lowest price is still greater than $100.
13. MCQ Quiz with Answers
What is the fundamental difference between the WHERE clause and the HAVING clause?
If you execute: SELECT department, role, AVG(salary) FROM employees, what must the GROUP BY clause look like to prevent an error?
14. Interview Questions
-
Q: Explain the exact execution order of a SQL query containing
SELECT,FROM,WHERE,GROUP BY,HAVING, andORDER BY.
-
Q: If you want to calculate the total revenue of the 'Toys' category, would you filter for 'Toys' using the
WHEREclause or theHAVINGclause? Why (from a performance perspective)?
15. FAQs
Q: Can I group by a column I didn't SELECT? A: Yes.SELECT SUM(salary) FROM employees GROUP BY department; is valid. It will give you a list of numbers without the department names. However, it's terrible for reporting because you won't know which sum belongs to which department!
16. Summary
TheGROUP BY and HAVING clauses are the final puzzle pieces of single-table data extraction. You now possess the ability to write robust, dynamic queries that can compress millions of raw data points into categorized, highly filtered business intelligence reports.