GROUP BY and HAVING Clauses
# CHAPTER 11
GROUP BY and HAVING Clauses
1. Introduction
In the last chapter, we learned that mixing a standard column (department) with an aggregate function (SUM(salary)) causes MySQL to crash. But what if your boss specifically asks: *"I want a list of every department, and the total salary cost next to each one."* To achieve this, you must instruct MySQL to categorize the data *before* running the math. In this chapter, we introduce the GROUP BY clause—the undisputed champion of data reporting and analytics.
2. Learning Objectives
By the end of this chapter, you will be able to:-
Understand how
GROUP BYsplits data into categories.
- Safely combine standard columns with aggregate functions.
-
Differentiate between the
WHEREclause and theHAVINGclause.
- Write complex, multi-layered reporting queries.
3. The GROUP BY Clause
The GROUP BY clause tells MySQL: *"Before you do the SUM() math, group all the identical rows together into buckets. Then, run the math separately on each bucket!"*
*Why this works:* Because we specified GROUP BY department, MySQL knows it is safe to print the department column alongside the aggregate function, because the math is now directly tied to that specific category!
4. Grouping by Multiple Columns
You can create subgroups. Imagine you want to know how many employees exist in each department, broken down by whether they are a manager or not.
5. Filtering Groups: The HAVING Clause
Now, the boss says: *"Give me the payroll per department, but ONLY show me departments whose payroll is greater than $300,000."*
Your instinct is to use the WHERE clause:
WHERE SUM(salary) > 300000
THIS WILL CRASH.
The WHERE clause runs *before* the math happens. It filters raw, individual rows. It cannot filter aggregate totals because the totals haven't been calculated yet!
To filter aggregated groups, you MUST use the HAVING clause.
6. Mixing WHERE and HAVING
You can use both in the same query! They just execute at different times.
-
WHEREfilters the raw rows *before* grouping.
-
HAVINGfilters the math totals *after* grouping.
7. Mini Project: E-commerce Reporting
You need to generate a report showing the best-selling product categories, but only for categories that have sold more than 50 total items.*(Notice the order: SELECT -> FROM -> WHERE -> GROUP BY -> HAVING -> ORDER BY).*
8. Common Mistakes
-
Selecting Ungrouped Columns: If you write
SELECT department, firstname, SUM(salary) GROUP BY department;, it will fail. You grouped bydepartment, but asked forfirstname. MySQL has 50 different first names in the IT bucket; it doesn't know which one to print. Rule: Every column in yourSELECTmust either be inside an Aggregate Function, or listed in theGROUP BYclause.
9. Best Practices
-
Performance:
GROUP BYrequires MySQL to create temporary tables in memory to sort and calculate the buckets. On massive datasets, this can be slow. Always try to filter out as much data as possible using aWHEREclause *before* theGROUP BYexecutes!
10. Exercises
-
1.
Write a query to find the average (
AVG) salary for eachdepartment.
-
2.
Why does the query
SELECT department, COUNT(*) FROM employees WHERE COUNT(*) > 10 GROUP BY department;result in a syntax error?
11. MCQ Quiz with Answers
What is the explicit purpose of the HAVING clause?
If you want to list the total number of users registered in each country, what clause MUST you include to prevent a syntax error?
12. Interview Questions
-
Q: Explain the strict difference in execution order and purpose between the
WHEREclause and theHAVINGclause.
-
Q: What is the SQL rule regarding selecting non-aggregated columns alongside an aggregate function? How does
GROUP BYresolve this?
13. FAQs
Q: Can I use Aliases in theHAVING clause?
A: Yes! In modern MySQL, you can write HAVING total_payroll > 300000 instead of repeating HAVING SUM(salary) > 300000. However, you CANNOT use aliases in the WHERE clause, because WHERE executes before the alias is created!
14. Summary
TheGROUP BY and HAVING clauses transform raw databases into sophisticated analytical engines. By bucketing rows into distinct categories and filtering the resulting mathematics, you can generate the exact high-level reports required by business intelligence tools and dashboards.