SQL GROUP BY and HAVING | Data Bucketing and Aggregation
# CHAPTER 13
GROUP BY and HAVING Clauses
1. Introduction
In Chapter 12, we learned that mixing a raw column (likedepartment) with an aggregate function (like SUM(salary)) causes a fatal crash. But what if the HR Director specifically asks you for "The total salary expense *per department*"? To solve this, you must tell the database to take all 5,000 employees, bucket them into their specific departments, and *then* run the math on each bucket individually. This architectural masterpiece is achieved using the GROUP BY clause.
2. Learning Objectives
By the end of this chapter, you will be able to:- Successfully mix raw columns and aggregate functions.
-
Understand how
GROUP BYbuckets data.
- Perform grouped mathematical calculations.
-
Filter aggregated results using the
HAVINGclause.
-
Differentiate between the functionality of
WHEREandHAVING.
3. The GROUP BY Clause
The GROUP BY clause is placed immediately after the WHERE clause. It commands SQL to gather all rows that share identical values in a specific column and compress them into a single summary row.
*(Result: Instead of a crash, SQL creates 3 buckets (HR, IT, Sales), adds the salaries inside each bucket, and returns exactly 3 perfect rows!)*
4. Grouping by Multiple Columns
You can group by more than one dimension. If you want to see the total sales per region, but also broken down by year, you simply list both columns.*(Result: It generates unique buckets for "North-2022", "North-2023", "South-2022", etc.)*
5. Filtering Aggregated Data (The HAVING Clause)
The HR Director looks at your report and says: *"Great, but I ONLY want to see departments where the Total Payroll is greater than $1,000,000."*
You might try to write this:
SELECT department, SUM(salary) FROM employees WHERE SUM(salary) > 1000000 GROUP BY department;
CRASH!
The WHERE clause filters individual rows *before* the math happens. You cannot use WHERE on an aggregate function!
To filter data *after* the math has been calculated, you must use the HAVING clause.
6. The Complete Order of Execution
Understanding the strict order in which SQL executes your query is what separates a junior from a senior developer.-
1.
FROM: Gather the tables.
-
2.
WHERE: Filter the raw, individual rows.
-
3.
GROUP BY: Bucket the remaining rows.
-
4.
HAVING: Filter the aggregated buckets.
-
5.
SELECT: Format the output columns.
-
6.
ORDER BY: Sort the final result.
-
7.
LIMIT: Cut off the output grid.
7. Mini Project: The Marketing Report
Let's build a complex report for the Marketing team. They want to know the Total Number of Users registered per Country, but only for active users, only showing countries with more than 500 users, sorted highest to lowest.8. Common Mistakes
-
The "Select List" Rule Violation: The most common error in all of SQL! If you put a raw column in your
SELECTstatement (e.g.,SELECT state, city, COUNT(*)), you MUST include *all* raw columns in yourGROUP BY.
GROUP BY state; (SQL doesn't know what to do with the city!)
*Correct:* GROUP BY state, city;
9. Best Practices
-
Alias the Math, but group by the Raw Formula: In some database engines (like strict versions of PostgreSQL), you cannot use your
SELECTaliases inside theHAVINGclause. It is safer to rewrite the math.
HAVING SUM(revenue) > 1000 rather than HAVING 'Total Sales' > 1000.
10. Exercises
-
1.
What clause is strictly required if you want to mix a raw dimension column with an aggregate function in your
SELECTstatement?
-
2.
What is the explicit architectural difference between the
WHEREclause and theHAVINGclause?
11. SQL Challenges
Write a query to group theproducts table by category. Calculate the Average (AVG) price for each category. Only return categories where the Average Price is strictly greater than 50.
12. MCQ Quiz with Answers
You are tasked with finding the total number of employees in each department. Which SQL syntax is structurally correct?
Why will the query SELECT city, SUM(sales) FROM stores WHERE SUM(sales) > 10000 GROUP BY city; fail to execute?
13. Interview Questions
-
Q: Detail the chronological Execution Order of a complex SQL query containing
SELECT,FROM,WHERE,GROUP BY,HAVING, andORDER BY. Explain why the database engine processes them in that specific order.
- Q: A junior developer writes a query selecting 4 different categorical columns and 1 aggregate function, but only groups by 1 column. Explain the "Select List" rule violation and why the database engine physically cannot process the request.
14. FAQs
Q: Can I useGROUP BY without an aggregate function?
A: Yes! SELECT city FROM users GROUP BY city; works perfectly. It will actually return a unique list of cities, acting exactly like the DISTINCT keyword! (Though using DISTINCT is cleaner for humans to read).
15. Summary
You have mastered the most complex logic in single-table analytics. By understanding the strict execution order of SQL, and wielding theGROUP BY and HAVING clauses, you can compress millions of raw data points into mathematically grouped, beautifully filtered business intelligence reports.