Aggregate Functions in SQL
# CHAPTER 12
Aggregate Functions in SQL
1. Introduction
Thus far, everySELECT query we have written operates on an individual, row-by-row basis. If you ask for prices, SQL hands you a massive list of prices. But what if you don't want the raw data? What if you want the *answers*? You want the Total Revenue, the Average Age, or the Number of Users online. To process massive datasets into a single, summarized mathematical answer, we use Aggregate Functions. In this chapter, we step into the world of Data Analytics.
2. Learning Objectives
By the end of this chapter, you will be able to:- Understand the concept of data aggregation.
-
Count rows and occurrences using
COUNT().
-
Calculate totals using
SUM().
-
Find mathematical averages using
AVG().
-
Extract the smallest and largest values using
MIN()andMAX().
3. The COUNT() Function
The most frequently used aggregate function. It simply counts the number of rows that match your query.
If you use COUNT(*), it counts every single row in the table, regardless of what data is inside.
*(Result: Instead of returning 5,000 rows of Texas users, it returns a single row with the number 5000!)*
4. The SUM() Function
If you are running an E-commerce store, COUNT() tells you *how many* items were sold. But SUM() tells you *how much money* you made. SUM() adds up all the numeric values in a specific column.
5. The AVG() Function
The AVG() function adds up all the values in a column and divides by the total number of rows, providing the mathematical mean.
*(Tip: AVG() often returns crazy decimals like 55400.333333. You can wrap it in a ROUND() function to clean it up! e.g., ROUND(AVG(salary), 2)).*
6. The MIN() and MAX() Functions
These functions scan an entire column and extract the absolute lowest (MIN) or absolute highest (MAX) value. They work on Numbers, Dates, and even Strings (Alphabetically).
7. Combining Aggregate Functions (The Dashboard Query)
You can place multiple aggregate functions inside a singleSELECT statement to generate a complete business intelligence report in milliseconds.
8. The Danger of Mixing Aggregates with Raw Columns
There is a massive, critical rule in SQL: You cannot mix Aggregate Functions with raw columns unless you are Grouping them.*Crash Example:*
SELECT firstname, COUNT(*) FROM users;
Think about it logically. COUNT(*) returns a single number (e.g., 5000). But firstname returns 5,000 separate names. SQL cannot squish 5,000 names into a single row. The database will throw a fatal syntax error.
9. Mini Project: Sales Analytics
Let's analyze theinventory table.
10. Common Mistakes
-
COUNT(column) vs COUNT(*):
COUNT(*)counts rows.COUNT(email)counts *how many rows have an email address*. If 10 users exist, but 2 haveNULLemails,COUNT(*)returns 10, butCOUNT(email)returns 8! Aggregate functions (exceptCOUNT(*)) completely ignoreNULLvalues.
11. Best Practices
-
Alias Everything: Aggregate function outputs have extremely ugly column names by default (e.g.,
SUM(price*qty)). Always useAS 'Clean Name'to format the output for the backend developers who have to read your JSON data.
12. Exercises
- 1. What function is used to calculate the total mathematical sum of a numerical column?
-
2.
Why will the query
SELECT username, MAX(score) FROM players;fail to execute in standard SQL?
13. SQL Challenges
Write a single query to analyze theproperties table for a real estate firm. Calculate the Total number of active listings (COUNT), the Lowest price (MIN), and the Highest price (MAX). Ensure the property status is 'Active'. Use Aliases for all outputs.
14. MCQ Quiz with Answers
What is the primary functional difference between COUNT(*) and COUNT(phonenumber) in an SQL query?
Why is the SQL statement SELECT firstname, SUM(salary) FROM employees; architecturally invalid?
15. Interview Questions
-
Q: An executive asks for the Average Sales Value. You write
SELECT AVG(saleamount) FROM sales;. However, 100 sales records were entered incorrectly withNULLsale amounts. Explain how theAVG()function handles theseNULLvalues mathematically. Does it treat them as zeros, or ignore them entirely?
-
Q: Explain the fatal logic error in attempting to mix raw dimensional columns with aggregate functions in the
SELECTstatement.
16. FAQs
Q: Are aggregate functions slow on massive tables? A: Yes, if the table has millions of rows, runningSUM() requires the database engine to scan every single row and do math. To optimize this, massive enterprise applications often calculate the sum once a night and store it in a separate "reporting" table.
17. Summary
You are now an analyst. By wielding the Big Five aggregate functions—COUNT, SUM, AVG, MIN, and MAX—you can condense millions of rows of chaotic raw data into singular, actionable points of business intelligence.