MongoDB
CHAPTER 15 Beginner

Aggregation Framework Basics

Updated: May 16, 2026
15 min read

1. Introduction

The find() method is great for retrieving documents, but it cannot summarize them. If you want to calculate the total sales revenue for a specific month, find() alone will not do it, because it returns matching documents rather than totals. In SQL, you would use GROUP BY and SUM(). In MongoDB, you use the Aggregation Framework, the database's built-in engine for analytical queries. In this chapter, we will learn how to build data "Pipelines" that filter, group, and summarize documents directly on the database server.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Understand the concept of an Aggregation Pipeline.
  • Use the aggregate() method.
  • Filter data flowing into the pipeline using the $match stage.
  • Crunch data (Sum, Average, Count) using the $group stage.
  • Shape the final output using the $project stage.

3. What is an Aggregation Pipeline?

Think of the Aggregation Framework as an assembly line in a factory. You take a bucket of 1,000 raw documents and push them onto a conveyor belt (the Pipeline). The documents pass through different "Stages":
  1. Stage 1: Filter out the bad ones. (Now we have 500 documents.)
  2. Stage 2: Group them by category and calculate the total price. (Now we have 5 summary documents.)
  3. Stage 3: Sort them from highest to lowest.
  4. Output: The final analytics report!
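To make the assembly-line picture concrete, here is a minimal plain-JavaScript sketch (runnable in Node.js, no database needed) that models a pipeline as an array of stage functions, each receiving the previous stage's output. The `runPipeline` helper and the sample documents are hypothetical illustrations of the idea, not how MongoDB implements aggregation internally.

```javascript
// Hypothetical helper: feeds the output of each stage into the next one,
// the same left-to-right flow an aggregation pipeline follows.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical sample data standing in for a collection.
const products = [
  { name: "Laptop", category: "Electronics", price: 1200, inStock: true },
  { name: "Phone", category: "Electronics", price: 800, inStock: true },
  { name: "Desk", category: "Furniture", price: 300, inStock: false },
  { name: "Chair", category: "Furniture", price: 150, inStock: true },
];

const report = runPipeline(products, [
  // Stage 1: filter out the "bad" documents (here: items not in stock)
  (docs) => docs.filter((d) => d.inStock),
  // Stage 2: group by category and total the price
  (docs) => {
    const totals = new Map();
    for (const d of docs) {
      totals.set(d.category, (totals.get(d.category) ?? 0) + d.price);
    }
    return [...totals].map(([category, total]) => ({ _id: category, total }));
  },
  // Stage 3: sort from highest to lowest total
  (docs) => [...docs].sort((a, b) => b.total - a.total),
]);

// report: [{ _id: "Electronics", total: 2000 }, { _id: "Furniture", total: 150 }]
```

Each stage only sees what the previous stage produced, which is why shrinking the data early (Stage 1) makes every later stage cheaper.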

4. The $match Stage (The Filter)

The pipeline is executed with the aggregate() method, which accepts an array [] of stages. The $match stage filters documents using the same query syntax as the filter you pass to find(). It is usually the first stage, so that irrelevant documents are discarded before expensive stages such as $group do any math.
```javascript
// Pipeline Stage 1: Only allow "Completed" orders onto the conveyor belt
db.orders.aggregate([
    { $match: { status: "Completed" } }
])
```
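For a simple equality filter like `{ status: "Completed" }`, `$match` behaves much like `Array.prototype.filter` over the incoming documents. The sketch below is a hypothetical plain-JavaScript approximation (the `matchStage` helper and sample orders are made up) that only handles top-level equality conditions; the real `$match` supports the full query language (`$gt`, `$in`, nested fields, and so on).

```javascript
// Hypothetical approximation of $match for top-level equality conditions only.
// A document passes when every key in the filter equals the document's value.
function matchStage(docs, filter) {
  return docs.filter((doc) =>
    Object.entries(filter).every(([field, value]) => doc[field] === value)
  );
}

// Hypothetical sample orders
const orders = [
  { _id: 1, status: "Completed", price: 40 },
  { _id: 2, status: "Pending", price: 25 },
  { _id: 3, status: "Completed", price: 60 },
];

const completed = matchStage(orders, { status: "Completed" });
console.log(completed.map((o) => o._id)); // [ 1, 3 ]
```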

5. The $group Stage (The Calculator)

This is where the magic happens. $group acts like SQL's GROUP BY. It collects the incoming documents into groups based on the mandatory _id key (the field or expression to group by) and then applies accumulator operators such as $sum, $avg, $max, and $min to each group.
```javascript
// Calculate Total Revenue grouped by User
db.orders.aggregate([
    // Stage 1: Group by the user_id (You must prefix the field with a '$' inside strings!)
    { $group: { 
        _id: "$user_id",               // Group by User
        total_spent: { $sum: "$price" } // Calculate the SUM of the price field
    }}
])
```

*(Notice the syntax: When referencing the value of an existing field inside an aggregation expression, you must prefix its name with a dollar sign: "$price".)*
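Conceptually, `$group` with `$sum` builds one bucket per distinct group key and adds each document's field value into its bucket. The plain-JavaScript sketch below mimics the "total revenue per user" pipeline on a hypothetical in-memory array; the `groupSum` helper is illustrative only. It follows the documented `$sum` rule that non-numeric values are ignored.

```javascript
// Hypothetical sketch of:
// { $group: { _id: "$<keyField>", <outField>: { $sum: "$<sumField>" } } }
function groupSum(docs, keyField, sumField, outField) {
  const buckets = new Map(); // one entry per distinct group key
  for (const doc of docs) {
    const key = doc[keyField];
    // $sum ignores non-numeric (and missing) values, so they add nothing
    const value = typeof doc[sumField] === "number" ? doc[sumField] : 0;
    buckets.set(key, (buckets.get(key) ?? 0) + value);
  }
  return [...buckets].map(([key, total]) => ({ _id: key, [outField]: total }));
}

// Hypothetical sample orders
const orders = [
  { user_id: "u1", price: 20 },
  { user_id: "u2", price: 15 },
  { user_id: "u1", price: 5 },
];

const totals = groupSum(orders, "user_id", "price", "total_spent");
// totals: [{ _id: "u1", total_spent: 25 }, { _id: "u2", total_spent: 15 }]
```

The sketch happens to return groups in first-seen order, but MongoDB's `$group` does not guarantee any output order; add a `$sort` stage when order matters.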

6. Chaining the Pipeline ($match + $group)

Let's combine them! We want to calculate the total sales revenue generated in the year 2023, grouped by Product Category.
```javascript
db.orders.aggregate([
    // Stage 1: Filter. Only allow 2023 orders onto the belt.
    { $match: { year: 2023 } },

    // Stage 2: Math. Group the 2023 orders by category and sum their totals.
    { $group: {
        _id: "$category",
        annual_revenue: { $sum: "$total" },
        order_count: { $sum: 1 } // Adds 1 per document, which counts the orders in each category
    }}
])
```
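The `{ $sum: 1 }` trick works because the accumulator adds the constant `1` once per document in the group, which yields a document count. The hypothetical sketch below reproduces a 2023 category report on a small made-up array, writing the `$match` and `$group` steps as plain JavaScript so you can see both accumulators side by side.

```javascript
// Hypothetical sample orders
const orders = [
  { year: 2023, category: "Books", total: 30 },
  { year: 2023, category: "Games", total: 60 },
  { year: 2022, category: "Books", total: 99 }, // removed by the year filter
  { year: 2023, category: "Books", total: 20 },
];

// Stage 1 ($match): keep only 2023 orders
const orders2023 = orders.filter((o) => o.year === 2023);

// Stage 2 ($group): per category, sum "$total" and add 1 per document
const byCategory = new Map();
for (const o of orders2023) {
  const acc = byCategory.get(o.category) ?? { annual_revenue: 0, order_count: 0 };
  acc.annual_revenue += o.total; // like { $sum: "$total" }
  acc.order_count += 1;          // like { $sum: 1 }
  byCategory.set(o.category, acc);
}
const report = [...byCategory].map(([category, acc]) => ({ _id: category, ...acc }));
// report: [{ _id: "Books", annual_revenue: 50, order_count: 2 },
//          { _id: "Games", annual_revenue: 60, order_count: 1 }]
```

Note that the 2022 Books order (99) never reaches the grouping step, which is exactly the work an early `$match` saves.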

7. The $project Stage (The Formatter)

The final stage of a pipeline is often $project. It works like the Projection argument of find(), but it can also compute new fields from expressions. You use it to rename fields, concatenate strings, or hide the _id field before sending the final report to the frontend dashboard.
```javascript
db.users.aggregate([
    // Stage 1: Re-shape the output document
    { $project: {
        _id: 0, // Hide the ID
        full_name: { $concat: ["$first_name", " ", "$last_name"] }, // Combine strings!
        is_adult: { $gte: ["$age", 18] } // Returns a boolean True/False!
    }}
])
```
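Each document passes through `$project` independently, so the stage behaves like a `map` over the stream. This hypothetical plain-JavaScript sketch mirrors the stage above: it drops `_id`, builds `full_name` the way `$concat` joins strings (including the documented rule that `$concat` returns `null` when any argument is null or missing), and computes `is_adult` with a `>=` comparison, assuming `age` is stored as a number. The helper names and sample users are made up.

```javascript
// Like $concat: if any part is null or missing, the whole result is null
function concatOrNull(parts) {
  return parts.some((p) => p === null || p === undefined) ? null : parts.join("");
}

// Hypothetical per-document equivalent of the $project stage above
function projectUser(doc) {
  return {
    // _id is simply not copied, matching _id: 0
    full_name: concatOrNull([doc.first_name, " ", doc.last_name]),
    is_adult: doc.age >= 18, // assumes age is a number
  };
}

// Hypothetical sample users
const users = [
  { _id: 1, first_name: "Ana", last_name: "Silva", age: 36 },
  { _id: 2, first_name: "Ben", last_name: "Okafor", age: 17 },
];

const shaped = users.map(projectUser);
// shaped: [{ full_name: "Ana Silva", is_adult: true },
//          { full_name: "Ben Okafor", is_adult: false }]
```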

8. Mini Project: E-Commerce Analytics Dashboard

The CEO wants a leaderboard showing the Top 3 highest grossing products, but only for "Electronics".
```javascript
db.sales.aggregate([
    // 1. Filter out non-electronics
    { $match: { category: "Electronics" } },
    
    // 2. Group by product name and sum the revenue
    { $group: {
        _id: "$product_name",
        total_revenue: { $sum: "$price" }
    }},
    
    // 3. Sort by total_revenue descending (Highest first)
    { $sort: { total_revenue: -1 } },
    
    // 4. Limit to the Top 3
    { $limit: 3 }
])
```
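Once `$group` has reduced the sales to one document per product, `$sort` and `$limit` only have to order and trim that small set. The sketch below applies the same two steps in plain JavaScript to a hypothetical array of already-grouped results.

```javascript
// Hypothetical output of the $group stage (one document per product)
const grouped = [
  { _id: "Headphones", total_revenue: 900 },
  { _id: "Laptop", total_revenue: 5400 },
  { _id: "Tablet", total_revenue: 2100 },
  { _id: "Phone", total_revenue: 4800 },
];

// Like { $sort: { total_revenue: -1 } }: highest revenue first
// (sorts a copy so the input array is not mutated)
const sorted = [...grouped].sort((a, b) => b.total_revenue - a.total_revenue);

// Like { $limit: 3 }: keep only the first three documents
const top3 = sorted.slice(0, 3);
console.log(top3.map((p) => p._id)); // [ 'Laptop', 'Phone', 'Tablet' ]
```

The order of these two stages matters: limiting first and sorting afterwards would keep three arbitrary products and only then rank them.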

9. Common Mistakes

  • Forgetting the $ Prefix in Grouping: If you write { _id: "user_id" }, MongoDB treats "user_id" as a literal string and puts every document into a single group whose _id is the text "user_id". You MUST write { _id: "$user_id" } to tell MongoDB to evaluate the *value* of the user_id field.
  • Putting $group before $match: If you group and calculate totals over 5 million documents and only THEN filter the results, the server wastes CPU and memory on data you end up throwing away. Always $match first to shrink the pipeline volume as early as possible.
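The difference between `"user_id"` and `"$user_id"` comes from how aggregation expressions treat strings: a string starting with `$` is read as a field path, and other strings are constants. The hypothetical `resolve` helper below imitates that rule for top-level fields only (it ignores `$$` variables and nested paths), so you can see why the un-prefixed version collapses every document into a single group.

```javascript
// Hypothetical imitation of evaluating a $group _id expression for one
// top-level field: "$name" reads the field, any other string is a literal.
function resolve(expr, doc) {
  return typeof expr === "string" && expr.startsWith("$") ? doc[expr.slice(1)] : expr;
}

// Collects the distinct group keys an _id expression would produce
function distinctGroupKeys(docs, idExpr) {
  return [...new Set(docs.map((doc) => resolve(idExpr, doc)))];
}

// Hypothetical sample orders
const orders = [{ user_id: "u1" }, { user_id: "u2" }, { user_id: "u1" }];

const wrong = distinctGroupKeys(orders, "user_id");  // ["user_id"]: one group for everyone
const right = distinctGroupKeys(orders, "$user_id"); // ["u1", "u2"]: one group per user
```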

10. Best Practices

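  • Indexes work on $match: A $match stage can use an index when it runs at the beginning of the pipeline. MongoDB's optimizer can sometimes move a later $match forward, but the reliable approach is to put it first yourself; a $match placed after $group filters newly computed documents that no collection index covers.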

11. Exercises

  1. What MongoDB method is used to initiate an aggregation pipeline?
  2. Inside a $group stage, what operator do you use to add up the values of a specific numeric field across all grouped documents?

12. MongoDB Challenges

Write an aggregation pipeline on the employees collection. Group them by $department, and calculate the Average ($avg) salary for each department.
```javascript
db.employees.aggregate([
    { $group: {
        _id: "$department",
        average_salary: { $avg: "$salary" }
    }}
])
```
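`$avg` can be pictured as keeping a running sum and count for each group and dividing at the end. Per the MongoDB documentation, it ignores non-numeric and missing values, and returns `null` for a group with no numeric values at all. The plain-JavaScript sketch below imitates that behavior for the department challenge; the `groupAvg` helper and the employee data are hypothetical.

```javascript
// Hypothetical sketch of:
// { $group: { _id: "$<keyField>", <outField>: { $avg: "$<avgField>" } } }
function groupAvg(docs, keyField, avgField, outField) {
  const stats = new Map(); // group key -> { sum, count } over numeric values only
  for (const doc of docs) {
    const key = doc[keyField];
    const s = stats.get(key) ?? { sum: 0, count: 0 };
    if (typeof doc[avgField] === "number") { // non-numeric and missing values are skipped
      s.sum += doc[avgField];
      s.count += 1;
    }
    stats.set(key, s);
  }
  return [...stats].map(([key, { sum, count }]) => ({
    _id: key,
    [outField]: count > 0 ? sum / count : null, // null when nothing numeric was seen
  }));
}

// Hypothetical sample employees
const employees = [
  { name: "A", department: "Engineering", salary: 100000 },
  { name: "B", department: "Engineering", salary: 80000 },
  { name: "C", department: "Sales", salary: 60000 },
  { name: "D", department: "Sales" }, // missing salary: ignored, not treated as 0
];

const averages = groupAvg(employees, "department", "salary", "average_salary");
// averages: [{ _id: "Engineering", average_salary: 90000 },
//            { _id: "Sales", average_salary: 60000 }]
```

Because the missing salary is skipped rather than counted as zero, the Sales average stays at 60000 instead of dropping to 30000.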

13. MCQ Quiz with Answers

Question 1

In the MongoDB Aggregation Framework, how is data processed?

Question 2

When defining the _id field inside a $group stage, why must you prepend the target field name with a dollar sign (e.g., "$category")?

14. Interview Questions

  • Q: Explain the architectural concept of an Aggregation Pipeline. Why is the order of stages (specifically putting $match before $group and $sort) critical for database performance?
  • Q: Compare and contrast the purpose of the $project stage in an aggregation pipeline with the standard Projection argument used in a find() query.

15. FAQs

Q: Can I save the results of an aggregation directly into a new collection? A: Yes! You can append the $out stage as the very last step in your pipeline. { $out: "annual_report" } writes the final documents into the annual_report collection, creating it if it does not exist and replacing its contents if it does.

16. Summary

The Aggregation Framework is the step from simply querying data to actually analyzing it. By constructing logical pipelines using $match to filter, $group to compute totals and averages, and $project to shape the final documents, you can build the reports behind analytics dashboards directly on the database server instead of in application code.

17. Next Chapter Recommendation

We know how to group data within a single collection. But what if we need to combine data from TWO different collections? In Chapter 16: Advanced Aggregation Pipelines, we will master the $lookup stage, MongoDB's counterpart to the SQL JOIN.
