MongoDB Aggregation Framework: A Beginner’s Guide

June 05, 2025

Table of Contents

$Match
$Project
$Unwind
$Group
$Sort
$AddFields

Finding exactly the data we need isn’t always a simple task.

You’ve probably faced situations where you needed to filter information, group it, and even perform calculations to produce a final result.

And often, delivering this processed data to the client is essential for the application’s success. MongoDB offers two main ways to fetch data:

find()
aggregate()

While find() is great for basic queries, it doesn’t cover more advanced scenarios like transformations and complex data processing. That’s where the MongoDB Aggregation Framework comes in.

The MongoDB Aggregation Framework works like a pipeline—a series of stages where each step processes the data in some way. When we use the aggregate() method, we’re building this sequence of operations.

Before diving into MongoDB, here’s a simple example of a pipeline in Java:

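import java.util.Arrays;
import java.util.List;

List<String> names = Arrays.asList("Alice", "Aloisio", "alice", "andre", "Ricardo", "Jose", "Maria");
var count = names.stream()
      .map(String::toLowerCase)              // normalize case
      .filter(name -> name.startsWith("a"))  // keep names starting with "a"
      .distinct()                            // drop duplicates
      .count();                              // count what is left
System.out.println(count);

// output = 3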

If you look closely, it uses functions like map, filter, distinct, and count.
In other words, it:

Converts each name to lowercase.
Filters names that start with "a".
Removes duplicate names.
Counts the total unique names.

This is the essence of a pipeline: chaining operations that refine data step by step until you get the final result.

In MongoDB, we do something very similar.

Aggregation pipeline

An aggregation pipeline consists of one or more stages. Each stage represents a step that will be executed.

For example, consider a transactions collection where we want to count how many transactions contain errors. We could filter by status and then count:

[  
  {  
    $match: {  
      status: "error"  
    }  
  },  
  {  
    $count: "total errors"  
  }  
]

Here, we apply a $match filter to select only the documents whose status equals "error", and then use $count to calculate the total number of transactions with that status.

Each stage in the pipeline is executed in order, and each one only processes the results from the previous stage. So in the example above, even if there are 1,000 transactions in total, the $count stage only counts the transactions that matched the "error" status from the $match stage.
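To make the "each stage only sees the previous stage’s output" idea concrete, here is a rough plain-JavaScript analogy that runs without a database. It is only a sketch of the behavior, not how MongoDB executes a pipeline internally; the `transactions` array and the `matchStage` and `countStage` helpers are hypothetical names made up for this illustration.

```javascript
// Hypothetical in-memory stand-in for a "transactions" collection.
const transactions = [
  { _id: 1, status: "ok" },
  { _id: 2, status: "error" },
  { _id: 3, status: "error" },
  { _id: 4, status: "pending" },
];

// Each stage is modeled as a function: list of documents in, list of documents out.
const matchStage = (docs) => docs.filter((doc) => doc.status === "error");

// Like $count, emit a single document, or nothing at all when the input is empty.
const countStage = (docs) =>
  docs.length > 0 ? [{ "total errors": docs.length }] : [];

// Run the stages in order, feeding each one the output of the previous one.
const pipeline = [matchStage, countStage];
const result = pipeline.reduce((docs, stage) => stage(docs), transactions);

console.log(result); // one document: { "total errors": 2 }
```

Even though the array holds four transactions, the counting step only ever receives the two that survived the filter.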

Aggregation stages

As mentioned earlier, stages are used to build a pipeline. In this section, let’s take a look at some stages that are useful for our day-to-day work.

To explore their capabilities, we’ll create a collection called articles that will contain the following documents:

db.articles.insertMany(  
   [  
       {  
           _id: 1,  
           title: "Spring Data Unlocked: Getting Started With Java and MongoDB",  
           tags: ["Java", "MongoDB", "Spring"],  
           publishedAt: ISODate("2024-11-11T00:00:00Z"),  
           authors: ["Ricardo Mello"],  
           url: "https://www.mongodb.com/developer/products/mongodb/springdata-getting-started-with-java-mongodb/"  
       },  

       {  
           _id: 2,  
           title: "Java Meets Queryable Encryption: Developing a Secure Bank Account Application",  
           tags: ["Java", "Security", "MongoDB"],  
           publishedAt: ISODate("2024-10-08T00:00:00Z"),  
           authors: ["Ricardo Mello"],  
           url: "https://www.mongodb.com/developer/products/atlas/java-queryable-encryption/"  
       },      
       {  
           _id: 3,  
           title: "Beyond Basics: Enhancing Kotlin Ktor API With Vector Search",  
           tags: ["Kotlin", "Vector Search", "MongoDB"],  
           publishedAt: ISODate("2024-09-18T00:00:00Z"),  
           authors: ["Ricardo Mello"],  
           url: "https://www.mongodb.com/developer/products/atlas/beyond-basics-enhancing-kotlin-ktor-api-vector-search/"  
       },  

   ]  
)

$Match

This is one of the most common stages you’ll use. It filters documents based on a query. For example, if you only want to return the document with _id: 3, you can use:

db.articles.aggregate([  
   { $match: { _id: 3 } }  
])  
// This returns the "Beyond Basics" article (_id: 3)

$Project

We use this stage to specify which fields we’d like to include in our results.

Suppose we want to return all documents and project only the title and authors fields, leaving out _id:

db.articles.aggregate([  
   { $project: { _id: 0, title: 1, authors: 1 } }  
])

The result would look like this:

{  
   "title": "Beyond Basics: Enhancing Kotlin Ktor API With Vector Search",  
   "authors": ["Ricardo Mello"]  
 },  

 //.. Others..

$Unwind

The $unwind stage is used to deconstruct an array into multiple documents. For example:

db.articles.aggregate([
   { $unwind: "$tags" }
])

For each tag in the `tags` array, the document will be repeated in the query results.
This way, you can analyze or process each tag individually:

{
  "_id": 1,
  "title": "Spring Data Unlocked: Getting Started With Java and MongoDB",
  "tags": "Java",
  // other fields...
}
{
  "_id": 1,
  "title": "Spring Data Unlocked: Getting Started With Java and MongoDB",
  "tags": "MongoDB",
  // other fields...
}
{
  "_id": 1,
  "title": "Spring Data Unlocked: Getting Started With Java and MongoDB",
  "tags": "Spring",
  // other fields...
}
// other documents...
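If it helps to think of $unwind in terms of ordinary array operations, the following plain-JavaScript sketch mimics its effect on one of the sample articles. It is an analogy rather than MongoDB code: `unwindTags` is a hypothetical helper, the document is trimmed to the fields needed here, and the sketch assumes every document has a `tags` array.

```javascript
// Trimmed copy of one sample article (only the fields needed here).
const article = {
  _id: 1,
  title: "Spring Data Unlocked: Getting Started With Java and MongoDB",
  tags: ["Java", "MongoDB", "Spring"],
};

// Mimics { $unwind: "$tags" }: one output document per array element,
// with the array field replaced by that single element.
const unwindTags = (docs) =>
  docs.flatMap((doc) => doc.tags.map((tag) => ({ ...doc, tags: tag })));

const unwound = unwindTags([article]);

console.log(unwound.length);                 // 3
console.log(unwound.map((doc) => doc.tags)); // Java, MongoDB, Spring
```

Every copy keeps the same `_id` and `title`; only the `tags` value differs.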

$Group

As the name suggests, we use this stage to group our results. This time, we’ll use the `$unwind` stage we saw earlier to deconstruct the array of tags and find out how many articles exist for each tag:

db.articles.aggregate([  
   { $unwind: "$tags" },  
   {  
       $group: {  
           _id: "$tags",  
           totalArticles: { $sum: 1 }  
       }  
   }  
])

The result would look like this:

[
   {
       "_id": "Security",
       "totalArticles": 1
   },
   {
       "_id": "MongoDB",
       "totalArticles": 3
   },
   {
       "_id": "Java",
       "totalArticles": 2
   }
   // other tags (Spring, Kotlin, Vector Search), each with totalArticles: 1
]
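As a sanity check on these counts, the same unwind-then-group logic can be sketched in plain JavaScript over the tags of the three sample articles. This is an in-memory analogy, not MongoDB itself; the `counts` map and `grouped` array are names chosen for this illustration, and the documents are trimmed to `_id` and `tags`.

```javascript
// Tags of the three sample articles (other fields omitted).
const articles = [
  { _id: 1, tags: ["Java", "MongoDB", "Spring"] },
  { _id: 2, tags: ["Java", "Security", "MongoDB"] },
  { _id: 3, tags: ["Kotlin", "Vector Search", "MongoDB"] },
];

// Step 1, like $unwind: one entry per (article, tag) pair.
const unwound = articles.flatMap((doc) =>
  doc.tags.map((tag) => ({ ...doc, tags: tag }))
);

// Step 2, like $group with { $sum: 1 }: a running count per distinct tag.
const counts = new Map();
for (const doc of unwound) {
  counts.set(doc.tags, (counts.get(doc.tags) ?? 0) + 1);
}

// Shape the result like the $group output documents.
const grouped = [...counts].map(([tag, totalArticles]) => ({
  _id: tag,
  totalArticles,
}));

console.log(grouped);
```

Here the groups come out in first-seen order, whereas MongoDB makes no promise about the order of $group output, so add a $sort stage if the order matters.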

$Sort

Continuing with our example—what if we want to query all articles and sort them by publication date, from newest to oldest?

db.articles.aggregate([  
   { $sort: { publishedAt: -1 } }  
])

And if we want to reverse the order—showing the oldest articles first—we just use `1` instead of `-1`.
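The `1` and `-1` convention can be pictured as a multiplier on an ordinary comparator. The plain-JavaScript sketch below shows that idea on the sample publication dates; `sortByPublishedAt` is a hypothetical helper for this illustration and says nothing about how MongoDB sorts internally.

```javascript
// Publication dates of the three sample articles (other fields omitted).
const articles = [
  { _id: 1, publishedAt: new Date("2024-11-11T00:00:00Z") },
  { _id: 2, publishedAt: new Date("2024-10-08T00:00:00Z") },
  { _id: 3, publishedAt: new Date("2024-09-18T00:00:00Z") },
];

// direction follows the $sort convention: 1 = ascending, -1 = descending.
// The input is copied first so the original array is left untouched.
const sortByPublishedAt = (docs, direction) =>
  [...docs].sort((a, b) => direction * (a.publishedAt - b.publishedAt));

const newestFirst = sortByPublishedAt(articles, -1).map((doc) => doc._id);
const oldestFirst = sortByPublishedAt(articles, 1).map((doc) => doc._id);

console.log(newestFirst); // 1, 2, 3
console.log(oldestFirst); // 3, 2, 1
```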

$AddFields

This stage is useful when we want to add a new field to each document in our result.

Let’s say our client requested that we display a field called `publishedYear` containing only the year:

db.articles.aggregate([  
   {  
     $addFields: {  
       publishedYear: { $year: "$publishedAt" }  
     }  
   }  
 ])

Our result would look something like this:

{
  "_id": 2,
  // other fields...
  "publishedYear": 2024 // field added
}
// other documents...

Here, you can see that we’re using an operator called $year to extract the year from our publishedAt field. To learn about other operators, check out our official documentation page on aggregation operators.
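For intuition, adding a computed field is close to mapping over the documents and spreading in one extra property. The plain-JavaScript sketch below mimics the stage above on one trimmed sample article; `addPublishedYear` is a hypothetical helper, and it uses `getUTCFullYear()` because $year works in UTC unless a timezone is specified.

```javascript
// One sample article, trimmed to the fields needed here.
const articles = [
  {
    _id: 2,
    title: "Java Meets Queryable Encryption: Developing a Secure Bank Account Application",
    publishedAt: new Date("2024-10-08T00:00:00Z"),
  },
];

// Mimics { $addFields: { publishedYear: { $year: "$publishedAt" } } }:
// keep every existing field and add one computed from publishedAt.
const addPublishedYear = (docs) =>
  docs.map((doc) => ({
    ...doc,
    publishedYear: doc.publishedAt.getUTCFullYear(),
  }));

const withYear = addPublishedYear(articles);

console.log(withYear[0].publishedYear); // 2024
```

The original fields are untouched; the new field is simply added alongside them.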

Combining stages

As we explored earlier, a pipeline can combine multiple stages. Let’s say we want to know the total number of articles published in 2025 and beyond. We can combine the $match and $count stages for this:

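db.articles.aggregate(
   [
       {
           $match: {
             publishedAt: { $gte: ISODate("2025-01-01T00:00:00Z") }
           }
       },
       {
           $count: 'total'
       }
   ]
)

Notice that we’re using the $gte operator to keep only articles published on or after January 1, 2025. With the three sample articles above, all published in 2024, nothing matches, so $count returns no document at all.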
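One detail worth checking in a date filter like this is exactly where the boundary falls. The plain-JavaScript sketch below compares two possible cutoffs on in-memory data. It is an analogy, not MongoDB code: `countWhere` is a hypothetical helper, and the documents with `_id` 4 and 5 are made-up edge cases that are not part of the sample collection.

```javascript
// Dates of the three sample articles plus two hypothetical edge cases.
const articles = [
  { _id: 1, publishedAt: new Date("2024-11-11T00:00:00Z") },
  { _id: 2, publishedAt: new Date("2024-10-08T00:00:00Z") },
  { _id: 3, publishedAt: new Date("2024-09-18T00:00:00Z") },
  { _id: 4, publishedAt: new Date("2024-12-31T12:00:00Z") }, // hypothetical
  { _id: 5, publishedAt: new Date("2025-02-01T00:00:00Z") }, // hypothetical
];

// Filter, then count. Like $count, return no document when nothing matched.
const countWhere = (docs, predicate) => {
  const matched = docs.filter(predicate);
  return matched.length > 0 ? [{ total: matched.length }] : [];
};

// Cutoff A: strictly after midnight on December 31, 2024.
const afterDec31 = countWhere(
  articles,
  (doc) => doc.publishedAt > new Date("2024-12-31T00:00:00Z")
);

// Cutoff B: on or after January 1, 2025.
const fromJan1 = countWhere(
  articles,
  (doc) => doc.publishedAt >= new Date("2025-01-01T00:00:00Z")
);

console.log(afterDec31); // total: 2 (includes the December 31 noon article)
console.log(fromJan1);   // total: 1 (only the 2025 article)
```

The first cutoff also counts an article published at noon on December 31, 2024, so "2025 and beyond" is expressed more safely as on or after January 1, 2025.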

Wrapping up

The aggregation pipeline is a powerful tool that MongoDB offers for combining stages and extracting data accurately and efficiently. There’s a whole world of stages and operators for you to explore.

Always turn to the MongoDB community for your questions. I hope this article has been useful to you all.


Ricardo Mello

Author

Senior Developer Advocate at MongoDB | Java | Kotlin | Spring
