MongoDB Aggregation Framework: A Beginner’s Guide

  • June 05, 2025

Table of Contents

  • Aggregation pipeline
  • Aggregation stages
  • Combining stages
  • Wrapping up

Finding exactly the data we need isn’t always a simple task.

You’ve probably faced situations where you needed to filter information, group it, and even perform calculations to produce a final result.

And often, delivering this processed data to the client is essential for the application’s success. MongoDB offers two main ways to fetch data:

  • find()
  • aggregate()

While .find() is great for basic queries, it doesn’t cover more advanced scenarios like transformations and complex data processing. That’s where the MongoDB Aggregation Framework comes in.

The MongoDB Aggregation Framework works like a pipeline—a series of stages where each step processes the data in some way. When we use the aggregate() method, we’re building this sequence of operations.

Before diving into MongoDB, here’s a simple example of a pipeline in Java:

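import java.util.Arrays;
import java.util.List;

List<String> names = Arrays.asList("Alice", "Aloisio", "alice", "andre", "Ricardo", "Jose", "Maria");
var count = names.stream()
      .map(String::toLowerCase)              // normalize case first
      .filter(name -> name.startsWith("a"))  // keep names starting with "a"
      .distinct()                            // drop duplicates ("alice" appears twice)
      .count();
System.out.println(count);

// output = 3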




If you look closely, it uses functions like map, filter, distinct, and count.
In other words, it:

  1. Converts each name to lowercase.
  2. Keeps only the names that start with "a".
  3. Removes duplicate names.
  4. Counts the total unique names.

This is the essence of a pipeline: chaining operations that refine data step by step until you get the final result.

In MongoDB, we do something very similar.

Aggregation pipeline

An aggregation pipeline consists of one or more stages. Each stage represents a step that will be executed.

For example, consider a transactions collection where we want to count how many transactions contain errors. We could filter by status and then count:

[  
  {  
    $match: {  
      status: "error"  
    }  
  },  
  {  
    $count: "total errors"  
  }  
]




Here, we apply a $match filter to select only the documents whose status equals "error", and then use $count to calculate the total number of transactions with that status.

Each stage in the pipeline is executed in order, and each one only processes the results from the previous stage. So in the example above, even if there are 1,000 transactions in total, the $count stage only counts the transactions that matched the "error" status from the $match stage.

Aggregation stages

As mentioned earlier, stages are used to build a pipeline. In this section, let’s take a look at some stages that are useful for our day-to-day work.

To explore their capabilities, we’ll create a collection called articles that will contain the following documents:

db.articles.insertMany(  
   [  
       {  
           _id: 1,  
           title: "Spring Data Unlocked: Getting Started With Java and MongoDB",  
           tags: ["Java", "MongoDB", "Spring"],  
           publishedAt: ISODate("2024-11-11T00:00:00Z"),  
           authors: ["Ricardo Mello"],  
           url: "https://www.mongodb.com/developer/products/mongodb/springdata-getting-started-with-java-mongodb/"  
       },  

       {  
           _id: 2,  
           title: "Java Meets Queryable Encryption: Developing a Secure Bank Account Application",  
           tags: ["Java", "Security", "MongoDB"],  
           publishedAt: ISODate("2024-10-08T00:00:00Z"),  
           authors: ["Ricardo Mello"],  
           url: "https://www.mongodb.com/developer/products/atlas/java-queryable-encryption/"  
       },      
       {  
           _id: 3,  
           title: "Beyond Basics: Enhancing Kotlin Ktor API With Vector Search",  
           tags: ["Kotlin", "Vector Search", "MongoDB"],  
           publishedAt: ISODate("2024-09-18T00:00:00Z"),  
           authors: ["Ricardo Mello"],  
           url: "https://www.mongodb.com/developer/products/atlas/beyond-basics-enhancing-kotlin-ktor-api-vector-search/"  
       },  

   ]  
)




$match

This is one of the most common stages you’ll use. It filters documents, passing only those that match the given query on to the next stage. For example, if you only want to return the document with _id: 3, you can use:

db.articles.aggregate([  
   { $match: { _id: 3 } }  
])  
// This returns the "Beyond Basics" article (_id: 3)




$project

We use this stage to specify which fields we’d like to include in our results.

Suppose we want to return all documents and project only the title and authors fields, leaving out _id.

db.articles.aggregate([  
   { $project: { _id: 0, title: 1, authors: 1 } }  
])




The result would look like this:

{  
   "title": "Beyond Basics: Enhancing Kotlin Ktor API With Vector Search",  
   "authors": ["Ricardo Mello"]  
 },  

 //.. Others..
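
To connect this back to the Java stream example from the introduction, here is a minimal sketch of the same idea on an in-memory map. It is only an analogy, not how MongoDB implements $project: the ProjectSketch class, its project helper, and the sample map are hypothetical names made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ProjectSketch {
    // Copies only the requested keys, in the order requested. Leaving "_id"
    // out of the list plays the role of _id: 0 in the $project stage.
    static Map<String, Object> project(Map<String, Object> doc, List<String> fields) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String field : fields) {
            if (doc.containsKey(field)) {
                out.put(field, doc.get(field));
            }
        }
        return out;
    }

    // Hypothetical in-memory copy of the article with _id 3.
    static Map<String, Object> sample() {
        Map<String, Object> article = Map.of(
            "_id", 3,
            "title", "Beyond Basics: Enhancing Kotlin Ktor API With Vector Search",
            "tags", List.of("Kotlin", "Vector Search", "MongoDB"),
            "authors", List.of("Ricardo Mello"));
        return project(article, List.of("title", "authors"));
    }

    public static void main(String[] args) {
        System.out.println(sample());
    }
}
```

One difference worth keeping in mind: in MongoDB, _id is included by default and has to be switched off explicitly with _id: 0, while this sketch only ever keeps the keys it is asked for.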

$unwind

The $unwind stage is used to deconstruct an array into multiple documents. For example:

db.articles.aggregate([
   { $unwind: "$tags" }
])




For each tag in the `tags` array, the document will be repeated in the query results.
This way, you can analyze or process each tag individually:

{
  "_id": 1,
  "title": "Spring Data Unlocked: Getting Started With Java and MongoDB",
  "tags": "Java",
  // other fields...
}
{
  "_id": 1,
  "title": "Spring Data Unlocked: Getting Started With Java and MongoDB",
  "tags": "MongoDB",
  // other fields...
}
{
  "_id": 1,
  "title": "Spring Data Unlocked: Getting Started With Java and MongoDB",
  "tags": "Spring",
  // other fields...
}
// other documents...
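
If you think in Java streams, $unwind behaves roughly like flatMap: one input element becomes several output elements. The sketch below is an in-memory analogy under that assumption; UnwindSketch, Article, and unwindTags are hypothetical names, and each output "document" is reduced to an "id:tag" string to keep it short.

```java
import java.util.List;
import java.util.stream.Collectors;

class UnwindSketch {
    // Hypothetical stand-in for an article document (only _id and tags).
    static final class Article {
        final int id;
        final List<String> tags;

        Article(int id, List<String> tags) {
            this.id = id;
            this.tags = tags;
        }
    }

    // Emits one "id:tag" entry per array element, the way $unwind emits
    // one output document per element of the tags array.
    static List<String> unwindTags(List<Article> articles) {
        return articles.stream()
            .flatMap(a -> a.tags.stream().map(tag -> a.id + ":" + tag))
            .collect(Collectors.toList());
    }

    static List<String> sample() {
        return unwindTags(List.of(
            new Article(1, List.of("Java", "MongoDB", "Spring")),
            new Article(2, List.of("Java", "Security", "MongoDB"))));
    }

    public static void main(String[] args) {
        sample().forEach(System.out::println);
    }
}
```

As with flatMap, an article whose tags array is empty produces no output at all, which matches the default behavior of $unwind.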




$group

As the name suggests, we use this stage to group our results. This time, we’ll use the `$unwind` stage we saw earlier to deconstruct the array of tags and find out how many articles exist for each tag:

db.articles.aggregate([  
   { $unwind: "$tags" },  
   {  
       $group: {  
           _id: "$tags",  
           totalArticles: { $sum: 1 }  
       }  
   }  
])

The result would look like this:

[  
   {  
       "_id": "Security",  
       "totalArticles": 1  
   },  
   {  
       "_id": "MongoDB",  
       "totalArticles": 3  
   },  
   {  
       "_id": "Java",  
       "totalArticles": 2  
   }  
   .. other tags (Kotlin, Vector Search ..)  
]
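
The same two steps can be sketched with Java streams: flatMap stands in for $unwind, and groupingBy with counting stands in for $group with { $sum: 1 }. This is an in-memory analogy only; GroupByTagSketch and countByTag are hypothetical names, and the input is just the tags arrays of the three sample articles.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class GroupByTagSketch {
    // Flattens every tag list (the $unwind step), then counts how often
    // each tag occurs (the $group step with { $sum: 1 }).
    static Map<String, Long> countByTag(List<List<String>> tagsPerArticle) {
        return tagsPerArticle.stream()
            .flatMap(List::stream)
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Tags of the three sample articles; "MongoDB" appears in all of them.
    static Map<String, Long> sample() {
        return countByTag(List.of(
            List.of("Java", "MongoDB", "Spring"),
            List.of("Java", "Security", "MongoDB"),
            List.of("Kotlin", "Vector Search", "MongoDB")));
    }

    public static void main(String[] args) {
        sample().forEach((tag, total) -> System.out.println(tag + " -> " + total));
    }
}
```

The grouping key (Function.identity() here) corresponds to the _id expression in $group, and the downstream collector corresponds to the accumulator.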




$sort

Continuing with our example—what if we want to query all articles and sort them by publication date, from newest to oldest?

db.articles.aggregate([  
   { $sort: { publishedAt: -1 } }  
])

And if we want to reverse the order and show the oldest articles first, we just use `1` instead of `-1`.
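
For comparison, here is roughly what that ordering looks like on an in-memory list in Java. It is a sketch, not driver code: SortSketch, Article, and idsNewestFirst are hypothetical names, and the dates are the publishedAt values of the three sample articles.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortSketch {
    // Hypothetical stand-in for an article document (only _id and publishedAt).
    static final class Article {
        final int id;
        final Instant publishedAt;

        Article(int id, String publishedAtIso) {
            this.id = id;
            this.publishedAt = Instant.parse(publishedAtIso);
        }
    }

    // reversed() gives newest first, like { publishedAt: -1 };
    // dropping reversed() gives oldest first, like { publishedAt: 1 }.
    static List<Integer> idsNewestFirst(List<Article> articles) {
        return articles.stream()
            .sorted(Comparator.comparing((Article a) -> a.publishedAt).reversed())
            .map(a -> a.id)
            .collect(Collectors.toList());
    }

    // The input is deliberately out of order so the sort has something to do.
    static List<Integer> sample() {
        return idsNewestFirst(List.of(
            new Article(3, "2024-09-18T00:00:00Z"),
            new Article(1, "2024-11-11T00:00:00Z"),
            new Article(2, "2024-10-08T00:00:00Z")));
    }

    public static void main(String[] args) {
        System.out.println(sample());
    }
}
```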

$addFields

This stage is useful when we want to add a new field to our results.

Let’s say our client requested that we display a field called `publishedYear` containing only the year:

db.articles.aggregate([  
   {  
     $addFields: {  
       publishedYear: { $year: "$publishedAt" }  
     }  
   }  
 ])

Our result would look something like this:

{
  "_id": 2,
  // other fields...
  "publishedYear": 2024 // field added
}

Here, you can see that we’re using an operator called $year to extract the year from our publishedAt field. To learn about other operators, check out the official MongoDB documentation page on aggregation operators.
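
In plain Java, the equivalent of this computed field is a small function from a timestamp to its year. The sketch below assumes the date is read in UTC, which is also what $year does when no timezone is given; PublishedYearSketch and publishedYear are hypothetical names.

```java
import java.time.Instant;
import java.time.ZoneOffset;

class PublishedYearSketch {
    // Extracts the calendar year of a timestamp, interpreted in UTC.
    static int publishedYear(Instant publishedAt) {
        return publishedAt.atZone(ZoneOffset.UTC).getYear();
    }

    public static void main(String[] args) {
        // publishedAt of the sample article with _id 1
        System.out.println(publishedYear(Instant.parse("2024-11-11T00:00:00Z")));
    }
}
```

The zone matters near year boundaries: a timestamp late on December 31 in UTC already belongs to the next year in time zones ahead of UTC.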

Combining stages

As we explored earlier, a pipeline can combine multiple stages. Let’s say we want to know the total number of articles published in 2025 and beyond. We can combine the $match and $count stages for this:

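db.articles.aggregate(
   [
       {
           $match: {
             publishedAt: { $gte: ISODate("2025-01-01T00:00:00Z") }
           }
       },
       {
           $count: 'total'
       }
   ]
)

Notice that we’re using the $gte operator to keep only the documents whose publishedAt is on or after January 1, 2025 (UTC). Comparing with $gt against December 31, 2024 at midnight would also let in articles published later that same day.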
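
Here is a sketch of the same filter-then-count combination on an in-memory list of dates, using an inclusive cutoff at the start of 2025 as one way to express "in 2025 and beyond". PublishedSinceSketch and countPublishedSince are hypothetical names, and the dates are the publishedAt values of the three sample articles.

```java
import java.time.Instant;
import java.util.List;

class PublishedSinceSketch {
    // Counts the dates that fall on or after the cutoff (inclusive lower bound).
    static long countPublishedSince(List<Instant> publishedDates, Instant cutoff) {
        return publishedDates.stream()
            .filter(date -> !date.isBefore(cutoff)) // date >= cutoff
            .count();
    }

    // Runs the count over the three sample publication dates.
    static long sample(String cutoffIso) {
        List<Instant> sampleDates = List.of(
            Instant.parse("2024-11-11T00:00:00Z"),
            Instant.parse("2024-10-08T00:00:00Z"),
            Instant.parse("2024-09-18T00:00:00Z"));
        return countPublishedSince(sampleDates, Instant.parse(cutoffIso));
    }

    public static void main(String[] args) {
        System.out.println(sample("2025-01-01T00:00:00Z"));
        System.out.println(sample("2024-10-01T00:00:00Z"));
    }
}
```

All three sample articles were published in 2024, so the 2025 cutoff matches nothing. The Java sketch reports that as 0, whereas the $count stage emits no document at all when its input is empty, so client code should be ready for an empty result.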

Wrapping up

The aggregation pipeline is a powerful alternative to find(): by combining stages, we can filter, reshape, and summarize data accurately and efficiently. There’s a whole world of stages and operators for you to explore.

Always turn to the MongoDB community for your questions. I hope this article has been useful to you all.
