Microservices

When Should we Move to Microservices?

December 28, 2023
11 min read

Table of Contents

Small Monoliths

What Makes a Microservice?
Reliance on the DevOps Team
Saga Pattern

Should We Pick Microservices?

Not One Size
Dynamic Control and Rollout
Segmentation
Too Big
Trade-Offs

Final Word

Some time ago I wrote about modular Monoliths and the value of modern Monolithic architecture. One of the more interesting discussions that came out of that post (and video) is the inverse question: when is it still right to pick Microservices?

Like any design choice, the answer is subjective and depends on many things. But there are still general rules of thumb and global metrics we can use.

Before we get into these problems, we need to understand what it means to have a Microservice architecture. Then we can gauge the benefit and price of having such an architecture.

Small Monoliths

A common misconception is that microservices are simply broken-down monoliths. This isn’t the case. I’ve talked to quite a few people who still hold that notion and, to be fair, they might have a point. This is how AWS defines Microservices:

Microservices are an architectural and organizational approach to software development where software is composed of small independent services that communicate over well-defined APIs. These services are owned by small, self-contained teams.

Microservices architectures make applications easier to scale and faster to develop, enabling innovation and accelerating time-to-market for new features.

Smaller monoliths might fit the definition at first glance, but they don’t if you read between the lines. The words “independent” and “easier to scale” hint at the problem. The problem (and advantage) of a monolith is a single point of failure. By having one service we can usually find problems more easily. The architecture is much simpler.

If we break this service down into smaller pieces, we essentially create distributed points of failure. If one piece along the chain fails, the entire architecture breaks down. That isn’t independent, and it isn’t easier to scale. Microservices are NOT small monoliths and breaking down the Monolith isn’t only about working with smaller projects. It’s about shifting the way we work.

What Makes a Microservice?

A good Microservice needs to follow these principles for robustness and scale:

Divided by business function – this is a logical division. A Microservice is a standalone “product” that provides a complete package. This means that the team responsible for the Microservice can make all the changes required for the business without dependencies.
Automation through CI/CD – without continuous delivery the cost of updating would eliminate every benefit of Microservices.
Independent deployment – this is implied, since a commit on one Microservice will only trigger the CD of that specific service. We can accomplish this through Kubernetes and Infrastructure as Code (IaC) solutions.
Encapsulation – it should hide the underlying implementation details. A service acts as a standalone product that publishes an API for other products. We commonly accomplish this via REST interfaces but also through messaging middleware. This is further enhanced with API Gateways.
Decentralized with no single point of failure – otherwise, we would simply distribute the failure.
Failures should be isolated – without this, a single service going down could create a domino effect. Circuit breakers are probably the most important tools for isolating failures. To satisfy this requirement, every microservice handles its own data. This means many databases, which can be challenging at times.
Observable – this is required to deal with failures at scale. Without proper observability, we are effectively blind as the various teams can deploy automatically.

This is all good and well, but what does that mean in practical terms?

In practice, it means changing the way we handle a few big ideas. We need to move more of the complexity to the DevOps team, and we need to handle cross-microservice transactional state differently. The latter is one of the hardest concepts to grasp when dealing with Microservices.

In an ideal world, all our operations will be simple and contained in a small microservice. The service mesh framework surrounding our microservices will handle all the global complexities and manage our individual services for us. But that isn’t the real world. In reality, our Microservices might have a transactional state that carries between the services. External services might fail and for that, we need to take some unique approaches.

Reliance on the DevOps Team

If your company doesn’t have good DevOps and Platform Engineering teams, Microservices aren’t an option. Instead of deploying one application, we might deploy hundreds as a result of the migration. While the individual deployments are simple and automated, you will still throw a lot of work at operations.

Operations carry a greater burden when working with Microservices: when something doesn’t work or doesn’t connect, when a new service needs to integrate, or when the service configuration needs to be adapted. This requires great communication and collaboration. It also means the team managing a specific service needs to take some of the OPS burden back. That isn’t a simple task.

As developers we need to know many of the tools used to tie our separate services back to a single unified service:

Service Mesh lets us combine separate services and effectively acts as a load balancer between them. It also provides security, authorization, traffic control and much more.
API Gateways should be used instead of invoking the API directly. This can be awkward at times but it's often essential to avoid costs, prevent rate limiting and more.
Feature Flags & Secrets are useful in a monolith as well. But they’re impossible to manage at a Microservice scale without dedicated tools.
Circuit Breaking lets us kill a broken web service connection and recover gracefully. Without this, a single broken service can bring down the entire system.
Identity management must be separate. You can’t get away with an authentication table in the database when dealing with a Microservice environment.
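
To make the circuit breaking idea above a bit more concrete, here is a minimal sketch in Java. It is not how any particular library or service mesh implements it. The class name `SimpleCircuitBreaker`, the failure threshold and the open duration are all assumptions made for illustration. After a number of consecutive failures the breaker stops calling the remote service for a while and returns a fallback instead.

```java
import java.util.function.Supplier;

/** Illustrative circuit breaker sketch: not thread-safe and not production-ready. */
class SimpleCircuitBreaker {
    private final int failureThreshold; // consecutive failures before the circuit opens
    private final long openMillis;      // how long calls are rejected once it is open
    private int consecutiveFailures = 0;
    private long openedAt = -1;         // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Runs the remote call, or returns the fallback when the circuit is open or the call fails. */
    <T> T call(Supplier<T> remoteCall, T fallback) {
        long now = System.currentTimeMillis();
        if (openedAt >= 0 && now - openedAt < openMillis) {
            return fallback; // fail fast: do not touch the broken service
        }
        try {
            T result = remoteCall.get();
            consecutiveFailures = 0; // a success closes the circuit again
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = now; // open (or re-open) the circuit
            }
            return fallback;
        }
    }

    /** True while calls are being rejected without reaching the remote service. */
    boolean isOpen() {
        return openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis;
    }
}
```

In a real deployment we would usually lean on the service mesh or a dedicated resilience library rather than hand-rolling this, but the core logic is roughly the same: count failures, stop calling, retry later.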

I’ll skip orchestration, CI/CD, etc. but they too need to be adapted for every service that comes out. Some of those tools are opaque to developers but we need the help of DevOps in all the phases.

Saga Pattern

Stateless services would be ideal; carrying state makes everything far more complex. If we store the state in the client, we need to send it back and forth all the time. If it is on the server, we need to either fetch it constantly, cache it, or save it locally, and then all interaction would have to be performed against that specific server. That eliminates the scalability of the system.

A typical Microservice will store its data in its own database and work with local data. A service that needs remote information will typically cache some data to avoid round-trips to the other service. This is one of the biggest reasons Microservices can scale. In a Monolith the database should become the bottleneck of the application, which means the Monolith is efficient and limited only by the speed at which we can store and retrieve data. This has two major drawbacks:

Size – the more data we have, the larger the database, and the performance hit impacts all users at once. Imagine querying an SQL table of every purchase ever made on Amazon just to find your specific purchase.
Domain – databases have different use cases. Some databases are optimized for consistency, write speed, read speed, time data, spatial data and more. A microservice that tracks user information would probably use a time series database which is optimized for time-related information, whereas a purchase service will focus on a traditional conservative ACID database.

Note that a Monolith can use more than one database. That can work perfectly well and can be very useful. But it’s the exception. Not the rule.
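
The local caching of remote data mentioned above can be sketched roughly as follows. This is only an illustration under assumptions: `RemoteDataCache`, the `remoteLookup` function (standing in for a call to another service) and the time-to-live value are hypothetical names, and a real service would need to think about invalidation and memory limits too.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Illustrative time-limited cache in front of a call to another service. */
class RemoteDataCache<K, V> {
    private record Entry<T>(T value, long expiresAtMillis) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> remoteLookup; // hypothetical call to the owning service
    private final long ttlMillis;              // how long a cached value is trusted
    private final LongSupplier clock;          // injected so the expiry logic is testable

    RemoteDataCache(Function<K, V> remoteLookup, long ttlMillis, LongSupplier clock) {
        this.remoteLookup = remoteLookup;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value while it is fresh, otherwise asks the remote service again. */
    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> cached = entries.get(key);
        if (cached != null && cached.expiresAtMillis() > now) {
            return cached.value(); // no round-trip needed
        }
        V fresh = remoteLookup.apply(key);
        entries.put(key, new Entry<>(fresh, now + ttlMillis));
        return fresh;
    }
}
```

Notice the price: until the entry expires, we might serve data that the owning service has already changed.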

The Saga pattern works by using compensating transactions to undo the effects of a saga if it fails. When a step in a saga fails, compensating transactions are executed to undo the changes made by the preceding steps. This allows the system to recover from failures and maintain a consistent state. It also means that for every major cross-service operation you need to write the equivalent undo operation that restores the previous state. We can accomplish this with tools such as Apache Camel, but it is non-trivial and requires far more involvement than a typical transaction in a modern system. There are several tools for saga orchestration, but this is a big subject that is beyond the scope of this post. Still, I will explain it in broad terms.

What’s important to understand about Saga is that it avoids the classic ACID database principles and focuses on “eventual consistency”. That means operations would bring the database to a consistent state at some point. That is a very difficult process. Imagine debugging a problem that only occurs when the system is in an inconsistent state…

The following image demonstrates the idea in broad terms. Let’s say we have a money transfer process.

For the money transfer, we need to first allocate funds.
We then verify that the recipient is valid and exists.
Then we need to deduct the funds from our account.
And finally, we need to add the money to the recipient's account.

That is a successful transaction. With a regular database, this would be one transaction and we can see this in the blue column on the left. But if something goes wrong we need to run the reverse process.

If a failure occurs when allocating funds, we need to remove the allocation. We need to create a separate block of code that does the inverse operation of the allocation.
If verifying a recipient fails, we need to remove that recipient. But then we also need to remove the allocation.
If deducting the funds fails, we need to restore the funds, remove the recipient and remove the allocation.
Finally, if adding the funds to the recipient fails, we need to run all the undo operations!
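
The steps and their undo operations can be sketched in plain Java as follows. This is a simplified, in-memory illustration and not how Apache Camel or any other saga orchestrator works internally. The names `SagaStep`, `SagaRunner` and `TransferDemo` are made up for this example, and the steps only write to a log instead of calling real services. Following the list above, the failing step's own undo operation runs first, then the undo operations of the completed steps in reverse order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** One saga step: an action plus the compensating action that undoes it. */
record SagaStep(String name, Runnable action, Runnable compensation) {}

class SagaRunner {
    /** Runs the steps in order; on failure runs the undo operations. Returns true on success. */
    static boolean run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action().run();
                completed.push(step);
            } catch (RuntimeException failure) {
                // Undo the failed step itself, so its compensation must tolerate partial work.
                step.compensation().run();
                // Then undo the completed steps, most recent first.
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                return false;
            }
        }
        return true;
    }
}

class TransferDemo {
    /** Simulates the transfer; the step named failingStep (if any) throws. Returns what ran. */
    static List<String> simulate(String failingStep) {
        List<String> log = new ArrayList<>();
        List<SagaStep> transfer = List.of(
            step("allocate funds", "remove allocation", failingStep, log),
            step("verify recipient", "remove recipient", failingStep, log),
            step("deduct funds", "restore funds", failingStep, log),
            step("add funds", "remove added funds", failingStep, log));
        SagaRunner.run(transfer);
        return log;
    }

    private static SagaStep step(String name, String undo, String failingStep, List<String> log) {
        return new SagaStep(name,
            () -> {
                if (name.equals(failingStep)) {
                    throw new IllegalStateException(name + " failed");
                }
                log.add(name);
            },
            () -> log.add(undo));
    }

    public static void main(String[] args) {
        // prints [allocate funds, verify recipient, restore funds, remove recipient, remove allocation]
        System.out.println(simulate("deduct funds"));
    }
}
```

Even in this toy version every step needs a hand-written inverse. In a real system the undo operations themselves are remote calls that can fail, which is where most of the complexity comes from.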

[Image: the money transfer saga and its undo operations]

Another problem in Saga is illustrated by the CAP theorem. CAP stands for Consistency, Availability and Partition Tolerance. The problem is that we can only pick two… Don’t get me wrong, you might have all three. But in the case of a failure you can only guarantee two.

Availability means that requests receive responses. But there’s no guarantee that they contain the most recent writes.

Consistency means that every read receives the most recent write or an error.

Partition tolerance means that everything will keep working even if many messages get dropped along the way.
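
A toy model can make the trade-off easier to see. The sketch below is purely illustrative: `Replica` and `PartitionPolicy` are hypothetical names, and an empty `Optional` stands in for an error response. When the replica is cut off from the rest of the system it either answers with data that may be stale (favoring availability) or refuses to answer (favoring consistency).

```java
import java.util.Optional;

enum PartitionPolicy { PREFER_CONSISTENCY, PREFER_AVAILABILITY }

/** Toy replica that can be cut off from the source of the latest writes. */
class Replica {
    private final PartitionPolicy policy;
    private String lastSyncedValue;
    private boolean partitioned = false;

    Replica(PartitionPolicy policy, String initialValue) {
        this.policy = policy;
        this.lastSyncedValue = initialValue;
    }

    void setPartitioned(boolean partitioned) {
        this.partitioned = partitioned;
    }

    /** Delivers a new write to the replica; the message is dropped during a partition. */
    void sync(String latestValue) {
        if (!partitioned) {
            lastSyncedValue = latestValue;
        }
    }

    /** An empty result stands in for an error response. */
    Optional<String> read() {
        if (partitioned && policy == PartitionPolicy.PREFER_CONSISTENCY) {
            return Optional.empty(); // refuse to answer rather than risk stale data
        }
        return Optional.of(lastSyncedValue); // always answer, possibly with stale data
    }
}
```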

[Image: the CAP theorem]

This differs greatly from our historic approach to failure with transactions.

Should We Pick Microservices?

Hopefully, you now understand how hard it is to deploy Microservices properly. We need to make some big compromises. This new way isn’t necessarily better; in some regards, it is worse. But the proponents of Microservices still have a point: we can gain a lot through Microservices and should focus on those benefits too.

We mentioned the first requirement upfront: DevOps. Having a good DevOps team is a prerequisite to considering Microservices. I saw teams trying to hack their way through this without an OPS team and they ended up spending more time on operational complexity than writing code. It wasn’t worth the effort.

The biggest benefit of Microservices is to the team. That is why having a stable team and scope is crucial. Splitting teams into vertical teams that work independently is a tremendous benefit. The most modular monolith in the world can’t compete with that. When we have hundreds of developers, just following the git commits and tracking code changes at scale becomes untenable. The value of Microservices is only realized in a large team. This sounds reasonable enough, but in a startup environment, things shift suddenly. A colleague of mine works for a startup that employed dozens of developers. They decided to follow a Microservice architecture and built a lot of services… Then came the downsizing, and maintaining dozens of services in multiple languages became a problem.

Splitting a Monolith is hard but doable. Unifying Microservices into a Monolith is probably harder. I’m unaware of anyone who seriously tried to do that, but I would be curious to hear stories.

Not One Size

In order to move to a Microservice architecture we need a bit of a mind shift. A good example is in the databases, specifically a user tracking Microservice. In a Monolith, we would write the data to a table and move on with our work. But this is problematic…

As data scales, this user tracking table can end up containing a great deal of data that is hard to analyze in real-time without impacting the rest of the system. A Microservice offers several advantages:

The interface to the microservice can use messaging - which means the cost to send tracking information will be minimal.
Tracking data can use a Time Series database which would be more efficient for this use case.
We can stream the data and process it asynchronously to derive additional value from that data.

There are complexities: data will no longer be localized. If we send tracking data asynchronously, we need to send everything necessary, as the tracking service won’t be able to go back to the original service to get additional metadata. But it has a locality advantage: if regulation about tracking storage changes, there’s a single place where this data is stored.
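
As a rough sketch of what "send everything necessary" means, the message below carries all the context the tracking service needs, so it never has to call back. Everything here is an assumption for illustration: the `TrackingEvent` fields, the `TrackingPublisher` class, and the in-memory `BlockingQueue` that stands in for real messaging middleware.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Self-contained tracking message: it carries everything the tracking service needs. */
record TrackingEvent(String userId, String action, long timestampMillis,
                     Map<String, String> metadata) {
    TrackingEvent {
        metadata = Map.copyOf(metadata); // defensive copy: the message must not change after sending
    }
}

class TrackingPublisher {
    private final BlockingQueue<TrackingEvent> queue; // stand-in for messaging middleware

    TrackingPublisher(BlockingQueue<TrackingEvent> queue) {
        this.queue = queue;
    }

    /** Fire-and-forget: returns false instead of blocking the caller when the queue is full. */
    boolean publish(TrackingEvent event) {
        return queue.offer(event);
    }
}

class TrackingDemo {
    public static void main(String[] args) {
        BlockingQueue<TrackingEvent> queue = new LinkedBlockingQueue<>(1000);
        TrackingPublisher publisher = new TrackingPublisher(queue);
        publisher.publish(new TrackingEvent("user-1", "checkout",
                System.currentTimeMillis(), Map.of("cartSize", "3")));
    }
}
```

The calling service pays only the cost of enqueueing the message. Whether a dropped tracking event is acceptable when the queue is full is a business decision, not a technical one.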

Dynamic Control and Rollout

Did you ever push the button on a release that broke production?

I did, more than once (way too many times). That’s a terrible feeling. Microservices can still fail in production and can still fail catastrophically, but often their failure is more localized. It is also easier to roll them out to a specific subset of the system (Canary) and verify. These are all policies that can be controlled in depth by the people who actually have their fingers on the user's pulse: OPS.
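
The canary idea can be sketched as a simple routing decision. In practice this policy usually lives in the service mesh or gateway configuration rather than in application code, so treat the following as an illustration only. `CanaryRouter` and the percentage-by-hash scheme are assumptions of this example.

```java
/** Illustrative canary routing: sends a fixed percentage of users to the new version. */
class CanaryRouter {
    private final int canaryPercent; // 0..100

    CanaryRouter(int canaryPercent) {
        if (canaryPercent < 0 || canaryPercent > 100) {
            throw new IllegalArgumentException("canaryPercent must be between 0 and 100");
        }
        this.canaryPercent = canaryPercent;
    }

    /** The same user always gets the same answer, so their experience stays stable. */
    boolean routeToCanary(String userId) {
        int bucket = Math.floorMod(userId.hashCode(), 100); // bucket in the range 0..99
        return bucket < canaryPercent;
    }
}
```

Starting at a small percentage and raising it as the metrics stay healthy limits the blast radius of a bad release. Setting it back to zero is the rollback.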

Observability for Microservices is essential and expensive, but also more powerful. Since everything occurs at the network layer, it is all exposed to the observability tools. An SRE or a DevOps engineer can understand a failure in greater detail. This comes at the expense of the developer, who might need to face increased complexity and limited tooling.

Applications can become too big to fail. Even with modularity, some of the largest monoliths around have so much code it takes hours to run through a full CI/CD cycle. Then if the deployment fails reverting to the last good version might also take a while.

Segmentation

Back in the day, we used to divide teams based on layers. Client, Server, DB, etc. This made sense since each of those required a unique set of skills. Today, vertical teams make more sense, but we still have specialties.

Typically, a mobile developer wouldn’t work on the backend. But let’s say we have a mobile team that wants to work with GraphQL instead of REST. With a Monolith we would either tell them to “live with it” or we would have to do the work. With Microservices we can create a simple service for them with very little code. A simple facade to the core services. We won’t need to worry about a mobile team writing server code since this would be relatively isolated. We can do the same for every client layer, this makes it easier to integrate a team vertically.

Too Big

It is hard to put a finger on the size that makes a monolith impractical, but here’s what you should ask yourself:

How many teams do we have or want?

If you have a couple of teams, then a monolith is probably great. If you have a dozen teams, then you might face a problem there.

Measure pull request and issue resolution times.

As a project grows, your pull requests will spend more time waiting to merge and issues will take longer to resolve. This is inevitable as complexity tends to grow in the project. Notice that a new project will have larger features, and that might sway the results. Once you account for that in the project stats, the decrease in productivity should be measurable.

Notice that this is one metric. In many cases, it can indicate other things such as the need to optimize the test pipeline, the review process, modularity, etc.
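
As a starting point for that measurement, the sketch below computes the median time-to-merge for a set of pull requests. The `PullRequest` record and `MergeTimeStats` class are hypothetical, and the timestamps would have to come from your own source control system. Comparing the median across quarters gives a rough trend line.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

/** Hypothetical pull request data exported from the source control system. */
record PullRequest(Instant openedAt, Instant mergedAt) {
    Duration timeToMerge() {
        return Duration.between(openedAt, mergedAt);
    }
}

class MergeTimeStats {
    /** Median time-to-merge; the median is less sensitive to a few huge outliers than the mean. */
    static Duration median(List<PullRequest> pullRequests) {
        if (pullRequests.isEmpty()) {
            throw new IllegalArgumentException("no pull requests to measure");
        }
        List<Duration> sorted = pullRequests.stream()
                .map(PullRequest::timeToMerge)
                .sorted()
                .toList();
        int mid = sorted.size() / 2;
        if (sorted.size() % 2 == 1) {
            return sorted.get(mid);
        }
        // Even count: average the two middle values.
        return sorted.get(mid - 1).plus(sorted.get(mid)).dividedBy(2);
    }
}
```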

Do we have experts who know the code?

At some point, a huge project becomes so big that the experts start losing track of the details. This becomes a problem when bugs become untenable and there’s no authoritative figure that can make a decision without consultation.

Are you comfortable spending money?

Microservices will cost more. There’s no way around that. There are special cases where we can tune scale, but ultimately observability and management costs will wipe out any potential savings. Since personnel costs usually exceed the costs of cloud hosting, the total might still play in your favor: personnel costs might decrease if the scale is big enough.

Trade-Offs

The trade-offs of monolith vs. microservice are illustrated nicely in the following radar chart. Notice that this chart was designed with a large project in mind. The smaller the project, the better the picture is for the Monolith.

[Image: radar chart comparing the trade-offs of a monolith and microservices]

Notice that Microservices deliver a benefit in larger projects in fault tolerance and team independence. But they pay a price in cost. They can reduce R&D spend but they mostly shift it to DevOps so that isn’t a major benefit.

Final Word

The complexity of Microservices is immense and sometimes ignored by the implementing teams. Developers use Microservices as a cudgel to throw away parts of the system they don’t want to maintain, instead of building a sustainable, scalable architecture worthy of replacing a monolith.

I firmly believe that projects should start off with a monolith. Microservices are an optimization for scaling a team and optimizing prematurely is the root of all evil. The question is, when is the right time to do such an optimization?

There are some metrics we can use to make that decision easier. Ultimately, the change isn’t just splitting a monolith. It means rethinking transactions and core concepts. By starting with a monolith we have a blueprint we can use to align our new implementation as it strengthens.

Shai Almog

Author

Author, DevRel, Blogger, Open Source Hacker, Java Rockstar, Conference Speaker, Instructor and Entrepreneur.