The Serverless Database You Really Want

March 03, 2022
5 min read

Likes ...

Comments ...

Table of Contents

The Oracle That Foretold the FutureDefining CommodityOpen Source: Now More Important Than Ever

The dreaded part of every site reliability engineer’s (SRE) job eventually: capacity planning. You know, the dance between all the stakeholders when deploying your applications. Did engineering really simulate the right load and do we understand how the application scales? Did product managers accurately estimate the amount of usage? Did we make architectural decisions that will keep us from meeting our SLA goals? And then the question that everyone will have to answer eventually: how much is this going to cost? This forces SREs to assume the roles of engineer, accountant and fortune teller.

The large cloud providers understood this a long time ago and so the term “cloud economics” was coined. Essentially this means: rent everything and only pay for what you need. I would say this message worked because we all love some cloud. It’s not a fad either. SREs can eliminate a lot of the downside when the initial infrastructure capacity discussion was maybe a little off. Being wrong is no longer devastating. Just add more of what you need and in the best cases, the services scale themselves — giving everyone a nice night’s sleep. All this without provisioning a server, which gave rise to the term “serverless.”

As serverless methodologies have burned through the application tiers, databases have turned out to be the last big thing to feel the heat of progress. No surprise though. Stateful workloads — as in information I really want to keep — is a much harder problem to solve than stateless workloads. The cloud providers have all released their own version of a serverless database, provided you agree to be locked into their walled garden. Open source has always served as the antidote for the dreaded lock-in, and there are really exciting things happening in the Apache Cassandra community in that regard.

The Oracle That Foretold the Future

In the early days of distributed databases, a groundbreaking paper changed everything: the Dynamo paper from Amazon, published in 2007. In it, a team of researchers and engineers described how an ideal system would be built to maximize performance and data consistency while balancing scale and operations. To quote the paper: “A highly available key-value storage system that some of Amazon’s core services use to provide an ‘always-on’ experience.” It served as the basis for several database implementations, including what would become Apache Cassandra.

Dynamo assumed the availability of cheap, commodity hardware in the coming cloud era. As our industries have slowly morphed into building cloud native applications, the definition of commodity hardware has changed. Instead of units being bare-metal or virtual machines, we consume individual scale components of network, compute and storage. Building a serverless Cassandra database continues the work of the Dynamo paper inside this new paradigm; and with it, new scaling and deployment options that fit our cloud native world.

Defining Commodity

In 2007 when the paper was first published, the definition of a commodity was much different than today. Most server-class systems were bulky and incredibly complex to provide the compute power needed and uptime required. ”Commodity” entailed very inexpensive, small servers with the most basic CPU, disk and memory. The first time I deployed Cassandra in my infrastructure, I was able to use the commodity servers to scale out and in the process save a lot of money to achieve better results.

Then along came the cloud and even more changes in definitions. Commodity was now an instance type we could provision and pay for by the hour. This fueled a massive expansion of scale applications and the rise of cloud native but CPU, disk and memory all still had to be considered, especially in stateful workloads like a database. So, the dreaded capacity planning discussion was still happening in deployment meetings. Thankfully, the impact of making a wrong decision was much less when using cloud infrastructure, especially with Cassandra. Need more? Just add more instances to your cluster. Goodbye capacity wall, hello scale.

Now we are at a time when Kubernetes is advancing the pointer of what we can do with cloud native applications. With it, we’ve seen yet another shift in commodity definitions. The classic deployable server or instance type has been decomposed into compute, network and storage. Kubernetes has created a way for us to define and deploy an entire virtual data center, with the parts we need to support the applications we are deploying. Containers allow for precise control over the compute needed (and when).

Software-defined networks do all the complicated virtual wiring in our data centers dynamically. All of which creates an environment that is elastic, scalable, and self-healing. We also get the added benefit of fine-grained cost controls. Goodbye over-provisioning, hello cloud economics.

Open Source: Now More Important Than Ever

Just like the majority of data infrastructure innovations in the past 10 years, the breadth and depth of the needed changes can only be addressed by an engaged community of users. The revolution in serverless databases will happen in open source. Clouds moved fast on early serverless implementations, but as we in open source know: to go far, we go together. The cloud economics of using a vendor-specific serverless database works great, right up until it doesn’t. Free as in freedom means you should be able to use it anywhere. In a cloud, in your own datacenter, or even on your laptop.

One aspect that has driven the popularity of Kubernetes is the undeniable benefit of cloud portability and freedom. Overlay your deployment of a virtual datacenter against any provider of commodity compute, network and storage. Don’t like where you are? Take your data center somewhere else. Don’t like renting the services in a cloud? Run them yourself in Kubernetes. The near future will be about creating new cloud data services in Kubernetes and the communities we form around this exciting part of modern data applications.

The Dynamo pedigree of Apache Cassandra and years of proven reliability in the biggest workloads put it in a strong position for the next revolution of serverless databases. At DataStax, we are the company that just loves open source Cassandra; we have seen this future direction of databases unfolding and we’re excited to participate. We have also been building our own deep experience of running large-scale database cloud deployments in Kubernetes, via DataStax Astra. As a result, our engineering teams have created some of the beginning work for a serverless Cassandra. We will be refining and building knowledge about how to take advantage of the new cloud native commodity definitions and passing on the lower costs of cloud economics.

Expect to see our ideas and code in a GitHub repository soon and discussions opening about what we have learned. Already the Cassandra community is talking about what will happen after 4.0 and it’s safe to say that a serverless Cassandra is top of the list. Inclusion in the open source project K8ssandra, combined with the Stargate project, will further expand the freedom of deployment options and usage.

Data on Kubernetes depends on true cloud economics and scale, which takes us back to our SREs. In the near future when they are thinking about capacity planning, I would love to give them the option of having one less stressful meeting.

Learn more about how DataStax Astra is enabling faster application development and streamlined operations with serverless databases.

March 03, 2022
5 min read

Likes ...

Comments ...

Patrick McFadin

Author

Apache Cassandra and Developer Relations at DataStax

Project Panama for Newbies (Part 1)

SpringBoot 3.2 + CRaC

The Java Story: A Film About All of Us

New Between-Quarters Security Updates for Java: What CSPUs Mean for Your Release Pipeline

Toward a Durable Spring PetClinic

First Test of Java on Banana Pi (ARM and RISC-V), Plus a Blinking LED with Pi4J

Creating Scalable OpenAI GPT Applications in Java

Foojay Podcast #92: Java 26 Is Here: What’s New, What’s Gone, and Why It Matters in 2026

Temporal Is to Your Code What a Database Is to Your Data

🤖 5 Best Practices for Working with AI Agents, Subagents, Skills and MCP

foojay: A Place for Friends of OpenJDK

Dashboard for OpenJDK Update Release Details

JDK14: New Features and Enhancements

Fun with Flags: My Top 10 Resources for JVM Flags

Performance of Modern Java on Data-Heavy Workloads: Real-Time Streaming

Performance of Modern Java on Data-Heavy Workloads: Batch Processing

How does Java handle different Images and ColorSpaces – Part 1

How does Java handle different Images and ColorSpaces – Part 2

How does Java handle different Images and ColorSpaces – Part 3

How does Java handle different Images and ColorSpaces – Part 4

Indexing all of Wikipedia, on a laptop

Working with Multiple Carets in IntelliJ IDEA

Clean Shutdown of Spring Boot Applications

Project Panama for Newbies (Part 1)

Java 17 on the Raspberry Pi

How to Create Mobile Apps with JavaFX (Part 1)

Beginning JavaFX Applications with IntelliJ IDE

SpringBoot 3.2 + CRaC

Preparing for Spring Framework 7 and Spring Boot 4

Foojay Slack: bit.ly/join-foojay-slack

New Survey Finds Data on Kubernetes Is No Longer a Pipe Dream

For people that work in infrastructure and application development, the pace of change is quick. Finish one project and it’s on to the next. Each iteration requires an evaluation asking if the right technology is being used and if it …

Dec 30 2,0K

Patrick McFadin

Apache Cassandra

Microservices Kubernetes DevOps DataStax Databases

Kubernetes and Apache Cassandra: What Works (and What Doesn’t)

Cassandra and K8s are seen as the “most logical pairing”, let’s take a look at how they are a dream team — and why this isn’t always true.

Jan 14 3,7K

Patrick McFadin

Apache Cassandra

Microservices Kubernetes DevOps DataStax Databases

The future of cloud-native databases begins with Apache Cassandra 4.0

Table of Contents The original cloud-native database The prima donna comes to Kubernetes Cassandra is ready for what’s next Ecosystem as a first-class “Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the …

Dec 29 2,5K

Patrick McFadin

DevOps

Kubernetes DataStax Databases Cloud Apache Cassandra

Cut Code Review Time & Bugs in Half. Instantly.

Free eBook: Sustainability for Java Developers

Modernizing Java with Jakarta EE 11

The Serverless Database You Really Want

The Oracle That Foretold the Future

Defining Commodity

Open Source: Now More Important Than Ever

Patrick McFadin

Patrick McFadin

Thanks to our Sponsors!

Azul

Redis

CodeRabbit

Reo

Zencoder

Digma

adesso

Trending

Cut Code Review Time & Bugs in Half. Instantly.

Free eBook: Sustainability for Java Developers

Modernizing Java with Jakarta EE 11

Comments (0)

Cut Code Review Time & Bugs in Half. Instantly.

Free eBook: Sustainability for Java Developers

Modernizing Java with Jakarta EE 11

Do you want your ad here?

The Serverless Database You Really Want

The Oracle That Foretold the Future

Defining Commodity

Open Source: Now More Important Than Ever

Patrick McFadin

Patrick McFadin

Thanks to our Sponsors!

Azul

Redis

CodeRabbit

Reo

Zencoder

Digma

adesso

Trending

All 0 Likes

Cut Code Review Time & Bugs in Half. Instantly.

Free eBook: Sustainability for Java Developers

Modernizing Java with Jakarta EE 11

Do you want your ad here?

Related Articles

Comments (0)

Set Event Reminder

Subscribe to foojay updates:

Share with