How to Put a Database in Kubernetes?

January 13, 2022

Author(s)

Jeff Carpenter
Developer Relations @ DataStax

Learn the key steps of deploying databases and stateful workloads in Kubernetes and meet the cloud-native technologies, like K8ssandra, that can streamline Apache Cassandra for K8s.

The idea of running a stateful workload in Kubernetes (K8s) can be intimidating, especially if you haven’t done it before. How do you deploy a database? Where is the actual storage? How is the storage mapped to the database or the application using it?

Let's demystify the deployment of databases and stateful workloads in K8s. Basically, it boils down to a few key steps:

Get to know the Kubernetes primitives
Pick a database
Pick a storage provider
Pick an operator

This article dives into the key steps of deploying databases and stateful workloads in K8s. You can learn more about them in the upcoming O’Reilly book: Managing Cloud Native Data on Kubernetes.

1. Get to Know the Kubernetes Primitives

Simply put: databases are just applications composed of compute, network, and storage. We can deploy them like any other K8s application and take advantage of the resources Kubernetes provides: StatefulSets, Services, StorageClasses, PersistentVolumes, PersistentVolumeClaims, and more.

Figure 1: Kubernetes resources help us think of applications in terms of compute, network, and storage.

Getting comfortable with using these primitives will help you understand how databases and other data infrastructure are deployed on K8s. For example, a deployment of Apache Cassandra® will typically use a StatefulSet to launch pods across available Kubernetes worker nodes, with each Cassandra pod having its own PersistentVolumeClaim that can be preserved and reused if the pod needs to be replaced.

Figure 2: Simple deployment of Cassandra on Kubernetes using a StatefulSet.
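To make this mapping concrete, here is a minimal sketch of the two objects such a deployment typically needs. They are built as Python dictionaries and serialized to JSON, which `kubectl apply -f` accepts alongside YAML. The names, the image tag `cassandra:4.0`, the replica count, the storage size, and the storage class name `standard` are illustrative assumptions. A real Cassandra deployment also needs seed and cluster configuration, which is omitted here.

```python
import json

# Minimal sketch: the values below (image tag, replicas, size, storage class
# "standard") are illustrative placeholders, not recommendations.
def cassandra_manifests(name="cassandra", replicas=3, storage="10Gi",
                        storage_class="standard"):
    labels = {"app": name}
    # Network: a headless Service (clusterIP "None") gives each pod a stable DNS name.
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "clusterIP": "None",
            "selector": labels,
            "ports": [{"name": "cql", "port": 9042}],
        },
    }
    # Compute + storage: the StatefulSet creates pods <name>-0 .. <name>-(n-1)
    # and one PersistentVolumeClaim per pod from volumeClaimTemplates.
    statefulset = {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "cassandra",
                        "image": "cassandra:4.0",
                        "ports": [{"containerPort": 9042, "name": "cql"}],
                        "volumeMounts": [{"name": "data",
                                          "mountPath": "/var/lib/cassandra"}],
                    }],
                },
            },
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": storage_class,
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }
    return service, statefulset


def pvc_names(statefulset):
    """PVCs are named <template>-<statefulset>-<ordinal>, so a replacement pod
    with the same ordinal re-binds to the same claim."""
    name = statefulset["metadata"]["name"]
    replicas = statefulset["spec"]["replicas"]
    return [f"{t['metadata']['name']}-{name}-{i}"
            for t in statefulset["spec"]["volumeClaimTemplates"]
            for i in range(replicas)]


if __name__ == "__main__":
    service, sts = cassandra_manifests()
    # Save this output to a file and run `kubectl apply -f <file>`.
    print(json.dumps({"apiVersion": "v1", "kind": "List",
                      "items": [service, sts]}, indent=2))
    print(pvc_names(sts))  # ['data-cassandra-0', 'data-cassandra-1', 'data-cassandra-2']
```

The `pvc_names` helper reflects the StatefulSet naming rule. If pod `cassandra-1` is rescheduled, its replacement gets the same ordinal and re-binds to `data-cassandra-1`. That is what lets the data survive pod replacement.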

For a complete example of these primitives in action, see the Kubernetes documentation's tutorial on deploying Cassandra with a StatefulSet. We’re also building a collection of examples on GitHub in association with the book project and would love to see your issues and pull requests.

Once you’ve familiarized yourself with the basic building blocks of Kubernetes, there are three main considerations when setting up the right database for your application.

2. Pick a Database

To start, you’ll want to think about what kind of database your application needs. To help you make the right choice, consider the following factors:

Database language: does your application need SQL, NoSQL, or developer-friendly data APIs?
Capacity, performance, and scalability requirements: will your data fit on a single node, or will you need a distributed database that can scale as your application grows?
Deployment topology: will your application be running in on-premises data centers, public clouds, or a mix of both?
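For the capacity question, a back-of-the-envelope estimate is often enough to tell whether a single node will do. The sketch below assumes a replication factor of 3. It also keeps disks at most half full to leave headroom for compaction and growth. Both numbers are illustrative assumptions, not sizing guidance for any particular database.

```python
import math

# Illustrative assumptions: replication factor 3, disks kept at most 50% full.
def fits_on_single_node(dataset_gib, disk_per_node_gib, headroom=0.5):
    """True if the whole dataset fits within the usable share of one node's disk."""
    return dataset_gib <= disk_per_node_gib * headroom


def nodes_needed(dataset_gib, disk_per_node_gib, replication_factor=3, headroom=0.5):
    """Rough node count for a replicated cluster: replicated data / usable disk per node,
    but never fewer nodes than replicas."""
    usable_per_node = disk_per_node_gib * headroom
    replicated = dataset_gib * replication_factor
    return max(replication_factor, math.ceil(replicated / usable_per_node))


print(fits_on_single_node(200, 1000))   # True
print(nodes_needed(4000, 1000))         # 24
```

With 1,000 GiB disks, a 200 GiB dataset fits on one node. A 4,000 GiB dataset replicated three times needs about 24 nodes under these assumptions.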

Deciding on a database isn’t entirely independent from other decisions in your application design, and we’ll see more of this below. Note that your needs may also change as your application evolves.

3. Pick a Storage Provider

Unless the database you choose is just a cache holding ephemeral data, you’ll need to configure your database to use persistent storage. If you’re using one of the public clouds, you’ll have storage options available such as Elastic Block Store (EBS) volumes in AWS.
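In Kubernetes, a cloud volume type like EBS is usually exposed through a StorageClass. Workloads then request storage from it with a PersistentVolumeClaim. The sketch below assumes the AWS EBS CSI driver is installed; its provisioner name is `ebs.csi.aws.com`. The class name `ebs-gp3`, the `gp3` volume type, and the sizes are illustrative choices.

```python
import json

def ebs_storage_class(name="ebs-gp3"):
    # Assumes the AWS EBS CSI driver is installed in the cluster.
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "ebs.csi.aws.com",
        "parameters": {"type": "gp3"},
        # Create the volume only once a pod is scheduled, in that node's zone.
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowVolumeExpansion": True,
        # Keep the underlying volume if the claim is deleted.
        "reclaimPolicy": "Retain",
    }


def pvc(name, storage_class, size="100Gi"):
    """A PersistentVolumeClaim that requests storage from the given class."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


sc = ebs_storage_class()
claim = pvc("db-data", sc["metadata"]["name"])
print(json.dumps([sc, claim], indent=2))
```

`WaitForFirstConsumer` delays volume creation until a pod is scheduled, so the EBS volume lands in the same availability zone as its node. `Retain` is a conservative choice for database data.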

There are also many options that are cloud-vendor independent. You can find a thriving ecosystem of K8s storage providers in the Cloud-Native Storage category of the CNCF Landscape.

Figure 3: Cloud Native Storage projects on the CNCF Landscape as of September 2021.

These include a number of options for managing both local and networked storage, in formats such as block, file, and object storage. You’ll likely be able to find sample code that shows how to configure your selected database to use your chosen storage provider. For example, here’s a tutorial on running Apache Cassandra on OpenEBS, a popular open-source storage provider for K8s that you can run in a variety of environments.
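Because a database manifest refers to storage only by StorageClass name, switching providers is largely a matter of pointing the claim templates at a different class. The helper below is a hypothetical illustration. `openebs-hostpath` is assumed to be the local-volume class created by a default OpenEBS installation, so check `kubectl get storageclass` in your own cluster.

```python
import copy

def with_storage_class(statefulset, storage_class):
    """Return a copy of a StatefulSet manifest whose volumeClaimTemplates use storage_class."""
    updated = copy.deepcopy(statefulset)  # leave the caller's manifest untouched
    for template in updated["spec"].get("volumeClaimTemplates", []):
        template["spec"]["storageClassName"] = storage_class
    return updated


# Minimal StatefulSet fragment; only the fields this helper touches are shown.
sts = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "cassandra"},
    "spec": {
        "volumeClaimTemplates": [{
            "metadata": {"name": "data"},
            "spec": {
                "accessModes": ["ReadWriteOnce"],
                "storageClassName": "ebs-gp3",
                "resources": {"requests": {"storage": "100Gi"}},
            },
        }],
    },
}

local = with_storage_class(sts, "openebs-hostpath")  # assumed OpenEBS class name
print(local["spec"]["volumeClaimTemplates"][0]["spec"]["storageClassName"])
```

This applies when creating a StatefulSet. Kubernetes rejects changes to `volumeClaimTemplates` on an existing StatefulSet, so moving live data between providers is a separate migration task.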

4. Pick an Operator

If you intend to run more than a small handful of nodes of your selected database, you’ll benefit from automating your operations with a K8s Operator. You can find a wide variety of operators for databases and other applications at OperatorHub. When selecting an operator, make sure it’s open source, and check how actively it’s maintained.
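At its core, an operator is a controller that watches a custom resource describing the desired state of the database. It then repeatedly takes one step to bring the actual cluster closer to that state. The sketch below simulates that reconcile loop in plain Python. `DesiredState`, `ObservedState`, and the action strings are hypothetical. A real operator would read and write state through the Kubernetes API rather than mutate local objects.

```python
from dataclasses import dataclass


@dataclass
class DesiredState:   # hypothetical: what the user declared in a custom resource
    size: int
    version: str


@dataclass
class ObservedState:  # hypothetical: what the operator currently sees in the cluster
    ready_nodes: int
    version: str


def reconcile(desired, observed):
    """Return the next single action that moves the observed state toward the desired one."""
    if observed.ready_nodes < desired.size:
        return f"add node {observed.ready_nodes}"
    if observed.ready_nodes > desired.size:
        return f"decommission node {observed.ready_nodes - 1}"
    if observed.version != desired.version:
        return f"rolling upgrade to {desired.version}"
    return "no-op"


def run_until_converged(desired, observed, max_steps=10):
    """Simulate repeated reconcile passes, applying each action's effect locally."""
    actions = []
    for _ in range(max_steps):
        action = reconcile(desired, observed)
        if action == "no-op":
            break
        actions.append(action)
        if action.startswith("add"):
            observed.ready_nodes += 1
        elif action.startswith("decommission"):
            observed.ready_nodes -= 1
        else:
            observed.version = desired.version
    return actions


print(run_until_converged(DesiredState(3, "4.0.1"), ObservedState(1, "3.11.11")))
# ['add node 1', 'add node 2', 'rolling upgrade to 4.0.1']
```

The loop takes one action per pass and recomputes from observed state each time. That keeps it safe to re-run after a crash or a missed event, and this idempotence is the main design idea behind the operator pattern.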

There are operators for most popular databases, such as the Zalando Postgres-operator for PostgreSQL, or Cass-operator, which the Apache Cassandra community has recently rallied around. Cass-operator is actually part of a larger project called K8ssandra, which builds on that operator to create a more comprehensive data platform around Cassandra. This platform includes tooling for maintenance and backups, along with an open-source data gateway called Stargate that supports a variety of developer-friendly APIs.
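With Cass-operator, you describe a Cassandra datacenter as a `CassandraDatacenter` custom resource. The operator then creates the underlying StatefulSets and Services. The sketch below builds such a resource. The field names follow the examples in Cass-operator's documentation for the `cassandra.datastax.com/v1beta1` API, but verify them against the CRD version you install. Treat the names, server version, storage class, and sizes as placeholders.

```python
import json

def cassandra_datacenter(name="dc1", cluster="cluster1", size=3,
                         version="4.0.1", storage_class="standard",
                         storage="5Gi"):
    # Field names assumed from Cass-operator's v1beta1 examples; check your CRD version.
    return {
        "apiVersion": "cassandra.datastax.com/v1beta1",
        "kind": "CassandraDatacenter",
        "metadata": {"name": name},
        "spec": {
            "clusterName": cluster,
            "serverType": "cassandra",
            "serverVersion": version,
            # Desired number of Cassandra nodes; the operator reconciles toward it.
            "size": size,
            "storageConfig": {
                "cassandraDataVolumeClaimSpec": {
                    "storageClassName": storage_class,
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                }
            },
        },
    }


print(json.dumps(cassandra_datacenter(), indent=2))
```

To scale, you would change `size` and re-apply the resource, leaving the operator to carry out the change.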

5. An Alternate Approach: Pick a Managed Service

Of course, even with an operator, running a database in K8s yourself may be more than you want to take on, especially if you’re a smaller team looking to maximize your leverage.

If this is you, you can still take advantage of one of the many managed database services available. If you need a highly scalable database combined with a great developer experience, DataStax Astra DB is a great choice. Astra DB is a managed Cassandra service that itself happens to be built on top of Kubernetes, and the Stargate APIs are available by default — even with a free Astra DB account.

6. Meet a Community of Cloud-Native Data Practitioners

No matter what choices you end up making for your K8s-deployed applications, you can find a group of passionate developers pushing the state of the art forward in the Data on Kubernetes Community (DoKC). If you’re attending KubeCon North America, join us for DoK Day on Tuesday, October 12.

Register here to join KubeCon North America 2021, and subscribe to our event alert to get notified about new DataStax workshops for developers, by developers. For exclusive posts on Cassandra, streaming, Kubernetes, and more, follow DataStax on Medium.
