Apache Cassandra 4.0: Taming Tail Latencies with Java 16 ZGC

June 22, 2021
2395 Unique Views
5 min read

Table of Contents

The garbage collection challengeZGC performance results

Throughput results
Latency results

The NoSQLBench performance testing suiteConclusionAppendix: Test environment

With Apache Cassandra 4.0, you not only get the direct improvements to performance added by the Apache Cassandra committers, you also unlock the ability to take advantage of seven years of improvements in the JVM itself. This article focuses on improvements in Java garbage collection that Cassandra 4.0 coupled with Java 16 offers over Cassandra 3.11 on Java 8.

Like so many others in the Apache Cassandra community, I’m extremely excited to see that the 4.0 release is finally here. There are many, many improvements to Cassandra 4.0. One enhancement that is more important than it might look is the addition of support for Java versions 9 and up. This was not trivial, because Java 9 made changes to some internal APIs that the most performance-oriented Java projects like Cassandra relied on (you can read more about this here).

This is a big deal because with Cassandra 4.0, you not only get the direct improvements to performance added by the Apache Cassandra committers, you also unlock the ability to take advantage of seven years of improvements in the JVM (Java Virtual Machine) itself.

Here, I’d like to focus on improvements in Java garbage collection that Cassandra 4.0 coupled with Java 16 offers over Cassandra 3.11 on Java 8.

The garbage collection challenge

In 2012, I gave a talk titled, “Dealing with JVM Limitations in Apache Cassandra.” Here is the first slide from that presentation:

On the one hand, garbage collection is a primary reason that Java is so much more productive than traditional systems languages like C++. As JVM architect Cliff Click once wrote, “Many concurrent algorithms are very easy to write with a GC and totally hard to downright impossible using explicit free.” Cassandra takes full advantage of this power.

But performing garbage collection means having to briefly pause the JVM to determine which objects are no longer in use and can safely be disposed of. These GC pauses can cause delayed response times to client requests, i.e., increased latencies.

Not all requests are affected by this–only the handful of requests that are in flight while Cassandra’s request-handling threads are paused for the GC. The performance impact is thus only visible in tail latencies, that is, the 99th percentile or 99.9th percentile measurements, corresponding to the slowest 1% or 0.1% of requests.

As with so many things, optimizing GC involves tradeoffs, and the original Java GC designs focused more on improving throughput than on reducing pause times. Fast forward to 2021 and we have common server-class CPUs with 64 cores/128 threads—we have plenty of throughput on tap. It’s time to spend some of those cycles on lower pause times.

The Z Garbage Collector (ZGC) was created to address this situation, and specifically to guarantee pause times under 10ms. ZGC was added to Java 11 as an experimental feature, promoted to production in Java 15, and further improved in Java 16.

To show how well ZGC improves Cassandra performance, we compared both throughput and latency in three environments: Cassandra 3.11 running on JDK 8 with its default CMS GC settings, Cassandra 4.0 running on JDK 8 with the same settings, and Cassandra 4.0 running on JDK 16 with ZGC. I’m pleased to report that ZGC convincingly achieves its design goals, allowing Cassandra to deliver nearly-constant latencies through the 99th percentile, with only a modest uptick at the 99.9th percentile!

ZGC performance results

My colleague Jonathan Shook benchmarked the performance characteristics of Cassandra 3.11 and 4.0 in detail across three workloads: simple key/value, a time series workload with many rows per partition, and a tabular workload with one row per partition but many columns per row.

Throughput results

Here we are looking at Cassandra running at 70% of maximum throughput. This leaves 30% operational headroom to absorb compaction, repair, or load spikes for the purposes of realistic measurements.

Cassandra 4.0 running with the same configuration as Cassandra 3.11 is 30% faster in the key/value workload, 2% slower in the time series workload, and 10% faster in the tabular workload. Turning on ZGC unlocks an additional 30% more throughput for key/value and time series workloads, but has no effect on the tabular workload.

Latency results

I’ve split the latency results into one chart per workload so it’s easier to see the trends across the different percentiles:

For these results, we limited each test scenario to the slowest system’s throughput, i.e., we used 30,000, 44,000, and 54,000 requests per second for the key/value, time series, and tabular workloads, respectively.

Cassandra 4.0’s latencies are virtually identical to 3.11’s with the same GC settings, but ZGC is consistently better, up to a solid factor of 5 to 10 better at p99 and p999 percentiles.

The NoSQLBench performance testing suite

Most benchmarks of non-relational databases are done with either product-specific tooling (like cassandra-stress), or with YCSB, which gives you a lowest-common-denominator key-value workload across dozens of systems.

Jonathan Shook created NoSQLBench to be a cross-platform performance testing tool that is easier to use than cassandra-stress and (much) more powerful than YCSB; in fact, its scripting layer is powerful enough to support things that no other testing tool can enable, with particular emphasis on modeling complex workloads with fidelity, as well as simulating realistic scenarios such as load spikes. As its name suggests, NoSQLBench is not Cassandra-specific and encourages participation from all who want to contribute; today there are clients for Cassandra, CockroachDB, JDBC, and MongoDB, as well as non-database products Kafka and Pulsar. If you’re serious about performance testing in 2021, you should check out NoSQLBench. You can get started at GitHub. Other useful links: releases, discord, docs.

The NoSQLBench workload descriptions for the tests in this post can be found here.

Conclusion

Without switching to ZGC, Cassandra 4.0 offers modest but real throughput improvements for key/value and tabular workloads.

Combining Cassandra 4.0 with ZGC in Java 16 results in further improvements to throughput for key/value and time series workloads as well as convincingly demonstrating ZGC’s design goals to make GC pause time a non-issue across all tested workloads for Cassandra 4.0.

ZGC is production-ready starting with Java 15; for enterprises that want to stick with LTS releases, ZGC will be one of the headlining reasons to upgrade to the Java 17 LTS release later this year. ZGC is one of the most significant performance “free lunches” available, and it Just Works—the results shown here are out-of-the-box for ZGC with no extra tuning.

Appendix: Test environment

All tests were run on the same physical cluster of AWS i3.4xl nodes: 16 vCPUs, 122GB RAM, 10Gb network, 5 nodes in the cluster. Storage was configured as XFS on direct NVMe, single volume. All data was stored at RF3. Assigned tokens were used to ensure consistent data distribution across the tested versions. Consistency level for all operations was set as LOCAL_QUORUM. Concurrency from the client side was set at 960 (20x client cores) for the keyvalue test, and 480 (10x client cores) for the time-series and tabular tests.

All measurements were taken from the client, and include duration between submitting and fully reading any data in results. All measurements were taken with 3 significant digits of precision, then rounded to the nearest ms. ZGC was configured with basic recommended settings: 16GB min heap, 64GB max heap, large pages enabled. The other numbers are using Cassandra’s out-of-the-box configuration with CMS.

Don’t Forget to Share This Post!

Jonathan Ellis

Author

Jonathan is the founder of Brokk (https://brokk.ai). Brokk keeps LLMs on-task in million-line codebases by adding compiler-grade understanding of your code's structure and semantics. Jonathan is also the author of JVector, co-founder of DataStax, and the founding project chair of Apache Cassandra.

Testing an OpenRewrite Recipe

Foojay Podcast #75: JCON Report, Part 4 – Tips and Tricks for Java Devs

Data Modeling for Java Developers: Structuring With PostgreSQL and MongoDB

Creating Scalable OpenAI GPT Applications in Java

Clean and Modular Java: A Hexagonal Architecture Approach

Dissection of Joeffice: Open Source Office Suite in Java

Building a Real-Time AI Fraud Detection System with Spring Kafka and MongoDB

Prime Time: The High Performance Java Event

Project Panama for Newbies (Part 1)

How I Improved Zero-Shot Classification in Deep Java Library (DJL) OSS

foojay: A Place for Friends of OpenJDK

Dashboard for OpenJDK Update Release Details

JDK14: New Features and Enhancements

Fun with Flags: My Top 10 Resources for JVM Flags

Performance of Modern Java on Data-Heavy Workloads: Real-Time Streaming

Performance of Modern Java on Data-Heavy Workloads: Batch Processing

How does Java handle different Images and ColorSpaces – Part 1

How does Java handle different Images and ColorSpaces – Part 2

How does Java handle different Images and ColorSpaces – Part 3

How does Java handle different Images and ColorSpaces – Part 4

Indexing all of Wikipedia, on a laptop

Working with Multiple Carets in IntelliJ IDEA

Clean Shutdown of Spring Boot Applications

Java 17 on the Raspberry Pi

How to Create Mobile Apps with JavaFX (Part 1)

Project Panama for Newbies (Part 1)

Foojay Slack: bit.ly/join-foojay-slack

Beginning JavaFX Applications with IntelliJ IDE

SpringBoot 3.2 + CRaC

Debugging Java on the Command Line

Apache Kafka Performance on Azul Platform Prime vs Vanilla OpenJDK

Learn about a number of experiments that have been conducted with Apache Kafka performance on Azul Platform Prime, compared to vanilla OpenJDK. Roughly 40% improvements in performance, both throughput and latency, are achieved.

Stable, Secure, and Affordable Java

Azul Platform Core is the #1 Oracle Java alternative, offering OpenJDK support for more versions (including Java 6 & 7) and more configurations for the greatest business value and lowest TCO.

Stable, Secure, and Affordable Java

Step up your coding with the Continuous Feedback Udemy Course: Additional coupons are available

Jakarta EE 11: Beyond the Era of Java EE

Apache Cassandra 4.0: Taming Tail Latencies with Java 16 ZGC

The garbage collection challenge