How Java Litters Beyond the Heap: Part 3, Solid-State Drives

January 27, 2023

Table of Contents

How an SSD Writes Data
How an SSD Updates Data
Garbage Collection in SSDs
SSD Over-Provisioning
Wrapping Up

A Java application dutifully executes your logic, leaving behind footprints in the Java heap in the form of dead objects.

A garbage collector then steps in and reclaims that memory for new data. This cycle repeats until the app is stopped. This is well known.

But the Java heap is only one of many places where your app can generate garbage.

The application can also litter other parts of the software stack. This isn’t deliberate; it happens because some components of the stack also rely on garbage collection.

For instance, in my previous articles, I discussed how dead records get generated and then collected in relational databases such as PostgreSQL and distributed databases such as YugabyteDB.

In this article, we’ll look at solid-state drives (SSDs), which are now ubiquitous and have become the default storage medium for on-disk data.

What do Java and SSDs have in common? Garbage collection!

How an SSD Writes Data

Imagine you have created an application that actively uses an SSD. The application can use the SSD directly via the Java File API or indirectly through a database.

The SSD splits its storage space into blocks, then divides each block into pages.

A page is the SSD’s smallest logical unit and consists of physical memory cells. A page is usually 4KB in size. Typically, a block consists of 128 pages and has a size of 512KB (128 pages * 4KB page size).
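The arithmetic above can be sketched in a few lines of Java. The 4KB page and 128 pages per block are the typical figures from the text; the 256GB capacity and the class name SsdGeometry are assumptions made purely for illustration, and real devices vary.

```java
/** Back-of-the-envelope SSD geometry, using the typical figures quoted above. */
class SsdGeometry {
    static final long PAGE_SIZE_BYTES = 4 * 1024; // 4KB page (typical, per the text)
    static final long PAGES_PER_BLOCK = 128;      // typical, per the text

    /** Size of one block in bytes. */
    static long blockSizeBytes() {
        return PAGES_PER_BLOCK * PAGE_SIZE_BYTES;
    }

    /** Number of whole blocks that fit into a drive of the given capacity. */
    static long blocksIn(long capacityBytes) {
        return capacityBytes / blockSizeBytes();
    }

    public static void main(String[] args) {
        long capacity = 256L * 1024 * 1024 * 1024; // hypothetical 256GB drive
        System.out.println("Block size: " + blockSizeBytes() / 1024 + "KB");
        System.out.println("Blocks:     " + blocksIn(capacity));
        System.out.println("Pages:      " + blocksIn(capacity) * PAGES_PER_BLOCK);
    }
}
```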

The page is the smallest unit the SSD can read or write, while the block is the smallest unit it can erase. Even if the app needs to read a 4-byte integer value from disk, the device first reads the entire 4KB page containing the value, and only then does the file system API/driver return the requested 4 bytes to the app.
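From the application's side, such a request is still just four bytes. Here is a minimal sketch using the standard java.nio FileChannel API; the class name FourByteRead and the temporary scratch file are hypothetical. Whatever larger unit the layers underneath fetch never shows up in the Java code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class FourByteRead {
    /** Writes one int at the given byte offset and asks the OS to flush it. */
    static void writeInt(Path file, long offset, int value) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
            buf.putInt(value);
            buf.flip();
            while (buf.hasRemaining()) {
                ch.write(buf, offset + buf.position());
            }
            ch.force(true); // request that data and metadata reach the storage device
        }
    }

    /** Reads one int from the given byte offset. */
    static int readInt(Path file, long offset) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
            while (buf.hasRemaining()) {
                if (ch.read(buf, offset + buf.position()) < 0) {
                    throw new IOException("Unexpected end of file");
                }
            }
            buf.flip();
            return buf.getInt();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("ssd-demo", ".bin"); // hypothetical scratch file
        try {
            writeInt(file, 0, 42);
            System.out.println(readInt(file, 0));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```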

What about writes? Suppose the app needs to write some user data to disk. The app sends an INSERT statement to a database and the latter flushes changes to the disk.

The SSD receives the write request from the database and stores the data in a free page of one of its blocks. The SSD always writes new data to free pages; it never overwrites used ones.

How an SSD Updates Data

How does the device handle updates if the SSD never overwrites used pages?

It’s simple. The SSD writes the new data to a free page and marks the page holding the old data as stale.

Over time, the number of stale pages in the block keeps growing, leaving less and less space for new data.

The stale pages are effectively garbage that needs to be removed. And the SSD has its own garbage collector.
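The update behavior described above can be modeled with a toy simulation. This is only a sketch: the class ToyFlash, its page states, and the logical-to-physical map are hypothetical names invented for illustration and do not reflect how any real SSD firmware is written.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Toy model of out-of-place updates on flash pages. */
class ToyFlash {
    enum PageState { FREE, LIVE, STALE }

    private final PageState[] pages; // physical pages
    private final Map<Integer, Integer> logicalToPhysical = new HashMap<>();
    private int nextFree = 0;

    ToyFlash(int pageCount) {
        pages = new PageState[pageCount];
        Arrays.fill(pages, PageState.FREE);
    }

    /** Writes (or updates) a logical page and returns the physical page used. */
    int write(int logicalPage) {
        if (nextFree >= pages.length) {
            throw new IllegalStateException("No free pages left; garbage collection needed");
        }
        Integer old = logicalToPhysical.get(logicalPage);
        if (old != null) {
            pages[old] = PageState.STALE; // the old copy becomes garbage
        }
        int target = nextFree++;
        pages[target] = PageState.LIVE;   // new data always lands on a free page
        logicalToPhysical.put(logicalPage, target);
        return target;
    }

    /** Counts the physical pages currently in the given state. */
    long count(PageState state) {
        return Arrays.stream(pages).filter(s -> s == state).count();
    }

    public static void main(String[] args) {
        ToyFlash flash = new ToyFlash(8);
        flash.write(1); // first write of logical page 1
        flash.write(2); // first write of logical page 2
        flash.write(1); // update: the previous copy of page 1 turns stale
        flash.write(1); // another update, another stale page
        System.out.println("live=" + flash.count(PageState.LIVE)
                + " stale=" + flash.count(PageState.STALE)
                + " free=" + flash.count(PageState.FREE));
    }
}
```

Two updates to the same logical page leave two stale physical pages behind, even though the application still sees only two records.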

Garbage Collection in SSDs

Before getting into the garbage collection details of SSDs, let's find out why the device's inventors turned to this memory management technique. Why couldn't they just erase or overwrite used pages with stale data whenever new data arrived?

It all comes down to physics. While application data can be written or read at the page level, stale data can only be erased at the block level. Erasure requires a higher voltage than read and write operations do. If that voltage were applied at the page level, it could damage data in adjacent pages. Thus, the SSD always erases entire blocks.

As a result, the SSD has its own garbage collector (yes, similar to Java) that traverses blocks and cleans those that are about to run out of free space.

Garbage collection in SSDs is a two-step process:

1. All live (used) data is moved to an empty block (block #2 in the picture).
2. The block with stale data (block #1) gets erased.
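The two steps can be sketched as a toy simulation. The class ToyBlockGc and its page states are hypothetical and model only the copy-then-erase idea; a real controller also has to choose which block to clean and update its mapping tables.

```java
import java.util.Arrays;

/** Toy model of SSD garbage collection on a single block. */
class ToyBlockGc {
    enum PageState { FREE, LIVE, STALE }

    /**
     * Step 1: copy every live page from the victim block into the empty spare block.
     * Step 2: erase the victim block as a whole.
     * Returns the number of live pages that had to be copied.
     */
    static int collect(PageState[] victim, PageState[] spare) {
        if (spare.length < victim.length) {
            throw new IllegalArgumentException("Spare block is too small");
        }
        for (PageState s : spare) {
            if (s != PageState.FREE) {
                throw new IllegalArgumentException("Spare block must be empty");
            }
        }
        int copied = 0;
        for (PageState page : victim) {
            if (page == PageState.LIVE) {
                spare[copied++] = PageState.LIVE; // relocate live data
            }
        }
        Arrays.fill(victim, PageState.FREE); // erase works on the whole block only
        return copied;
    }

    public static void main(String[] args) {
        PageState[] block1 = {
            PageState.LIVE, PageState.STALE, PageState.STALE, PageState.LIVE
        };
        PageState[] block2 = new PageState[4];
        Arrays.fill(block2, PageState.FREE);

        int copied = collect(block1, block2);
        System.out.println("copied:  " + copied);
        System.out.println("block 1: " + Arrays.toString(block1));
        System.out.println("block 2: " + Arrays.toString(block2));
    }
}
```

Note that every live page in the victim block costs an extra write during collection, which is why a block full of stale pages is cheaper to clean than one that is mostly live.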

As Java developers, we know that the garbage collector needs free space in the heap to do its job efficiently. If the heap space becomes a scarce resource, then the collector can impact the performance of the app and even put it on hold with long stop-the-world pauses. Well, SSDs are similar here as well. If the SSD’s garbage collector runs out of free blocks, be ready to take a performance hit.

SSD Over-Provisioning

SSD manufacturers were aware of the negative impact that garbage collection can have on the performance of our applications. So, they came up with SSD over-provisioning, where each device comes with extra space that is unavailable to the user.

That over-provisioned space is a safe buffer, allowing your apps and the garbage collector to work with the SSD concurrently, causing as little impact as possible.
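To put a number on that buffer, over-provisioning is often quoted as the extra physical capacity relative to the user-visible capacity. The sketch below assumes that definition; vendors may calculate it differently, and the class name OverProvisioning and the capacities used are hypothetical.

```java
/** Over-provisioning expressed as a percentage of user-visible capacity. */
class OverProvisioning {
    /**
     * Assumed definition: (physical - user) / user * 100.
     * Both capacities must use the same unit.
     */
    static double percent(double physicalCapacity, double userCapacity) {
        if (userCapacity <= 0 || physicalCapacity < userCapacity) {
            throw new IllegalArgumentException("Expected physical >= user > 0");
        }
        return (physicalCapacity - userCapacity) / userCapacity * 100.0;
    }

    public static void main(String[] args) {
        // Hypothetical drive: 512 units of raw flash, 480 units exposed to the user.
        System.out.printf("Over-provisioning: %.1f%%%n", percent(512, 480));
    }
}
```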

However, even with over-provisioned space set aside, the garbage collector continues to use free space in the area that holds application data.

As soon as the applications need to persist bigger volumes of data, less space will be available for the garbage collector. If there is a write-intensive workload that generates and updates data on disk continuously, performance can fall sharply:

So, if your application's performance suddenly worsens and your disk I/O chart looks like the one above, it might be your SSD garbage collector. If the SSD is 50% (or more) full, you may start noticing the impact of garbage collection. In this case, consider using an SSD with larger capacity, and see if you can optimize your write workloads.
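A quick way to check how full a volume is from Java is the standard java.nio.file.FileStore API. The sketch below is only a starting point: the class name DiskFillCheck is hypothetical, the 50% threshold is simply the figure mentioned above, and the file system's view of free space is only a rough proxy for what the SSD controller sees internally.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class DiskFillCheck {
    static final double WARN_THRESHOLD = 0.50; // the 50% figure from the text; tune as needed

    /** Fraction of capacity in use, between 0.0 and 1.0. */
    static double usedFraction(long totalBytes, long usableBytes) {
        if (totalBytes <= 0) {
            throw new IllegalArgumentException("totalBytes must be positive");
        }
        return (double) (totalBytes - usableBytes) / totalBytes;
    }

    /** Same calculation for the file store that holds the given path. */
    static double usedFraction(Path path) throws IOException {
        FileStore store = Files.getFileStore(path);
        return usedFraction(store.getTotalSpace(), store.getUsableSpace());
    }

    public static void main(String[] args) throws IOException {
        // Pass a path on the volume you want to inspect; defaults to the working directory.
        Path path = Paths.get(args.length > 0 ? args[0] : ".");
        double used = usedFraction(path);
        System.out.printf("%s is %.0f%% full%n", path.toAbsolutePath(), used * 100);
        if (used >= WARN_THRESHOLD) {
            System.out.println("Above the threshold: SSD garbage collection may start to show up.");
        }
    }
}
```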

Wrapping Up

As you can see, even SSDs use garbage collection to their advantage. If you’d like to learn more about SSD garbage collection internals, check out the following articles:

This article concludes my series on how Java litters beyond the heap.

The series aimed to show that garbage collection is a widespread technique used far beyond the Java ecosystem.

If implemented properly, garbage collection can simplify the architecture of software and hardware without a significant performance impact.

Java, PostgreSQL, and SSDs are great examples of products that successfully take advantage of garbage collection and still remain among the top products in their categories.

Also, as a bonus, next time someone asks you to explain the inner workings of Java garbage collection, go ahead and surprise them by expanding on the topic to include databases and SSDs.

Denis Magda

Author

Denis started his professional career at Sun Microsystems and Oracle, where he built JVM/JDK and led one of the Java development groups. After learning Java from the inside, he joined the world of distributed systems and databases, where he has remained ever since.
