Superfast Application Startup: Java on CRaC

May 30, 2022

Table of Contents

Introducing the CRaC (Coordinated Restore at Checkpoint) Project
Implementing the java.crac API
Generating a Checkpoint with CRaC
Continuing Development of the CRaC Project

It’s now twenty-seven years since Java was first released, and it continues to be one of the most popular platforms for applications, especially on servers.

One of the reasons for this is the Java Virtual Machine (JVM). It provides a managed runtime environment that removes the need for developers to deal with things like memory management; the garbage collector takes care of this for you. Another significant advantage of the JVM is the use of bytecodes that can be converted to native instructions at runtime using a Just-In-Time (JIT) compiler.

When an application starts running, the JVM looks for methods that are hot spots (hence the name of the OpenJDK JVM) and compiles them to get better performance than interpreting the bytecodes. This takes place in two phases. Initially, the C1 compiler is used, which is a fast compiler without extensive optimization. The C2 compiler is subsequently used for very hot methods; it uses profiling data collected from the running application to optimize as much as possible. Techniques like aggressive method inlining and speculative optimizations can easily lead to better-performing code than code generated ahead of time (AOT) by a static compiler.

This is all great, but it has the downside that the JVM needs both time and compute resources to determine which methods to compile and then to compile them. This is what we refer to as the warmup time of an application. The fact that this same work has to happen every time we run an application makes the JVM less attractive in certain situations like microservices and serverless computing.
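To make warmup visible, here is a minimal sketch that times repeated batches of calls to the same small method. The class and method names (WarmupDemo, work, timeBatches) are made up for illustration, and the numbers it prints depend entirely on the machine and JVM. On a typical JIT-enabled JVM the first batches tend to take longer than the later ones, but this is a rough illustration, not a benchmark.

```java
import java.util.Arrays;

/** Hypothetical demo class; not part of any CRaC or JDK API. */
final class WarmupDemo {

    // A small CPU-bound method that becomes "hot" when called repeatedly.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (i * 31L) ^ (sum >>> 3);
        }
        return sum;
    }

    // Runs `batches` batches of `callsPerBatch` calls and returns the
    // elapsed nanoseconds measured for each batch.
    static long[] timeBatches(int batches, int callsPerBatch) {
        long[] nanos = new long[batches];
        long sink = 0; // accumulate results so the work is not discarded
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            for (int c = 0; c < callsPerBatch; c++) {
                sink += work(1_000);
            }
            nanos[b] = System.nanoTime() - start;
        }
        if (sink == 42) {
            System.out.println(); // uses sink so the loop has an observable effect
        }
        return nanos;
    }

    public static void main(String[] args) {
        // Prints one elapsed time per batch; values vary from run to run.
        System.out.println(Arrays.toString(timeBatches(10, 2_000)));
    }
}
```

Every fresh JVM start repeats this ramp-up, which is the cost a checkpoint of an already warmed-up process aims to avoid.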

Ideally, we would like to run the application and then store all the state about the compiled methods and even the compiled code. This is what Azul’s ReadyNow! technology does for our Prime JVM.

What about the state associated with the application itself? Often applications take time to load data and initialize required structures. Wouldn’t it be wonderful if we had a way to save all this as well?

Introducing the CRaC (Coordinated Restore at Checkpoint) Project

This is what the OpenJDK project, Coordinated Restore at Checkpoint (CRaC), was started to investigate. Let’s look at what this is and how it works.

Since 2012, the Linux operating system has had Checkpoint/Restore in Userspace (CRIU). This allows a running application to be paused and restarted at some point later in time, potentially on a different machine. The overall goal of the project is to support the migration of containers. When performing a checkpoint, essentially, the full context of the process is saved: program counter, registers, stacks, memory-mapped and shared memory and so on. To restore the application, all this data can be reloaded and (theoretically) it continues from the same point. However, there are some challenges, not least of which are open files, network connections and a sudden change in the value of the system clock.

Since the JVM is just a running application, we could use CRIU and pause and restart the application running on it. However, when we started this project, we felt that to be usable, we should make the Java code aware that it was about to be checkpointed and that it had been restarted.

We designed a straightforward API and imposed some restrictions on an application’s state when it is checkpointed. The restrictions are quite logical: the application must have no open file descriptors or network connections. This dramatically improves the ability to reliably restart an application from a given checkpoint.

Implementing the java.crac API

To use the API, you must identify any classes in your code that are considered resources. These are classes that need to be notified when a checkpoint is about to be made and when a restore has happened. We provide an eponymous interface, Resource, which you implement for the identified classes. There are only two methods, beforeCheckpoint() and afterRestore(). These are used as callbacks by the JVM. If you have a class reading data from a file, you can close the file in the beforeCheckpoint() method (potentially also generating a checksum). In your afterRestore() method, you can open the file again (using the checksum to determine if the file has changed since the checkpoint was made and take further appropriate action). The same applies to network connections. You can also use these methods to deal with a sudden change in the system clock, which might impact things like cache timeouts.
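As a rough sketch of the file example above, the class below closes its file before a checkpoint, records a checksum, and reopens the file after a restore. To keep it runnable on any JDK, it declares its own stand-in Resource interface with no-argument callbacks; as far as we know, the callbacks in the real interface also receive a Context argument, which is omitted here. FileBackedReader, its helper methods and the choice of CRC32 are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

/** Local stand-in for the CRaC Resource interface so the sketch compiles on any JDK. */
interface Resource {
    void beforeCheckpoint() throws Exception;
    void afterRestore() throws Exception;
}

/** Hypothetical resource that must not hold its file open across a checkpoint. */
final class FileBackedReader implements Resource {
    private final Path path;
    private BufferedReader reader;           // null while the file is closed
    private long checksumAtCheckpoint;       // CRC32 of the file when the checkpoint was taken
    private boolean changedSinceCheckpoint;  // set in afterRestore()

    FileBackedReader(Path path) throws IOException {
        this.path = path;
        this.reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
    }

    private static long checksum(Path p) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(Files.readAllBytes(p));
        return crc.getValue();
    }

    @Override
    public void beforeCheckpoint() throws IOException {
        // Record the file's state, then release the file descriptor.
        checksumAtCheckpoint = checksum(path);
        reader.close();
        reader = null;
    }

    @Override
    public void afterRestore() throws IOException {
        // Detect changes made while the process was checkpointed, then reopen.
        changedSinceCheckpoint = checksum(path) != checksumAtCheckpoint;
        reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
    }

    boolean isOpen() { return reader != null; }
    boolean changedSinceCheckpoint() { return changedSinceCheckpoint; }
    String readLine() throws IOException { return reader.readLine(); }
}
```

Note that this sketch reopens the file at the beginning. A real resource would probably also record its read position and decide what to do when changedSinceCheckpoint() reports true.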

All Resources in your application must be registered with the JVM. This is achieved by obtaining a CRaC Context and using the register() method. Although you can create your own Context, the simplest way is to use the global Context obtained via the Core class’s static getGlobalContext() method. One other detail is that the order you register your Resources will be the order the beforeCheckpoint methods will be called. However, the afterRestore methods will be called in the opposite order. This simplifies things if there is a particular sequence in which things need to be prepared for a checkpoint; when restoring, you have a predictable inverse sequence. That’s really all there is to the co-ordinated part of this.
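The ordering rule can be modeled in a few lines. The sketch below is not the CRaC implementation: Context and Resource are simplified stand-ins nested inside a hypothetical OrderedContextDemo class, and the resource names are invented. It reproduces the ordering exactly as described in the paragraph above; it is worth checking the documentation of Core.getGlobalContext() in the build you use for the order it actually guarantees.

```java
import java.util.ArrayList;
import java.util.List;

/** Self-contained model of the ordering rule; the nested types are stand-ins, not the CRaC classes. */
final class OrderedContextDemo {

    interface Resource {
        void beforeCheckpoint() throws Exception;
        void afterRestore() throws Exception;
    }

    /** Minimal stand-in for a Context: remembers resources in registration order. */
    static final class Context {
        private final List<Resource> resources = new ArrayList<>();

        void register(Resource r) {
            resources.add(r);
        }

        // Notify in registration order, as the article describes.
        void beforeCheckpoint() throws Exception {
            for (Resource r : resources) {
                r.beforeCheckpoint();
            }
        }

        // Notify in the inverse of the checkpoint order.
        void afterRestore() throws Exception {
            for (int i = resources.size() - 1; i >= 0; i--) {
                resources.get(i).afterRestore();
            }
        }
    }

    /** Creates a resource that only records which callback ran. */
    static Resource named(String name, List<String> log) {
        return new Resource() {
            @Override public void beforeCheckpoint() { log.add("before:" + name); }
            @Override public void afterRestore() { log.add("after:" + name); }
        };
    }

    static List<String> run() throws Exception {
        List<String> log = new ArrayList<>();
        Context ctx = new Context();
        ctx.register(named("cache", log));
        ctx.register(named("connectionPool", log));
        ctx.beforeCheckpoint();
        ctx.afterRestore();
        return log;
    }

    public static void main(String[] args) throws Exception {
        // prints [before:cache, before:connectionPool, after:connectionPool, after:cache]
        System.out.println(run());
    }
}
```

The useful property is the symmetry: whatever was prepared for the checkpoint last is the first thing brought back after the restore.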

Generating a Checkpoint with CRaC

There are two ways to generate a checkpoint. The first is from outside the JVM and uses jcmd with the JDK.checkpoint command. This will initiate the checkpoint and store all the required files in the $HOME/crac-files directory. Most of these files will be very small, only a few KB in size. However, be aware that if you have a large, well-filled heap, you will get some big files. To restore from a checkpoint, use the command:

java -XX:CRaCRestoreFrom=$HOME/crac-files/

The second way to create a checkpoint is programmatically. Add a call to Core.checkpointRestore() where you want this to happen. The method will return when the restore has been completed.
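A minimal sketch of the programmatic route follows. So that it compiles and runs on a stock JDK, it looks up Core.checkpointRestore() reflectively instead of calling it directly. The fully qualified name jdk.crac.Core is an assumption about the build in use, and CheckpointTrigger and tryCheckpoint are hypothetical names. On a JDK without the API the helper simply returns false.

```java
import java.lang.reflect.Method;

/** Hypothetical helper; the class name passed in is an assumption about the JDK build in use. */
final class CheckpointTrigger {

    /**
     * Tries to call the static, no-argument checkpointRestore() method of the given class.
     * Returns true once execution resumes after a restore, or false if the
     * API is absent or the checkpoint could not be completed.
     */
    static boolean tryCheckpoint(String coreClassName) {
        try {
            Method m = Class.forName(coreClassName).getMethod("checkpointRestore");
            m.invoke(null); // static method: returns only after the restore has completed
            return true;
        } catch (ReflectiveOperationException e) {
            // ClassNotFoundException / NoSuchMethodException: not a CRaC-enabled JDK.
            // InvocationTargetException: the checkpoint or restore itself failed.
            return false;
        }
    }

    public static void main(String[] args) {
        // ... load data and warm up the application here ...
        boolean restored = tryCheckpoint("jdk.crac.Core"); // assumed class name
        System.out.println(restored ? "running after restore" : "running without checkpoint");
    }
}
```

On a CRaC-enabled JDK a direct call to Core.checkpointRestore() is simpler; reflection is used here only to keep the example independent of the JDK build.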

As part of this project, we have created a proof-of-concept build of JDK 17, which can be accessed here. The results from this are very promising. We tested a sample Spring Boot application and in the test environment, this took roughly four seconds before processing the first operation. Using a checkpoint of the running, warmed up application, we restored it and were able to get to the first operation in 40ms. That’s two orders of magnitude faster!

Continuing Development of the CRaC Project

We welcome feedback on this project and encourage others to participate in its development through the OpenJDK. There are still some details to refine, so it’s not yet production-ready. Until we find a way of making this work on other platforms like Windows and macOS, it is unlikely to become part of the mainstream JDK. We’ve deliberately made the API agnostic to the JVM implementation to ensure that other systems for creating a checkpoint and restoring are easy to integrate.

If you want a superfast startup for your Java applications without warmup time or resources, why not try Java on CRaC?

Simon Ritter

Author

Simon is Deputy CTO at Azul.
