Foojay Today

MicroStream – Part 4: Serialisation Engine

June 29, 2022

In this fourth part, we take a deeper look at the serialisation engine inside MicroStream that stores the object graph in a binary format.

In the previous articles (part 1, part 2 and part 3), we already mentioned that MicroStream stores Java instances in a binary format, using a serialisation framework that was created from the ground up.

In this article, we go into more detail about this next-generation Java serialisation that we built to realise the MicroStream Java object database, and how you can use it outside of its main role of storing the root object that makes up your database.

Java Serialisation

Serialisation has been integrated within the JVM and Java since the early days. By adding the Serializable interface to the class definition, an instance can be passed to an ObjectOutputStream, and it will be converted to a binary format.

That same byte sequence can be converted back to an instance by an ObjectInputStream. You can customise the entire process by implementing the readObject() and writeObject() methods.
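As a short refresher, a minimal round trip with the built-in mechanism looks like this (the Person class is just an illustrative example, not something from MicroStream):

```java
import java.io.*;

// A minimal, hypothetical class used only to illustrate built-in serialisation.
class Person implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    Person(String name) { this.name = name; }
}

public class JavaSerialisationDemo {

    // Convert an object graph to a byte sequence.
    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    // Convert the byte sequence back to an object graph.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = toBytes(new Person("Jane"));
        Person copy = (Person) fromBytes(data);
        System.out.println(copy.name); // prints "Jane"
    }
}
```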

But many of us have learned to avoid this Java functionality, as it quickly became clear that the entire construction allowed for malicious operations.

When the JVM deserialises the bytes, it performs more than just creating the instances and populating the data. It also executes constructors and other methods, even before control is handed back to the user's code.

This allowed so-called gadget chains to be discovered: certain combinations of Java classes available in a typical application can lead to the execution of malicious code. An attacker only needs to craft some specific serialised content to have that malicious code executed during deserialisation.

Mark Reinhold, currently the chief architect of the Java Platform, called the current Java serialisation "a horrible mistake" since many security vulnerabilities are related to this functionality.

With Java 17 and the context-specific deserialisation filters, these security vulnerabilities can be mitigated, but it still is not an easy-to-use piece of functionality, and it comes with some limitations.
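To give an idea of what such a filter looks like, here is a small sketch using the plain JDK ObjectInputFilter API; the pattern and the classes being (de)serialised are chosen purely for illustration:

```java
import java.io.*;

public class FilterDemo {

    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    // Deserialise with a pattern-based filter: only classes matching the pattern are allowed.
    static Object readFiltered(byte[] data, String pattern) throws IOException, ClassNotFoundException {
        ObjectInputFilter filter = ObjectInputFilter.Config.createFilter(pattern);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            ois.setObjectInputFilter(filter);
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Allowed: java.util.Date matches the pattern, everything else is rejected by "!*".
        Object date = readFiltered(toBytes(new java.util.Date()), "java.util.Date;!*");
        System.out.println(date.getClass().getName());

        try {
            // Rejected: ArrayList does not match, so the filter aborts deserialisation.
            readFiltered(toBytes(new java.util.ArrayList<String>()), "java.util.Date;!*");
        } catch (InvalidClassException expected) {
            System.out.println("rejected");
        }
    }
}
```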

MicroStream Serialisation

Using the standard Java serialisation functionality was not an option, as it must be possible to serialise any Java class or instance. MicroStream therefore created a new serialisation engine from the ground up.

This allows us to handle any class without the need for any interface or annotation and thus also classes from any dependency of your project. It also allowed us to incorporate a Type Mapping functionality so it can handle changes in your class as your data model evolves over time.

For each Java instance, we look at the instance variables and store only the data, together with identifications for the variable names and class names. When deserialising, we create instances and populate the instance variables again, but we don't execute any code of the class itself.

We use the low-level Java API to create instances without actually calling constructors, and we set the instance variable values directly. By handling only data, we make sure that no code is executed during deserialisation, which makes the process safe. Even when the Type Dictionary, which holds the mapping between the ids used in the binary representation and the actual class and instance variable names, is compromised, unexpected classes might be created. But since no code is executed, this does not harm your environment in any way. And once such an instance is accessed by your code, there will be fatal exceptions because the class is not what was expected.
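As an illustration of that low-level technique (MicroStream's actual internals differ, and the Audited class below is purely hypothetical), the JDK's sun.misc.Unsafe can allocate an instance without running any constructor:

```java
import java.lang.reflect.Field;

import sun.misc.Unsafe;

// Purely illustrative class: its constructor records that it ran.
class Audited {
    static boolean constructorRan = false;
    Audited() { constructorRan = true; }
}

public class NoConstructorDemo {

    // Create an instance of the given class without invoking any constructor.
    @SuppressWarnings("unchecked")
    static <T> T instantiate(Class<T> type) throws Exception {
        Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
        theUnsafe.setAccessible(true);
        Unsafe unsafe = (Unsafe) theUnsafe.get(null);
        return (T) unsafe.allocateInstance(type);
    }

    public static void main(String[] args) throws Exception {
        Audited instance = instantiate(Audited.class);
        System.out.println(instance != null);        // we have a live object
        System.out.println(Audited.constructorRan);  // false: the constructor never ran
    }
}
```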

Advanced Features

But the engine can do more than just store and read Java instances. Two additional features make it suitable for using the JVM memory as your database.

First, it allows for lazy loading of the data: it can restore instances from the storage only when you access them. This allows you to have a very large dataset that would never fit into the memory of your process. The engine initially creates only a very small reference, so that at the time you access a Lazy object, it can be loaded from the storage if needed. The Lazy option was already discussed in detail in part 3.

The second functionality is the ability to transform the data when it is loaded. Your data model will evolve over time, with some small changes like a renamed variable, or even a very large refactoring of classes. With the Type Mapping feature of the engine, this can be handled so that your data is not lost and is converted automatically when loaded.

The engine can even detect some small changes automatically, like a change in name or an additional variable. For more complex changes, you define the Type Mapping where you indicate the old and the new structure and how the conversion needs to be performed.

Using the MicroStream Serialisation

The MicroStream serialisation is part of the entire framework and is used to store the Java instances to the storage.

But you can also access the serialisation outside of this standard usage and use it directly to create a byte array from some object(s), similar to what is possible with the standard Java serialisation logic.

To do that, you need the following artefact that exposes the required methods.

<dependency>
    <groupId>one.microstream</groupId>
    <artifactId>microstream-persistence-binary</artifactId>
    <version>${microstream.version}</version>
</dependency>

Suppose you have an Employee class with which you model the company structure and hierarchy. The following snippet creates a Serializer and converts the objects, even with circular references, to a byte array.

// Register the classes that will be serialised
SerializerFoundation<?> foundation = SerializerFoundation.New()
        .registerEntityTypes(Employee.class);

// Create a Serializer that produces a byte array and serialise the object graph
try (Serializer<byte[]> serializer = Serializer.Bytes(foundation)) {
    byte[] data = serializer.serialize(theBoss);
} catch (Exception e) {
    throw new RuntimeException(e);
}

And you can deserialise the bytes to create the object instances again.
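A minimal sketch of that reverse step, assuming the foundation configured in the previous snippet and the data byte array it produced are still in scope:

```java
// Recreate the object graph from the byte array produced earlier.
// (Assumes 'foundation' and 'data' from the serialisation snippet above.)
try (Serializer<byte[]> serializer = Serializer.Bytes(foundation)) {
    Employee restoredBoss = serializer.deserialize(data);
} catch (Exception e) {
    throw new RuntimeException(e);
}
```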

Conclusion

To have a generic serialisation solution that can persist any Java instance, without restrictions like implementing an interface or requiring annotations that define the mapping, a new algorithm was implemented from the ground up, different from the standard Java serialisation, to avoid the security vulnerabilities associated with that approach.

The serialisation only stores the data and not the class structure. Within the persistence solution, a system called the Type Dictionary keeps the information about the structure separately.

And during deserialisation, no constructor or method is called; only data is restored, so the process is secure and does not allow for gadget chains.


Author(s)

  • Rudy De Busscher

Rudy loves to create (web) applications with the Jakarta EE platform and MicroProfile implementations. Currently, he is a Developer Advocate for MicroStream. He has implemented various projects in a team ...
