Unique Identifiers Based on Timestamps in Distributed Applications

June 14, 2023
4 min read

Likes ...

Comments ...

Table of Contents

A Nanosecond Timestamp with a Host Identifier
Generating Unique IDs in a Distributed System
Speeding Up Generation with a Saved Host Identifier
Implementation
Using JMH to Benchmark the Timestamp Provider
Disadvantages of This Approach
Conclusion

At Chronicle we build applications that must process very high numbers of events with minimum latency. Generating unique IDs for these events using the traditional method of UUIDs introduces an unacceptable time overhead into our applications, so an alternative approach is needed.

I recently wrote an article on how timestamps can be used as unique identifiers, as they are much cheaper to generate than other methods of generating unique identifiers, taking a fraction of a microsecond.

Guaranteeing uniqueness when using timestamps at nanosecond granularity is straightforward on a single system. However, providing the same guarantee in a distributed application where functionality is spread over multiple systems, each with its own clock, can be much harder.

This article describes an extension of the previously described algorithm that generates unique identifiers in a distributed environment, and presents them in a format that is easy to read.

A Nanosecond Timestamp with a Host Identifier

Chronicle’s DistributedUniqueTimeProvider overwrites the lowest two digits of a generated nanosecond timestamp with the hostId. This makes it easy to identify when looking at the generated identifier.

We can therefore produce identifiers that are guaranteed unique, across up to 100 machines.

For a hostId of 28, the generated timestamp would look like this:

2021-12-28T14:07:02.954100128

The last two digits represent the hostId, and the remainder represents the timestamp on that hostId as date/time/microseconds.

With this change, the resolution of the timestamp is now one-tenth of a microsecond (hundreds of nano-seconds). This apparent loss of precision is not important, since it corresponds to the limit of the available wall clock in many systems.

The hostId can be set as a system property by default on the command line with -DhostId=xx, or programmatically using:

DistributedUniqueTimeProvider.instance().hostId(hostId)

Generating Unique IDs in a Distributed System

Each host participating in the application has a predefined, unique host identifier, or hostId.

Currently, we assume up to 100 hosts.

Of course, Java application components run in a JVM, and there may be multiple JVMs active on the same physical machine. JVMs sharing the same (memory mapped) DistributedUniqueTimeProvider may share a hostId. Alternatively, a JVM may have its own unique hostId assigned to it.

Speeding Up Generation with a Saved Host Identifier

Having a preconfigured host identifier and keeping track of the most recent identifier in shared memory allows fast concurrent generation of timestamp based identifiers across machines. As timestamps only have a resolution of 100 ns, the sustained limit for a single host is ten million per second.

For a maximum number of 100 hosts, the theoretical limit is one billion per second. If you need more than 100 servers, the strategy can be altered to reduce resolution, and allow more concurrent hosts. e.g. 1000 hosts could generate one million ids per second with a microsecond resolution.

Implementation

The happy path is simple. Take the current time, remove the lower two digits and add the hostId. The value of HOST_IDS is hardcoded to 100, as we assume 2 digits for the hostId. As long as the generated identifier is higher than the last one generated, it’s ok.

Should the machine fail and the information as to the last identifier be lost, the assumption is that the time taken to restart the service is enough time to ensure there is no overlap. If the service fails, but not the machine, the information is retained.

NOTE: This used the MappedFile in shared memory supported by Chronicle Bytes, which is open source.

If the time hasn’t progressed, either due to high contention, or the wall clock going backwards e.g. due to a correction, a loop is called to find the next available identifier.

This loop looks for the next possible timestamp (with the hostId) and attempts to update it.

Using JMH to Benchmark the Timestamp Provider

With JMH, benchmarking this utility in a single-threaded manner is pretty easy.

After less than five minutes, we get the following result on a windows laptop. You can get better results on a high-end server or desktop.

The average time is around 37.4 nanoseconds. While this is single-threaded, this is generally on the unhappy path, as timestamps need to be at least 100 ns apart or they temporarily run ahead of the wall clock.

UUID.randomUUID() is also very fast, only six times longer, however, if you need a timestamp and a source identifier for your event anyway, this is additional work/data.

Disadvantages of This Approach

Traditionally, Java applications used UUIDs as unique identifiers. This approach is considerably more efficient than the generation of UUIDs, however there may still be some cases where UUIDs present a more suitable approach:

Support is built-in, and the extra overhead may not be a concern
No configuration is required
UUIDs are not predictable, the timestamp-based ones are highly predictable

Conclusion

This article has described a highly efficient approach to generating an 8-byte lightweight identifier that is unique across many hosts, based on some predetermined partitioning by host identifier.

The identifier is still easily readable as text in a slightly modified form of a timestamp.

June 14, 2023
4 min read

Likes ...

Comments ...

Peter Lawrey

Author

Peter Lawrey is a Java Champion and Oracle Code One alumnus. Peter likes to inspire developers to improve the craftsmanship of their solutions and his popular blog “Vanilla Java” has had over 4 million views. He is the founder and architect of Chronicle Software. He has one of the top number of answers for Java and JVM on StackOverflow.com (~13K).

AWS Nitro and CPU Graviton Meets Unikernels

(Semantic) Versioning your Java libraries

The Java Story: A Film About All of Us

Project Panama for Newbies (Part 1)

A Week of Housekeeping: What Changed on Foojay.io

Getting Started with Deep Learning in Java Using Deep Netts

Nulling Out References Won’t Help Your Garbage Collector

SpringBoot 3.2 + CRaC

🤖 5 Best Practices for Working with AI Agents, Subagents, Skills and MCP

Testing Spring Boot JMS with ActiveMQ Artemis and Testcontainers

foojay: A Place for Friends of OpenJDK

Dashboard for OpenJDK Update Release Details

JDK14: New Features and Enhancements

Fun with Flags: My Top 10 Resources for JVM Flags

Performance of Modern Java on Data-Heavy Workloads: Real-Time Streaming

Performance of Modern Java on Data-Heavy Workloads: Batch Processing

How does Java handle different Images and ColorSpaces – Part 1

How does Java handle different Images and ColorSpaces – Part 2

How does Java handle different Images and ColorSpaces – Part 3

How does Java handle different Images and ColorSpaces – Part 4

Indexing all of Wikipedia, on a laptop

Working with Multiple Carets in IntelliJ IDEA

Clean Shutdown of Spring Boot Applications

Project Panama for Newbies (Part 1)

Java 17 on the Raspberry Pi

How to Create Mobile Apps with JavaFX (Part 1)

Beginning JavaFX Applications with IntelliJ IDE

SpringBoot 3.2 + CRaC

Preparing for Spring Framework 7 and Spring Boot 4

Foojay Slack: bit.ly/join-foojay-slack

“The More You Say, the Less People Remember…

…The Fewer the Words, the Greater the Profit.” And more wisdom and insight from Peter Lawrey, covering a range of development approaches.

Jan 25 4,5K

Peter Lawrey

Agile

Opinion Chronicle Software

Peter Lawrey Talks about Low-Latency, High-Performance Java

About 7 years ago, I attended a session given by Java Champion Peter Lawrey, leader of Chronical Software, at a JavaOne conference. Since most of my prior development work in the realm of low-latency high-performance was C/C++ software, I was very interested in hearing what Peter might say about how Java addresses this problem.

I caught up with Peter again recently, and asked him some questions about what’s happened since then, and where we are today. Here are my questions and Peter’s responses.

Oct 22 6,6K

Kevin Farnham

Interviews

Performance

Automatically Creating Microservices Architecture Diagrams

Upload a JAR, search for YAML, create a DOT, convert this to a PNG, all taking place behind the scenes, with just a click of a button.

Apr 20 14,3K

Jasmine Taylor

Microservices

Cloud Chronicle Software

Challenges when Developing a GUI for FIX

In this article, we explore the challenges in developing a Graphical User Interface (GUI) for Financial Information Exchange (FIX) data.

Dec 21 5,2K

Rob Austin

Java Core

Uncategorized Developer Tools

Chronicle FIX: Designed Not To Skip A Message Even If Your Data Centre Fails

High availability is achieved in Chronicle FIX by failover, where workload is transferred from a primary engine to a secondary engine in the event of a primary engine failure.

Apr 07 5,2K

Forough Goudarzi

JavaFX

Use Cases Jakarta EE

Comments (2)

Armel Hyacinthe

3 years ago

Thank you for this article. I would like to know if this is more efficient than the TSID. Here is the repo.: https://github.com/vladmihalcea/hypersistence-tsid. I expect you to comment.

Hi Armel Hyacinthe, One of the key differences is that the timestamp has a milli-second resolution and isn't human readable by default. It's highly likely to be unique with a random element, but not guaranteed. I will try a benchmark of it for comparison. Thank you for the suggestion.

Cut Code Review Time & Bugs in Half. Instantly.

Standards Over Lock-In: Modernizing Java with Jakarta EE 11 on Azul Payara 7

Free eBook: Sustainability for Java Developers

Unique Identifiers Based on Timestamps in Distributed Applications

A Nanosecond Timestamp with a Host Identifier

Generating Unique IDs in a Distributed System