Monitoring Event Loops for Blockages

May 15, 2023

Table of Contents

Event Loop Monitoring
Best practice
Sample code
Real-world examples
Conclusion

Chronicle’s open-source Chronicle Threads library has a little-known feature that is one of the first tools I reach for when a client reports latency outliers.

The usual way for a developer to look for latency hotspots in a system is to use a profiler, and modern profilers are amazing. Numerous commercial ones are available, but I generally find myself using Java Flight Recorder, async-profiler, or honest-profiler. These three are engineered to avoid safepoint bias, and thus give very accurate results.

The problem with profilers, however, is that they present aggregate information. A profiler will tell you that function x is using 40% of your program’s CPU, which lets you prioritise your engineering effort on optimising function x. What it won’t tell you is that function y (which takes only 5% of your program’s CPU) normally takes 1 microsecond to run but occasionally takes 10,000 microseconds, causing rare but important latency outliers.
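To make the point concrete, here is a small, self-contained Java illustration. The figures are invented for the example (one call in a million takes 10,000 microseconds instead of 1), and the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Illustration only: an aggregate view (the mean) hides a rare outlier that the max exposes. */
final class AggregateVsOutlier {

    /** Builds made-up latencies: every call takes 1 us except a single call that takes 10,000 us. */
    static long[] sampleLatenciesMicros(int calls) {
        long[] latencies = new long[calls];
        Arrays.fill(latencies, 1L);
        latencies[calls / 2] = 10_000L; // the single rare outlier
        return latencies;
    }

    /** The aggregate figure, comparable to what a profile summarises. */
    static double mean(long[] values) {
        return Arrays.stream(values).average().orElse(0.0);
    }

    /** The worst single call, which is what a latency-sensitive client notices. */
    static long max(long[] values) {
        return Arrays.stream(values).max().orElse(0L);
    }

    public static void main(String[] args) {
        long[] latencies = sampleLatenciesMicros(1_000_000);
        System.out.printf("mean = %.4f us, max = %d us%n", mean(latencies), max(latencies));
    }
}
```

With these invented figures the mean stays at about 1.01 microseconds while the worst call takes 10,000 microseconds, which is the kind of gap an aggregate view does not surface.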

Another problem with profilers is that it is very difficult to convince management that it is safe to run a profiler, even a low-overhead one, in production, and some performance outliers only happen in production and can’t be reproduced in test systems.

Event Loop Monitoring

Chronicle Threads provides high-performance event loop implementations and utility functions to help with threading and concurrency. Event loops are a very useful abstraction when building a low-latency system, and event handlers are a simple mechanism for writing safe code in a concurrent world. If you use Chronicle’s EventGroup as your event loop, Event Loop Monitoring is enabled automatically (this has historically also been known as Loop Block Monitoring).

Event Loop Monitoring monitors fast event loop threads and looks only for outliers; unlike a profiler, it does not measure and record every execution time. This solution works in dev, test and production environments, and adds essentially zero overhead (other than when a slow event handler is detected).

It works by checking whether event handler latency remains within acceptable bounds. Latency is determined by measuring how long the action() method of each of the event loop’s event handlers takes to run. Whenever action() runs beyond the acceptable latency limit, the event loop monitor asks the JVM for a stack trace of the event loop thread and writes it to the log.

This can be explained using pseudo code:
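A minimal sketch of that idea in plain Java follows. It is an illustration under assumptions, not the Chronicle Threads implementation: LoopBlockMonitorSketch and its methods are hypothetical names, and the event loop is assumed to call actionStarted() and actionFinished() around each handler’s action().

```java
/** Hypothetical sketch of loop block monitoring; NOT the Chronicle Threads implementation. */
final class LoopBlockMonitorSketch {
    // Marker meaning "the loop is not inside action() right now".
    private static final long NOT_RUNNING = Long.MIN_VALUE;

    private final Thread loopThread;
    private final long thresholdNanos;
    // Written by the event loop thread, read by the monitor thread.
    private volatile long actionStartNanos = NOT_RUNNING;

    LoopBlockMonitorSketch(Thread loopThread, long thresholdNanos) {
        this.loopThread = loopThread;
        this.thresholdNanos = thresholdNanos;
    }

    /** Assumed to be called by the event loop just before handler.action(). */
    void actionStarted() {
        actionStartNanos = System.nanoTime();
    }

    /** Assumed to be called by the event loop just after handler.action() returns. */
    void actionFinished() {
        actionStartNanos = NOT_RUNNING;
    }

    /** Called periodically from a separate thread; returns a report, or null if all is well. */
    String check() {
        long started = actionStartNanos;
        if (started == NOT_RUNNING)
            return null; // no handler is running
        long blockedNanos = System.nanoTime() - started;
        if (blockedNanos < thresholdNanos)
            return null; // still within the acceptable latency limit
        StringBuilder report = new StringBuilder()
                .append(loopThread.getName())
                .append(" thread has blocked for ")
                .append(blockedNanos / 1_000_000.0)
                .append(" ms.");
        // Ask the JVM where the event loop thread currently is.
        for (StackTraceElement frame : loopThread.getStackTrace())
            report.append("\n    at ").append(frame);
        return report.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // In this demo the main thread plays the event loop; the threshold is 100 ms.
        LoopBlockMonitorSketch monitor =
                new LoopBlockMonitorSketch(Thread.currentThread(), 100_000_000L);
        Thread monitorThread = new Thread(() -> {
            while (true) {
                String report = monitor.check();
                if (report != null) {
                    System.out.println(report);
                    return; // one report is enough for the demo
                }
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "sketch-monitor");
        monitorThread.setDaemon(true);
        monitorThread.start();

        monitor.actionStarted();
        Thread.sleep(150); // stands in for a slow handler.action()
        monitor.actionFinished();
    }
}
```

The cost on the event loop thread in this sketch is two volatile writes per action() call; the stack trace is only requested once the threshold has been exceeded, which is why the overhead is concentrated in the slow case.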

Best practice

The recommended way to use the Event Loop Monitor is to configure it initially with a relatively high threshold (the default is 100 ms), run, examine the stack traces and fix the problems, then decrease the threshold and repeat.

In normal operation it is expected to be configured so that it fires only tens or hundreds of times a day.

The only real downside to this approach is that we use Thread#getStackTrace to get the stack trace of the blocked thread, and this method is safepoint-biased: the blocked thread has to reach a safepoint before the stack trace can be taken, which can lead to inaccurate stack traces. One mitigation is to sprinkle the code you are focused on with Jvm.safepoint() calls, which are a no-op unless you set the jvm.safepoint.enabled system property.

Sample code

If, for example, you have some code that looks like this:

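import net.openhft.chronicle.core.Jvm;
import net.openhft.chronicle.core.threads.EventHandler;
import net.openhft.chronicle.threads.EventGroup;

import java.util.Random;

class MySlowHandler implements EventHandler {
   private final Random random = new Random();

   @Override
   public boolean action() {
       // simulate sometimes slow application logic: roughly 5% of calls pause for 150 ms
       if (random.nextInt(100) < 5)
           Jvm.pause(150);
       return true;
   }
}

// start an event group and register the slow handler on it
EventGroup eg = EventGroup.builder().build();
eg.start();
eg.addHandler(new MySlowHandler());
// ... just let it run ...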

Then, once the Event Loop Monitor has started (it delays for a few seconds to allow everything to get started and warm up), you will start to see:

[main/~monitor] INFO net.openhft.chronicle.threads.VanillaEventLoop - core-event-loop thread has blocked for 102.3 ms.
    at java.lang.Thread.sleep(Native Method)
    at net.openhft.chronicle.core.Jvm.pause(Jvm.java:484)
    at ...MySlowHandler.action(EventGroupTest2.java:40)
    at ...

Real-world examples

All of the following have happened in the real world.

Non-blocking APIs

Many of Java’s networking APIs are described as “non-blocking”, but if you run code that uses the TCP (or UDP) stack together with the event loop monitor on an untuned machine, you will see the monitor fire plenty of times and highlight blocking in the network stack, e.g.:

2022-08-16 04:39:57.806 INFO  VanillaEventLoop - tr_sbe_core-event-loop thread has blocked for 4.3 ms.
    at sun.nio.ch.DatagramChannelImpl.receive(DatagramChannelImpl.java:392)
    at sun.nio.ch.DatagramChannelImpl.receive(DatagramChannelImpl.java:345)
    at software.chronicle.enterprise.network.datagram.UdpHandler.action(UdpHandler.java:118)
    at ...

Surprising synchronizeds

Again, in the Java network stack, the event loop monitor can expose some surprising (over)uses of the Java synchronized keyword.

See also this tremendous article by Dmitry that details what we did about it.

Loading up a logger for the first time

On a heavily loaded machine, loading up a new library such as Log4j can be expensive, e.g.:

net.openhft.chronicle.threads.MediumEventLoop - event-loop thread has blocked for 102 ms.
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at org.apache.logging.log4j.util.LoaderUtil.loadClass(LoaderUtil.java:164)
    at org.apache.logging.slf4j.Log4jLogger.createConverter(Log4jLogger.java:416)
    at org.apache.logging.slf4j.Log4jLogger.<init>(Log4jLogger.java:54)
    at org.apache.logging.slf4j.Log4jLoggerFactory.newLogger(Log4jLoggerFactory.java:37)
    at org.apache.logging.slf4j.Log4jLoggerFactory.newLogger(Log4jLoggerFactory.java:29)
    at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:52)
    at org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:29)
    at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:355)
    at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:380)
    at com.yourcompany.application.MyClass
    at ...

In this case, the machine was heavily loaded, logging was very infrequent, and the first log caused some classloading to take place, which is an expensive (and blocking) activity.

The lesson here is not to use logging in your fast path.
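One way to act on this is sketched below, under stated assumptions: the fast path only enqueues the message, and a separate thread does the actual logging. This is a generic pattern rather than anything taken from Chronicle Threads. DeferredLog and its methods are hypothetical names, java.util.logging stands in for whichever logging library you use, and the unbounded ConcurrentLinkedQueue (which allocates per message) is a simplification.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;
import java.util.logging.Logger;

/** Hypothetical sketch: keep the logging library off the fast path by deferring the write. */
final class DeferredLog {
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();

    /** Fast path: hand the message over without touching the logging library. */
    void enqueue(String message) {
        pending.add(message);
    }

    /** Slow path: called from a background thread; returns how many messages were written. */
    int drainTo(Consumer<String> sink) {
        int drained = 0;
        for (String message; (message = pending.poll()) != null; drained++)
            sink.accept(message);
        return drained;
    }

    public static void main(String[] args) throws InterruptedException {
        // Create the logger at startup so its class loading happens before the fast path runs.
        Logger logger = Logger.getLogger("app");
        DeferredLog log = new DeferredLog();

        Thread writer = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                log.drainTo(logger::info);
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "log-writer");
        writer.setDaemon(true);
        writer.start();

        log.enqueue("order accepted"); // what an event handler would do instead of logging
        Thread.sleep(50);              // give the writer time to drain in this demo
    }
}
```

Any class loading or I/O triggered by the logging library then happens on the log-writer thread, so a stall there would not show up as a blocked event loop.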

Conclusion

For lightweight, always-on detection of latency outliers, the Event Loop Monitor is hard to beat.

Jerry Shea

Author

Jerry Shea has many years’ experience as a developer, CTO and founder. He has designed and built low-latency pricing and trading systems for banks and hedge funds, and has a deep knowledge of Java solution design and implementation for the financial services industry. He has made extensive open-source contributions. He is Managing Director for Chronicle Software Asia-Pacific.
