Friends of OpenJDK Today

A Tentative Comparison of Fault Tolerance Libraries on the JVM

January 11, 2022

Author(s)

  • Nicolas Frankel

    Nicolas is a developer advocate with 15+ years experience consulting for many different customers, in a wide range of contexts (such as telecoms, banking, insurances, large retail and public sector). ...

Whether or not you're implementing microservices, chances are that you're calling HTTP endpoints. With HTTP calls, a lot of things can go wrong.

Experienced developers plan for this and design beyond just the happy path. In general, fault tolerance encompasses the following features:

  • Retry
  • Timeout
  • Circuit Breaker
  • Fallback
  • Rate Limiter to avoid server-side 429 responses
  • Bulkhead: Rate Limiter limits the number of calls in a determined timeframe, while Bulkhead limits the number of concurrent calls
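
To make the difference between the last two items concrete, here is a minimal hand-rolled sketch using only the JDK. It is not taken from any of the libraries discussed in this post: the class names, the fixed-window strategy and the explicit clock parameter are my own assumptions, chosen to keep the example small and deterministic.

```java
import java.util.concurrent.Semaphore;

final class LimiterSketch {

    /** Bulkhead: caps the number of calls that are in flight at the same time. */
    static final class Bulkhead {
        private final Semaphore permits;

        Bulkhead(int maxConcurrentCalls) {
            this.permits = new Semaphore(maxConcurrentCalls);
        }

        /** Returns false instead of blocking when every slot is taken. */
        boolean tryEnter() {
            return permits.tryAcquire();
        }

        /** Must be called when the protected call finishes, to free its slot. */
        void leave() {
            permits.release();
        }
    }

    /** Rate limiter: caps the number of calls that start within one time window. */
    static final class FixedWindowRateLimiter {
        private final int maxCallsPerWindow;
        private final long windowMillis;
        private long windowStart;
        private int callsInWindow;

        FixedWindowRateLimiter(int maxCallsPerWindow, long windowMillis) {
            this.maxCallsPerWindow = maxCallsPerWindow;
            this.windowMillis = windowMillis;
        }

        /** The caller passes the clock value, which keeps the sketch deterministic. */
        synchronized boolean tryAcquire(long nowMillis) {
            if (nowMillis - windowStart >= windowMillis) {
                windowStart = nowMillis; // a new window starts, the counter resets
                callsInWindow = 0;
            }
            if (callsInWindow >= maxCallsPerWindow) {
                return false; // quota used up, however short the earlier calls were
            }
            callsInWindow++;
            return true;
        }
    }

    public static void main(String[] args) {
        Bulkhead bulkhead = new Bulkhead(1);
        System.out.println(bulkhead.tryEnter()); // takes the only slot
        System.out.println(bulkhead.tryEnter()); // rejected while the first call is running
        bulkhead.leave();
        System.out.println(bulkhead.tryEnter()); // accepted again once the slot is free

        FixedWindowRateLimiter limiter = new FixedWindowRateLimiter(2, 1_000);
        System.out.println(limiter.tryAcquire(0));     // 1st call in the window
        System.out.println(limiter.tryAcquire(10));    // 2nd call in the window
        System.out.println(limiter.tryAcquire(20));    // rejected: the quota of 2 is used up
        System.out.println(limiter.tryAcquire(1_000)); // accepted: a new window has started
    }
}
```

The Bulkhead frees capacity as soon as a call returns, whereas the Rate Limiter only frees capacity when time passes.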

A couple of libraries implement these features on the JVM. In this post, we will look at Microprofile Fault Tolerance, Failsafe and Resilience4J.

Microprofile Fault Tolerance

Microprofile Fault Tolerance comes from the Microprofile umbrella project. It differs from the two others because it's a specification, which relies on a runtime to provide its capabilities. For example, Open Liberty is one such runtime. SmallRye Fault Tolerance is another one. In turn, other components such as Quarkus and WildFly embed SmallRye.

Microprofile defines annotations for each feature: @Timeout, @Retry, @Fallback, @CircuitBreaker, and @Bulkhead. It also defines @Asynchronous.

Because the runtime reads annotations, one should carefully read the documentation to understand how they interact if more than one is set.

A @Fallback can be specified and it will be invoked if the TimeoutException is thrown. If @Timeout is used together with @Retry, the TimeoutException will trigger the retry. When @Timeout is used with @CircuitBreaker and if a TimeoutException occurs, the failure will contribute towards the circuit open.

-- Timeout Usage
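
The interaction described in the quote can be pictured as nested layers: the timeout sits closest to the call, the retry wraps the timeout, and the fallback wraps everything. The following sketch emulates that layering with plain JDK classes. It is only my reading of the quote, not how a Microprofile runtime is implemented: the helper names, the 50 ms timeout and the 2 retries are all made up for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class TimeoutRetryFallbackSketch {

    /** Innermost layer: fails with a TimeoutException when the call takes too long. */
    static <T> T withTimeout(Callable<T> call, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(call);
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            executor.shutdownNow(); // interrupts the call if it is still running
        }
    }

    /** Middle layer: any exception, the timeout included, triggers another attempt. */
    static <T> T withRetry(Callable<T> call, int maxRetries) throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must not be negative");
        }
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    /** Outermost layer: only kicks in once every attempt has failed. */
    static <T> T withFallback(Callable<T> call, T fallbackValue) {
        try {
            return call.call();
        } catch (Exception e) {
            return fallbackValue;
        }
    }

    /** Hypothetical slow endpoint: every call sleeps far longer than the timeout. */
    static String run(AtomicInteger attempts) {
        Callable<String> slowCall = () -> {
            Thread.sleep(500);
            return "response";
        };
        return withFallback(
            () -> withRetry(() -> {
                attempts.incrementAndGet(); // one increment per attempt
                return withTimeout(slowCall, 50);
            }, 2),
            "fallback");
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        System.out.println(run(attempts) + " after " + attempts.get() + " attempts");
    }
}
```

With a declarative model, this nesting is decided by the runtime rather than by the order in which you write the annotations, which is why reading the documentation matters.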

Resilience4J

I came upon Resilience4J when I was giving my talk on the Circuit Breaker pattern. The talk included a demo, and it relied on Hystrix. One day, I wanted to update the demo to the latest Hystrix version and noticed that maintainers had deprecated it in favor of Resilience4J.

Resilience4J is based on several core concepts:

  • One JAR per fault tolerance feature, with additional JARs for specific integrations, e.g., Kotlin
  • Static factories
  • Function composition via the Decorator pattern applied to functions
  • Integration with Java's functional interfaces, e.g., Runnable, Callable, Function, etc.
  • Exception propagation: one can use a functional interface that throws, and the library will propagate it across the call pipeline

Here's a simplified class diagram for Retry.

Resilience4J Retry API

Each fault tolerance feature is built around the same template seen above. One can create a pipeline of several features by leveraging function composition, each one calling another one.
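
Before looking at the real API, it may help to see the idea of "the Decorator pattern applied to functions" in isolation. The sketch below is hand-rolled with the JDK only. None of these method names come from Resilience4J: decorateWithRetry, decorateWithFallback and flaky are hypothetical helpers that mimic the shape of the approach.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class DecoratorSketch {

    /** Returns a new Supplier that calls the given one up to maxAttempts times. */
    static <T> Supplier<T> decorateWithRetry(Supplier<T> supplier, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        return () -> {
            RuntimeException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return supplier.get();
                } catch (RuntimeException e) {
                    last = e; // remember the failure and try again
                }
            }
            throw last; // attempts exhausted: the exception reaches the next layer
        };
    }

    /** Returns a new Supplier that swallows failures and yields a fixed value. */
    static <T> Supplier<T> decorateWithFallback(Supplier<T> supplier, T fallback) {
        return () -> {
            try {
                return supplier.get();
            } catch (RuntimeException e) {
                return fallback;
            }
        };
    }

    /** Hypothetical base call: fails the given number of times, then succeeds. */
    static Supplier<String> flaky(int failures) {
        AtomicInteger remaining = new AtomicInteger(failures);
        return () -> {
            if (remaining.getAndDecrement() > 0) {
                throw new IllegalStateException("boom");
            }
            return "ok";
        };
    }

    public static void main(String[] args) {
        // Each decorator takes a function and returns a function, so they nest freely
        Supplier<String> pipeline =
            decorateWithFallback(decorateWithRetry(flaky(2), 3), "fallback");
        System.out.println(pipeline.get());
    }
}
```

Because every decorator has the same shape, a Supplier in and a Supplier out, the innermost argument is always the base call and each additional feature wraps the previous result.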

Let's analyze a sample:

var retrySupplier = Retry.decorateSupplier(                                  // 1
    Retry.ofDefaults("retry"),                                               // 2
    () -> server.call()                                                      // 1
);
var config = new CircuitBreakerConfig.Builder()                              // 3
        .slowCallDurationThreshold(Duration.ofMillis(200))                   // 4
        .slidingWindowSize(2)                                                // 5
        .minimumNumberOfCalls(2)                                             // 6
        .build();
var breakerSupplier = CircuitBreaker.of("circuit-breaker", config)           // 7
                                    .decorateSupplier(retrySupplier);        // 7
var supplier = SupplierUtils.recover(                                        // 8
    breakerSupplier,
    List.of(IllegalStateException.class, CallNotPermittedException.class),   // 9
    e -> "fallback"                                                         // 10
);

  1. Decorate the base server.call() function with Retry: this function is the one to be protected
  2. Use the default configuration
  3. Create a new Circuit Breaker config
  4. Set the threshold above which a call is considered to be slow
  5. Count over a sliding window of 2 calls
  6. Minimum number of calls to decide whether to open the Circuit Breaker
  7. Decorate the retry function with a Circuit Breaker with the above config
  8. Create a fallback value to return when the Circuit Breaker is open
  9. List of exceptions to handle: they won't be propagated. Resilience4J throws a CallNotPermittedException when the circuit is open.
  10. In case any of the configured exceptions are thrown, call this function instead

The order in which functions are composed can be hard to decipher. Hence, the project offers the Decorators class to combine functions using a fluent API. You can find it in the resilience4j-all module. One can rewrite the above code as:

var pipeline = Decorators.ofSupplier(() -> server.call())
    .withRetry(Retry.ofDefaults("retry"))
    .withCircuitBreaker(CircuitBreaker.of("circuit-breaker", config))
    .withFallback(
        List.of(IllegalStateException.class, CallNotPermittedException.class),
        e -> "fallback"
    );

It makes the intent much clearer.
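
If the roles of slidingWindowSize and minimumNumberOfCalls in the configuration above feel abstract, the following toy breaker may help. It is a deliberately naive, count-based sketch written with the JDK only, not Resilience4J's implementation: the class name is made up, the 50% failure threshold is a value I chose for the example, and slow calls and the half-open state are left out entirely.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

final class CountBasedBreakerSketch {
    private final int slidingWindowSize;
    private final int minimumNumberOfCalls;
    private final double failureRateThreshold; // 0.5 means 50%, an assumed value
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call
    private boolean open;

    CountBasedBreakerSketch(int slidingWindowSize, int minimumNumberOfCalls,
                            double failureRateThreshold) {
        this.slidingWindowSize = slidingWindowSize;
        this.minimumNumberOfCalls = minimumNumberOfCalls;
        this.failureRateThreshold = failureRateThreshold;
    }

    /** Runs the call unless the breaker is open, and records how it went. */
    <T> T call(Supplier<T> supplier) {
        if (open) {
            throw new IllegalStateException("circuit is open, call not permitted");
        }
        T result;
        try {
            result = supplier.get();
        } catch (RuntimeException e) {
            record(true);
            throw e; // the original failure still propagates to the caller
        }
        record(false);
        return result;
    }

    private void record(boolean failed) {
        outcomes.addLast(failed);
        if (outcomes.size() > slidingWindowSize) {
            outcomes.removeFirst(); // keep only the most recent calls
        }
        // No decision is taken until enough calls have been observed
        if (outcomes.size() >= minimumNumberOfCalls) {
            long failures = outcomes.stream().filter(Boolean::booleanValue).count();
            open = (double) failures / outcomes.size() >= failureRateThreshold;
        }
    }

    boolean isOpen() {
        return open;
    }

    public static void main(String[] args) {
        CountBasedBreakerSketch breaker = new CountBasedBreakerSketch(2, 2, 0.5);
        breaker.call(() -> "ok");
        try {
            breaker.call(() -> { throw new UnsupportedOperationException("down"); });
        } catch (UnsupportedOperationException e) {
            // expected: the second call fails
        }
        System.out.println("open? " + breaker.isOpen());
    }
}
```

With a window of 2 and a minimum of 2 calls, a single failure is never enough to open the circuit on its own: the breaker waits for the second outcome before it computes a failure rate.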

Failsafe

I stumbled upon Failsafe not long ago. Its tenets are similar to Resilience4J: static factories, function composition, and exception propagation.

While Resilience4J's fault tolerance features don't share a class hierarchy, Failsafe provides the concept of a Policy:

Failsafe Retry API

I believe the main difference with Resilience4J lies in its pipelining approach. Resilience4J's API requires you first to provide the "base" function and then embed it inside any wrapper function. You cannot reuse the pipeline on top of different base functions. Failsafe allows it via the FailsafeExecutor class.

Failsafe API

Here's how to create a pipeline, i.e., an instance of FailsafeExecutor. Notice there's no reference to the base call:

var pipeline = Failsafe.with(                            // 1
    Fallback.of("fallback"),                             // 2
    Timeout.of(Duration.of(2000, MILLIS)),               // 3
    RetryPolicy.ofDefaults()                             // 4
);

  1. Define the list of policies; they wrap the call from the last (innermost) to the first (outermost)
  2. Fallback value
  3. If the call exceeds 2000ms, throws a TimeoutExceededException
  4. Default retry policy

At this point, it's possible to wrap the call:

pipeline.get(() -> server.call());

Failsafe also provides a fluent API. One can rewrite the above code as:

var pipeline = Failsafe.with(Fallback.of("fallback"))
    .compose(Timeout.of(Duration.of(2000, MILLIS)))
    .compose(RetryPolicy.ofDefaults());
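
To illustrate why a pipeline that doesn't know its base call is convenient, here is a small hand-rolled model of the idea, again with the JDK only. It is not Failsafe's implementation: the Policy interface, the fallback and retry factories and the pipeline helper are hypothetical names. The only thing borrowed from the section above is the convention that the first policy ends up as the outermost wrapper.

```java
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

final class ReusablePipelineSketch {

    /** A policy wraps any Supplier; it holds no reference to a base call. */
    interface Policy<T> extends UnaryOperator<Supplier<T>> {}

    static <T> Policy<T> fallback(T value) {
        return inner -> () -> {
            try {
                return inner.get();
            } catch (RuntimeException e) {
                return value;
            }
        };
    }

    static <T> Policy<T> retry(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        return inner -> () -> {
            RuntimeException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return inner.get();
                } catch (RuntimeException e) {
                    last = e; // remember the failure and try again
                }
            }
            throw last;
        };
    }

    /** Composes policies so that the first one becomes the outermost wrapper. */
    @SafeVarargs
    static <T> Policy<T> pipeline(Policy<T>... policies) {
        return base -> {
            Supplier<T> wrapped = base;
            for (int i = policies.length - 1; i >= 0; i--) {
                wrapped = policies[i].apply(wrapped); // the last policy wraps the base first
            }
            return wrapped;
        };
    }

    public static void main(String[] args) {
        // The pipeline is built once, before any base call exists
        Policy<String> reusable = pipeline(fallback("fallback"), retry(3));

        Supplier<String> healthy = () -> "pong";
        Supplier<String> broken = () -> { throw new IllegalStateException("down"); };

        // The same pipeline is applied to two unrelated base calls
        System.out.println(reusable.apply(healthy).get());
        System.out.println(reusable.apply(broken).get());
    }
}
```

The design choice is to model the pipeline as a function from Supplier to Supplier: the base call becomes the last argument instead of the first, so the same pipeline can be shared across calls.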

Conclusion

All three libraries provide more or less the same features. If you don't use a CDI-compliant runtime, such as a regular application server or Quarkus, forget about Microprofile Fault Tolerance.

Failsafe and Resilience4J are both based on function composition and are pretty similar. If you need to define your function pipeline independently of the base call, prefer Failsafe. Otherwise, pick any of them.

As I'm more familiar with Resilience4J, I'll probably use Failsafe in my next project to get more experience with it.

Originally published at A Java Geek on January 7th, 2022