General Build Distribution: A Game-Changer or a Gimmick?

July 20, 2022

Table of Contents

Test Execution is the Bottleneck
Frequency of Incremental Builds vs. Full Rebuilds
Parallelization Factor
The Path Forward
Summary
Feedback

Understand the performance potential of remote and distributed builds and explore how to improve build feedback times.

The Remote and Distributed Build Patterns article explains the differences between remote and distributed builds and variations on each. Specifically, we distinguished between "test distribution" and "general build distribution".

This article discusses distributed builds in the broader context of improving build feedback times. We'll start by explaining the types of changes engineers tend to make, identify the typical bottlenecks, and share how these relate to distributed builds. We will also study the performance potential of general build distribution. Finally, we will explore a holistic approach to improving build feedback times.

In greater detail below, we will elaborate on these three findings:

Building in a distributed fashion is not a substitute for a well-tuned build process.
Improving incremental build performance, not "full rebuilds", is the most important aspect of improving the local developer experience.
General build distribution of a well-tuned build beyond test distribution is an evolutionary, not revolutionary, process that yields marginal performance benefits for most JVM projects.

The analysis and findings presented here apply especially to projects for the JVM ecosystem. Future follow-up articles will address the Android and native/iOS ecosystems.

Most Important Scenarios to Optimize

Two keys to improving the local developer experience lie in understanding the typical bottlenecks faced by engineers, as well as the types of changes built by engineers as they add new features, fix bugs and write tests.

Test Execution is the Bottleneck

Test execution is frequently the single most time-consuming portion of build time. Optimizing builds to avoid unnecessary test execution can yield large productivity gains. The Gradle Build Tool already skips tests when no meaningful changes are detected on the classpath, and can also restore test execution results from the build cache. The post "Stop rerunning your tests" does a great job of explaining the efficiencies available by minimizing test re-execution.
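The idea of skipping a test run when its inputs are unchanged can be sketched in a few lines of Java. This is a simplified illustration, not Gradle's implementation: the class `TestResultCache`, its methods, and the use of a SHA-256 digest over named input contents are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.function.Supplier;

// Hypothetical sketch: reuse a stored test result when the inputs are unchanged.
final class TestResultCache {
    private final Map<String, String> resultsByKey = new HashMap<>();
    private int executions = 0;

    // Builds a cache key from every input name and content, in a stable (sorted) order.
    static String cacheKey(SortedMap<String, String> inputs) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, String> entry : inputs.entrySet()) {
                digest.update(entry.getKey().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between name and content
                digest.update(entry.getValue().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between entries
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(ex);
        }
    }

    // Runs the tests only if no result is stored for this exact set of inputs.
    String runTests(SortedMap<String, String> inputs, Supplier<String> testRun) {
        String key = cacheKey(inputs);
        String cached = resultsByKey.get(key);
        if (cached != null) {
            return cached; // inputs unchanged: restore the earlier result
        }
        executions++;
        String result = testRun.get();
        resultsByKey.put(key, result);
        return result;
    }

    int executions() {
        return executions;
    }
}
```

Calling `runTests` twice with identical inputs executes the supplied test run once. Changing the content of any input produces a different key and triggers a new execution.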

In the context of distributed builds, this bottleneck is addressed by modern test distribution such as the Test Distribution solution provided by Gradle Enterprise.

Frequency of Incremental Builds vs. Full Rebuilds

The next key point is that in the vast majority of cases engineers are building small, incremental changes. We posit that these small, incremental changes are unlikely to benefit from distributing the build steps beyond test distribution. Also, developers using modern build systems seldom perform a "full rebuild" without the benefit of a shared build cache or retained history of a previous build on the same machine.

Consider a change to the body of a private method of a Java class: only that class must be recompiled and the library containing it reassembled. There is no reason to recompile a downstream consumer of that library, as it cannot link to a private method. At the opposite end of the spectrum, consider a modification to the public API of a "common" library consumed by many other subprojects in a multiproject build. This triggers a "domino effect" in which all of its downstream consumers must be recompiled. General build distribution may help in this scenario, but we contend this is the exception, not the norm (see Parallelization Factor below for more insights).
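The difference between the two kinds of change can be illustrated with a small Java sketch. It compares the non-private method signatures of three hypothetical versions of one class. The class names and the reflection-based comparison are assumptions made for illustration. Real compile avoidance works on compiled class files and considers more than methods, so this is not how Gradle implements it.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical library class, version 1.
final class PriceV1 {
    public int total(int net) { return net + tax(net); }
    private int tax(int net) { return net / 5; }
}

// Version 2: only the body of the private method changed.
final class PriceV2 {
    public int total(int net) { return net + tax(net); }
    private int tax(int net) { return net * 20 / 100; }
}

// Version 3: the public API changed (a new public overload was added).
final class PriceV3 {
    public int total(int net) { return net + tax(net); }
    public int total(int net, int discount) { return total(net) - discount; }
    private int tax(int net) { return net / 5; }
}

final class AbiSketch {
    // Collects the signatures of all declared methods a consumer could link to.
    static Set<String> visibleSignatures(Class<?> type) {
        return Arrays.stream(type.getDeclaredMethods())
                .filter(m -> !Modifier.isPrivate(m.getModifiers()))
                .filter(m -> !m.isSynthetic())
                .map(AbiSketch::describe)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    private static String describe(Method m) {
        String params = Arrays.stream(m.getParameterTypes())
                .map(Class::getName)
                .collect(Collectors.joining(","));
        return m.getReturnType().getName() + " " + m.getName() + "(" + params + ")";
    }

    // Consumers need recompiling only when the visible signatures differ.
    static boolean consumersNeedRecompile(Class<?> before, Class<?> after) {
        return !visibleSignatures(before).equals(visibleSignatures(after));
    }

    public static void main(String[] args) {
        System.out.println(consumersNeedRecompile(PriceV1.class, PriceV2.class)); // false
        System.out.println(consumersNeedRecompile(PriceV1.class, PriceV3.class)); // true
    }
}
```

The private-body change leaves the visible signatures untouched, so nothing downstream is affected. The added public overload changes them, which is the case that can ripple through consumers.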

Additionally, Java compilation is relatively fast compared to "native" languages, further reducing the optimization potential of general build distribution in Java projects.

Thus, we encourage skepticism toward claims that building large projects "from scratch" is a true measure of build system performance, or a justification for implementing a remote or distributed build.

Parallelization Factor

The key to understanding the maximum speed potential of any build (locally built, hosted remotely or distributed) is to visualize the interdependencies of its outputs. Imagine a relatively small software project having three subprojects: A, B and C. If compiling subproject C requires the outputs of subprojects A and B, then C depends on A and B. Most importantly, we cannot begin building C until both A and B are complete; therefore the best-case build-time scenario can be denoted as max(A, B) + C. Given a local or remote build host with unlimited CPU cores, or a pool of unlimited distributed build agents, the build cannot be parallelized further than this bottleneck.

Figure 1: Project structure

Because this bottleneck is dependency-based rather than performance-based, we can predict the potential benefit of remote or distributed builds.
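The max(A, B) + C bound generalizes to the longest path through the dependency graph. A minimal Java sketch of that calculation follows. The class `CriticalPath`, its methods, and the durations used are hypothetical, and the sketch assumes an acyclic graph in which every dependency has been registered.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: best-case build time with unlimited workers.
final class CriticalPath {
    private final Map<String, Long> duration = new HashMap<>();
    private final Map<String, List<String>> dependsOn = new HashMap<>();
    private final Map<String, Long> finishCache = new HashMap<>();

    // Registers a subproject with its own build time and its direct dependencies.
    CriticalPath add(String name, long seconds, String... dependencies) {
        duration.put(name, seconds);
        dependsOn.put(name, List.of(dependencies));
        finishCache.clear(); // earlier results may no longer be valid
        return this;
    }

    // Earliest finish time: a subproject starts once its slowest dependency is done.
    long earliestFinish(String name) {
        Long cached = finishCache.get(name);
        if (cached != null) {
            return cached;
        }
        long start = 0;
        for (String dependency : dependsOn.get(name)) {
            start = Math.max(start, earliestFinish(dependency));
        }
        long finish = start + duration.get(name);
        finishCache.put(name, finish);
        return finish;
    }

    // Length of the longest dependency chain: no amount of hardware beats this.
    long bestCaseBuildTime() {
        long best = 0;
        for (String name : duration.keySet()) {
            best = Math.max(best, earliestFinish(name));
        }
        return best;
    }

    // Time taken when everything runs one after another on a single worker.
    long serialBuildTime() {
        long total = 0;
        for (long seconds : duration.values()) {
            total += seconds;
        }
        return total;
    }

    public static void main(String[] args) {
        CriticalPath build = new CriticalPath()
                .add("A", 30)
                .add("B", 50)
                .add("C", 20, "A", "B");
        System.out.println(build.bestCaseBuildTime()); // 70, i.e. max(30, 50) + 20
        System.out.println(build.serialBuildTime());   // 100
    }
}
```

With these made-up durations, unlimited parallelism reduces the build from 100 to 70 seconds and no further, because C must wait for B.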

To put this theory to the test, we've performed some analysis of the parallelization factor to establish a theoretically achievable minimal build time given the bottlenecks described above. We examined the build of Gradle itself and other sizable builds in collaboration with some of our partners. We've found these interesting results:

Test execution consumes the vast majority of build time, accounting for 80-90% of the end-to-end CI cycle.
After test execution, the most time-consuming tasks are CPU-intensive tasks like compilation or validation, followed by disk-bound packaging/assembly tasks.
More than half of the non-test tasks are executed in a single process.

This last point is critical: tasks that operate as a single process - with no other processes executing simultaneously - are indicative of a bottleneck, like subproject C in the above example. When a task runs alone because everything else is waiting on it, adding more workers cannot shorten it, so distribution offers no further optimization. It is possible that a more powerful remote CPU would finish the compilation task more quickly, but this benefit could easily be negated by the overhead of transferring inputs and outputs over the network.

Figure 2: Cumulative work time, grouped by number of concurrent workers. Half of the work was executed with no other busy processes running in parallel.

Setting aside test execution (addressed by Test Distribution, see above), and focusing on the remaining CPU-intensive 10-20% portion of build times, we find that the potential for optimization is low. Because half of these tasks do not execute in parallel with other processes, a general distribution solution could, at best, expedite only 5-10% of overall build time, while incurring significant costs in terms of build complexity and management overhead.
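The arithmetic behind the 5-10% estimate can be written out as a short Java sketch. The class and method names are hypothetical. The `remainingTime` helper applies Amdahl's law, which the analysis above does not name explicitly, and is included only to show how little the overall time moves even if the distributable share is accelerated.

```java
// Hypothetical sketch of the estimate: how much of a build could distribution touch?
final class DistributionBound {
    // Share of total build time that runs alongside other work and could be distributed.
    static double distributableShare(double nonTestShare, double parallelFractionOfNonTest) {
        return nonTestShare * parallelFractionOfNonTest;
    }

    // Amdahl's law: fraction of the original time left when a share p runs `speedup` times faster.
    static double remainingTime(double p, double speedup) {
        return (1.0 - p) + p / speedup;
    }

    public static void main(String[] args) {
        // Non-test work is 10-20% of the build; half of it ran with no parallel peers.
        double low = distributableShare(0.10, 0.5);
        double high = distributableShare(0.20, 0.5);
        System.out.printf("Distributable share: %.0f%% to %.0f%%%n", low * 100, high * 100);

        // Even a 4x speedup of that share (an assumed figure) changes the total only slightly.
        System.out.printf("Remaining time at 4x: %.1f%% to %.1f%%%n",
                remainingTime(high, 4.0) * 100, remainingTime(low, 4.0) * 100);
    }
}
```

Under these assumptions, the share that distribution could touch is 5% to 10% of the build, and a 4x speedup of that share still leaves more than 92% of the original build time.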

The Path Forward

As discussed above, most changes made by developers are small and incremental, and the biggest bottleneck is typically test execution. Focusing build optimizations on those aspects will therefore typically yield the best results. The following section lists some of the key steps your build process can implement today. These fundamentals of build performance optimization will improve any build, whether local, remote or distributed, and will also ensure the best possible performance if you move to a remote or distributed environment in the future.

In this order, we recommend taking advantage of these Gradle Build Tool features to optimize local build feedback time. Most of these features are documented in further detail at Improving the Performance of Gradle Builds:

Incremental Build
Compile Avoidance and Incremental Compilation
Remote Build Cache
Parallel Execution
Configuration Cache (also increases local parallelism)

Additionally, the following features in Gradle Enterprise drastically shorten test feedback time, which is usually by far the biggest bottleneck in build performance:

Test Distribution
Predictive Test Selection

While general build distribution may demonstrate impressive build performance gains when measured in isolation, we've demonstrated that for most JVM projects it's unlikely to offer significant additional build performance improvements for typical scenarios in well-optimized builds. This is not to suggest that we find a general distribution solution uninteresting. Rather, we view this on our long-term roadmap as an evolutionary, not revolutionary, solution.

Summary

Anecdotal evidence and industry experience have shown two things. First, engineers are most likely to iterate on and rebuild small, incremental changes rather than rebuild the entire project from scratch. Second, regardless of the type of change being built, test execution is the primary cause of build slowness and reduced developer productivity.

Using active process counts as a proxy for the potential parallelization of local builds, we've shown that general build distribution solutions would have a relatively small - if any - impact on build performance for many builds in the JVM ecosystem.

Running all aspects of a JVM build in a purely distributed fashion is not a panacea. Existing Gradle Build Tool features like incremental task execution, compilation avoidance, incremental compilation, build cache and configuration cache are available today and greatly reduce build times, especially for the most frequent incremental changes. Additionally, commercial features in Gradle Enterprise like Test Distribution and Predictive Test Selection drastically reduce test execution time, which is the primary bottleneck for most builds.

Feedback

Let us know if you have any questions on our forums or Gradle Community Slack.

Kyle Moore

Author

Kyle Moore is an experienced software engineer with a demonstrated history of working in the computer software industry at a large scale. At LinkedIn, Kyle was responsible for common functionality for builds of 15,000+ microservices and libraries. Kyle is a contributor to several open source projects including the Gosu language, Gradle build tool, Spotbugs and Cucumber. Kyle recently joined the Gradle build tool team to lead the future in best practice software development.
