
Enter the Cloud Native Dojo: Blackbelt-Level Debugging

February 22, 2022

Table of Contents

1. Polyglot Debugging
2. Debugging the Unreproducible
3. Data Pollution
TL;DR

Debugging is often viewed as an art form or a craft. This is true for most engineering-related troubleshooting processes (e.g., the art of motorcycle maintenance). We’re usually taught the basic moves by a senior developer and are then thrown into the proverbial pool. As a result, even senior engineers sometimes have gaps in their debugging skills. There are very few university courses or books on the subject, so it’s hard to blame them.

In his book “Why Programs Fail: A Guide to Systematic Debugging,” Andreas Zeller tells a story from his youth, when he worked at a computer store. A customer walked into the store with a new Commodore 64 computer. For context: computers back then booted directly into a BASIC interpreter, and a BASIC program line starts with a line number. The customer had tried entering this valid BASIC line:

10 print "Hello World"

And he got a syntax error. He was surprised, since the program seemed correct and wasn’t exactly complex. You can probably understand it without knowing any BASIC…

In debugging, and in programming in general, we need to break down a problem into smaller components. So he typed in an empty statement.

It turned out that the user was used to typewriters, where he’d type a lowercase L and the letter O to produce the digits one and zero. He followed the same practice on the computer and typed “lo” instead of “10.”

When I read that story, I laughed out loud, but I also thought: this isn’t a story about debugging. But it is. Debugging is about the unexpected. It’s about narrowing (or slicing) the problem until we have a distilled problem we can observe. At that point, the solution usually presents itself.
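The narrowing step can be sketched in code. The following is a minimal illustration in Java, not a definitive algorithm: a greedy reducer that drops one token at a time, starting from the end of the input, and keeps the smaller input whenever the failure still reproduces. The `InputReducer` class and the `fails` predicate are hypothetical names invented for this example; the predicate stands in for the interpreter and assumes a line fails when its first token is not a number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class InputReducer {

    // Returns a smaller input that still fails; returns the input unchanged if it never failed.
    static List<String> reduce(List<String> tokens, Predicate<List<String>> fails) {
        List<String> current = new ArrayList<>(tokens);
        if (!fails.test(current)) {
            return current; // nothing to narrow down
        }
        boolean shrunk = true;
        while (shrunk) {
            shrunk = false;
            // Try removing tokens from the end first.
            for (int i = current.size() - 1; i >= 0; i--) {
                List<String> candidate = new ArrayList<>(current);
                candidate.remove(i);
                if (fails.test(candidate)) {
                    current = candidate; // failure persists, keep the smaller input
                    shrunk = true;
                    break;
                }
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the interpreter: the first token must be a line number.
        Predicate<List<String>> fails =
                line -> !line.isEmpty() && !line.get(0).matches("\\d+");
        List<String> typed = List.of("lo", "print", "\"Hello World\"");
        System.out.println(reduce(typed, fails)); // prints [lo]
    }
}
```

With the input distilled to a single token, the only thing left to inspect is the token itself.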

In this article, I want to go over three big challenges we face when debugging modern applications:

Polyglot Debugging
Debugging the Unreproducible
Data Pollution

1. Polyglot Debugging

This isn’t a new problem. As a person who used to build JVMs for a living, I would occasionally “meta debug”: debug the JVM’s own debugging support. That sent me back and forth between Java and native code, stepping through with both debuggers running.

That’s to be expected when building low-level VM code. But it’s something that’s becoming more common across the board. A server might be written in Python or Java with a JavaScript front end. We might track an issue through a frontend debugger all the way to the backend.

Similarly, in a microservice deployment, each service might be implemented in a different language. In theory, we can test everything in isolation. In practice, that’s just unrealistic. Bugs happen. By definition, they are unexpected.

With serverless, this problem has become even worse. Reproducing a serverless environment locally is so challenging that we hear arguments that debugging serverless code locally is an anti-pattern.

Remote debugging is problematic and risky, and it doesn’t scale to complex deployments. So a lot of developers limit themselves to logging and maybe some observability tools. While those can help with some problems, they are poor replacements for local debugging. Continuous observability tools provide us with a way to go beyond simple monitoring. We can get source-level debugging similar to traditional debuggers on production servers.

2. Debugging the Unreproducible

There are two types of unreproducible bugs: those we can’t reproduce locally and those we can’t reproduce at all. If we can reproduce the issue in production, we can use a continuous observability tool to inspect the server.

However, if we can’t, we’re effectively stuck with log and observability analysis. We end up looking through forensic information like crime scene investigators. At this point, it’s too late to do anything about the current incident, so we need to make sure we have new logs in place for the “next time” this happens.

As developers, we need to fight the culture of closing bugs with “can’t reproduce.” This is a cop-out. Resolving a can’t-reproduce report should mean adding logs or similar guards that verify the assumptions held by the developer. That way, we won’t get caught again by the can’t-reproduce conundrum.
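A guard of this kind can be very small. The following is a minimal sketch in Java using `java.util.logging`; the `DiscountCalculator` class, the `applyDiscount` method, the `requestId` parameter, and the 0 to 100 range are all hypothetical and only serve to illustrate the idea. The guard does not change behavior. It only records the context we would want the next time the assumption is violated.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class DiscountCalculator {
    private static final Logger LOG =
            Logger.getLogger(DiscountCalculator.class.getName());

    // Assumption under test: callers always pass a percentage between 0 and 100.
    static long applyDiscount(long priceCents, int percent, String requestId) {
        if (percent < 0 || percent > 100) {
            // Record the context that was missing when the bug could not be reproduced.
            LOG.log(Level.WARNING,
                    "Assumption violated: percent={0}, priceCents={1}, requestId={2}",
                    new Object[] {percent, priceCents, requestId});
        }
        return priceCents - (priceCents * percent) / 100;
    }

    public static void main(String[] args) {
        System.out.println(applyDiscount(10_000, 15, "req-1"));  // prints 8500
        System.out.println(applyDiscount(10_000, 150, "req-2")); // logs a warning, then prints -5000
    }
}
```

Whether the guard should also reject or correct the value is a separate design decision; logging alone is the least invasive option for a bug whose cause is still unknown.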

3. Data Pollution

We often think of bugs as failures, crashes and downtime. While those are indeed bad, they are often the best bugs. We know there’s a problem and the solution is usually obvious and immediate.

Data pollution is insidious. It’s remarkably hard to debug and incredibly hard to fix after the fact as fixing the code just isn’t enough.

So what is a data pollution bug?

This is a bug that results in bad data. That in itself isn’t a big problem… The problem is that this data propagates, possibly between microservices and into the database. At that point, it becomes a huge problem. The bad data is visible, but the bug that caused it can be anywhere, even on a different server. It’s like looking for a needle in a haystack. These bugs are particularly nasty since they often occur only in production, and the cleanup after them can be worse than the problem itself.

A good example of this is “undefined,” which propagates from bad JavaScript code and somehow wiggles its way into databases everywhere. The usual way to debug this after the fact is to log a stack trace at the place that writes or sends the data. Use a condition there to check whether the data is invalid, so that only violations are logged.

This can be done with code and also with continuous observability tools such as Lightrun.
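In code, such a guard might look like the sketch below. It is written in Java with `java.util.logging`; the `UserRepository` class, its methods, and the in-memory map standing in for the database are hypothetical. The condition assumes that the pollution arrives as the literal string "undefined" (or as a null or empty value), and the stack trace is captured by creating a `Throwable` that is logged but never thrown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

final class UserRepository {
    private static final Logger LOG =
            Logger.getLogger(UserRepository.class.getName());

    // Stand-in for a real database table.
    private final Map<Long, String> table = new HashMap<>();

    // The condition: what we consider invalid data at this write site.
    static boolean looksPolluted(String value) {
        return value == null
                || value.trim().isEmpty()
                || "undefined".equals(value.trim());
    }

    void saveDisplayName(long userId, String displayName) {
        if (looksPolluted(displayName)) {
            // The Throwable is only logged; it captures the call stack of the writer.
            LOG.log(Level.WARNING,
                    "Suspicious displayName '" + displayName + "' for user " + userId,
                    new Throwable("write-site stack"));
        }
        table.put(userId, displayName);
    }

    String findDisplayName(long userId) {
        return table.get(userId);
    }

    public static void main(String[] args) {
        UserRepository repository = new UserRepository();
        repository.saveDisplayName(1L, "Ada");       // no log entry
        repository.saveDisplayName(2L, "undefined"); // logs a warning with the stack trace
    }
}
```

The logged stack shows which code path produced the write, which is the first step toward finding the service that introduced the bad value.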

TL;DR

Debugging is a skill that we use daily but still don’t invest enough time honing. We end up using the same tools and techniques over and over. We fall back to using logs and don’t use sophisticated capabilities that have been around for years. Unfortunately, bugs aren’t standing still. As we scale our infrastructure with amazing container technology, the bugs scale with our distributed solutions. They become more insidious at scale.

We need new tools and new technologies to handle bug scalability in the same way we handle container scaling.

Originally published at The New Stack.

Shai Almog

Author

Author, DevRel, Blogger, Open Source Hacker, Java Rockstar, Conference Speaker, Instructor and Entrepreneur.
