
Preventing YAML Parsing Vulnerabilities in Java

  • May 06, 2021

YAML is a human-readable data serialization language that is commonly used for configuration files. The name YAML is a recursive acronym for "YAML Ain't Markup Language", and the format was first released in 2001. YAML is comparable to JSON and XML: all three are text-based, structured formats.

YAML files are often used to configure applications, application servers, or clusters. It is a very common format in Spring Boot applications and, of course, for configuring Kubernetes. Like JSON and XML, YAML can also be used to serialize and deserialize data.

Although YAML looks like an excellent alternative to XML and JSON, many people aren't big fans of its structure. Because the language is line-based and uses indentation to express structure and nesting, indentation is a frequent source of trouble when parsing complex data structures. A single missing (or extra) space in a complex, data-heavy document can make parsing fail, and tracking down the offending line in a large YAML file is difficult.

Most importantly, loading YAML yourself in a Java application with an outdated version of SnakeYAML can get you into trouble.

Billion laughs attack

One feature of YAML is anchors. You define an anchor once (&name) and refer back to it elsewhere with an alias (*name), so you do not have to repeat yourself. In the simplified example below, I create two keys: var1 and var2. Thanks to the anchor, var2 gets the same value as var1.

var1: &anchor value
var2: *anchor
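To make the anchor mechanism concrete, here is a toy Java sketch of how an alias gets resolved. It is not a YAML parser: the hypothetical AnchorDemo class only understands the two line shapes used above ("key: &anchor value" and "key: *anchor"). A real application would use a library such as SnakeYAML instead.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration only, NOT a YAML parser: it understands exactly the two
// line shapes "key: &anchor value" and "key: *anchor".
public class AnchorDemo {

    static Map<String, String> resolve(String[] lines) {
        Map<String, String> anchors = new HashMap<>();      // anchor name -> value
        Map<String, String> result = new LinkedHashMap<>(); // key -> value, in input order
        for (String line : lines) {
            int colon = line.indexOf(": ");
            if (colon < 0) {
                throw new IllegalArgumentException("Expected 'key: value' but got: " + line);
            }
            String key = line.substring(0, colon);
            String rest = line.substring(colon + 2).trim();
            if (rest.startsWith("&")) {
                // "&anchor value": store the value under the anchor name
                int space = rest.indexOf(' ');
                if (space < 0) {
                    throw new IllegalArgumentException("Anchor without a value: " + line);
                }
                String value = rest.substring(space + 1).trim();
                anchors.put(rest.substring(1, space), value);
                result.put(key, value);
            } else if (rest.startsWith("*")) {
                // "*anchor": an alias reuses the value stored under that anchor
                String value = anchors.get(rest.substring(1));
                if (value == null) {
                    throw new IllegalArgumentException("Unknown anchor in: " + line);
                }
                result.put(key, value);
            } else {
                result.put(key, rest); // plain value, no anchor involved
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] yaml = {"var1: &anchor value", "var2: *anchor"};
        System.out.println(resolve(yaml)); // {var1=value, var2=value}
    }
}
```

The alias never copies text around in the file; it simply points back to the anchored value, which is exactly what makes the next trick possible.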

Let's take this to the extreme and build the YAML version of the famous billion laughs attack. By nesting anchors, every level multiplies the size of the level before it.

lol1: &lol1 ["lol","lol","lol","lol","lol","lol","lol","lol","lol"]
lol2: &lol2 [*lol1,*lol1,*lol1,*lol1,*lol1,*lol1,*lol1,*lol1,*lol1]
lol3: &lol3 [*lol2,*lol2,*lol2,*lol2,*lol2,*lol2,*lol2,*lol2,*lol2]
lol4: &lol4 [*lol3,*lol3,*lol3,*lol3,*lol3,*lol3,*lol3,*lol3,*lol3]
lol5: &lol5 [*lol4,*lol4,*lol4,*lol4,*lol4,*lol4,*lol4,*lol4,*lol4]
lol6: &lol6 [*lol5,*lol5,*lol5,*lol5,*lol5,*lol5,*lol5,*lol5,*lol5]
lol7: &lol7 [*lol6,*lol6,*lol6,*lol6,*lol6,*lol6,*lol6,*lol6,*lol6]
lol8: &lol8 [*lol7,*lol7,*lol7,*lol7,*lol7,*lol7,*lol7,*lol7,*lol7]
lol9: &lol9 [*lol8,*lol8,*lol8,*lol8,*lol8,*lol8,*lol8,*lol8,*lol8]
lolz: &lolz [*lol9]

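As you can see, lol1 is a list of nine "lol" strings. The variable lol2 is a list containing lol1 nine times, so it expands to 9 × 9 = 81 strings. Repeating this principle at every level, lol9 (and therefore lolz) expands to 9^9 = 387,420,489 "lol" strings. Strictly speaking that is not quite a billion (the classic XML version of the attack uses ten entries per level to reach 10^9), but the point stands: less than a kilobyte of YAML describes hundreds of millions of values.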
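A quick way to double-check this arithmetic is to let a program do the counting. The hypothetical LaughCounter class below is a toy, not a YAML parser: it assumes every line has exactly the shape "key: &anchor [item,...]" used in the listing, counts a quoted scalar as one "lol", and counts an alias as the full expanded size of the anchor it refers to.

```java
import java.util.HashMap;
import java.util.Map;

// Toy calculator, NOT a YAML parser. It assumes every line looks like
// "key: &anchor [item,item,...]", where an item is either a quoted scalar
// or an alias such as *lol1 that refers to an anchor defined earlier.
public class LaughCounter {

    // The billion laughs listing, copied verbatim.
    static final String[] LISTING = {
        "lol1: &lol1 [\"lol\",\"lol\",\"lol\",\"lol\",\"lol\",\"lol\",\"lol\",\"lol\",\"lol\"]",
        "lol2: &lol2 [*lol1,*lol1,*lol1,*lol1,*lol1,*lol1,*lol1,*lol1,*lol1]",
        "lol3: &lol3 [*lol2,*lol2,*lol2,*lol2,*lol2,*lol2,*lol2,*lol2,*lol2]",
        "lol4: &lol4 [*lol3,*lol3,*lol3,*lol3,*lol3,*lol3,*lol3,*lol3,*lol3]",
        "lol5: &lol5 [*lol4,*lol4,*lol4,*lol4,*lol4,*lol4,*lol4,*lol4,*lol4]",
        "lol6: &lol6 [*lol5,*lol5,*lol5,*lol5,*lol5,*lol5,*lol5,*lol5,*lol5]",
        "lol7: &lol7 [*lol6,*lol6,*lol6,*lol6,*lol6,*lol6,*lol6,*lol6,*lol6]",
        "lol8: &lol8 [*lol7,*lol7,*lol7,*lol7,*lol7,*lol7,*lol7,*lol7,*lol7]",
        "lol9: &lol9 [*lol8,*lol8,*lol8,*lol8,*lol8,*lol8,*lol8,*lol8,*lol8]",
        "lolz: &lolz [*lol9]"
    };

    // Maps each anchor to the number of "lol" scalars it expands to.
    static Map<String, Long> expandedSizes(String[] lines) {
        Map<String, Long> sizes = new HashMap<>();
        for (String line : lines) {
            String anchor = line.substring(line.indexOf('&') + 1, line.indexOf('[')).trim();
            String items = line.substring(line.indexOf('[') + 1, line.lastIndexOf(']'));
            long total = 0;
            for (String item : items.split(",")) {
                item = item.trim();
                // An alias counts as everything its anchor expands to;
                // a plain scalar such as "lol" counts once.
                total += item.startsWith("*") ? sizes.get(item.substring(1)) : 1L;
            }
            sizes.put(anchor, total);
        }
        return sizes;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = expandedSizes(LISTING);
        System.out.println("lol1 -> " + sizes.get("lol1")); // lol1 -> 9
        System.out.println("lol2 -> " + sizes.get("lol2")); // lol2 -> 81
        System.out.println("lolz -> " + sizes.get("lolz")); // lolz -> 387420489
        int chars = String.join("\n", LISTING).length();
        System.out.println("listing size: " + chars + " characters");
    }
}
```

Because every list in the listing has nine entries, the program arrives at 9^9 = 387,420,489 for lolz, while the listing itself stays well under a kilobyte.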

With anchors, you can create a YAML bomb! The tremendous number of (nested) objects that such a YAML input can produce may cause a memory overload.
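The sketch below illustrates why such a bomb is cheap to write but expensive to consume, and one way to defend against it. It is a stdlib-only model under my own assumptions, not SnakeYAML's implementation: the hypothetical build method creates each level as a single list whose entries all point to the same previous-level list (one way a parser can resolve aliases to a shared object), and it enforces an alias budget before anything is expanded. The countLeaves method stands in for any code that walks the result as if it were a tree. Newer SnakeYAML versions offer a comparable safeguard through LoaderOptions.setMaxAliasesForCollections; all class and method names below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class AliasBudget {

    // Thrown when a document needs more aliases than the budget allows.
    static class TooManyAliasesException extends RuntimeException {
        TooManyAliasesException(int max) {
            super("Document uses more than " + max + " aliases");
        }
    }

    // Models the lol1..lolN structure after alias resolution: each level is
    // ONE list whose entries all reference the same previous-level list, so
    // memory stays small. Every reference counts against maxAliases.
    static List<Object> build(int width, int depth, int maxAliases) {
        int aliasesUsed = 0;
        List<Object> level = new ArrayList<>(Collections.nCopies(width, "lol"));
        for (int d = 2; d <= depth; d++) {
            List<Object> next = new ArrayList<>();
            for (int i = 0; i < width; i++) {
                if (++aliasesUsed > maxAliases) {
                    throw new TooManyAliasesException(maxAliases);
                }
                next.add(level); // an alias: the same object, not a copy
            }
            level = next;
        }
        return level;
    }

    // Returns true if the structure can be built within the alias budget.
    static boolean accepts(int width, int depth, int maxAliases) {
        try {
            build(width, depth, maxAliases);
            return true;
        } catch (TooManyAliasesException e) {
            return false;
        }
    }

    // Naive consumer that treats the shared graph as a tree: its running
    // time grows with the EXPANDED size (width^depth), not the input size.
    static long countLeaves(Object node) {
        if (node instanceof List) {
            long n = 0;
            for (Object child : (List<?>) node) {
                n += countLeaves(child);
            }
            return n;
        }
        return 1;
    }

    public static void main(String[] args) {
        // Small enough to run instantly: 9 * 9 * 9 * 9 = 6561 leaves.
        System.out.println(countLeaves(build(9, 4, 50))); // 6561

        // Nine levels of width 9 need 72 aliases in this model,
        // so a budget of 50 rejects the document before any traversal.
        System.out.println(accepts(9, 9, 50)); // false
    }
}
```

With the budget of 50 chosen here (an arbitrary value), a four-level structure passes, while the nine-level structure is rejected as soon as it asks for a 51st alias, long before any expensive traversal could run.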

Please read the full article for a breakdown of how this attack works, with actual Java examples, and, more importantly, how to prevent this problem in your Java application.
