Friends of OpenJDK Today

Efficient Memory Mapping for Terabyte Sparse Files in Java

March 15, 2023

Author(s)

  • Peter Lawrey

    Peter Lawrey is a Java Champion and Oracle Code One alumnus. Peter likes to inspire developers to improve the craftsmanship of their solutions and his popular blog “Vanilla Java” has ... Learn more

On Linux, you can create sparse files, where only the pages (of 4 KiB) that are touched utilise either memory or disk space.

This allows you to memory map large virtual regions without worrying about wasted memory or disk space.

The following program reserves 8 TiB (8,192 GiB) of virtual memory.


Figure 1. Test 1: Sparse file
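
As a rough illustration of the idea, here is a minimal sketch that uses only the standard java.nio API; it is not necessarily how the program in Figure 1 is written. The class name SparseFileSketch, the method mapWindow and the temporary file are hypothetical. Because FileChannel.map limits a single mapping to Integer.MAX_VALUE bytes, the sketch sets a large logical file length and maps one 64 MiB window of it, rather than mapping all 8 TiB in one call.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class SparseFileSketch {
    /** Size of each mapped window: 64 MiB (an assumption made for this sketch). */
    public static final long WINDOW_SIZE = 64L << 20;

    /**
     * Grows the file to at least fileLength bytes and maps one read/write window at offset.
     * On file systems that support sparse files, growing the file records the new
     * length without allocating blocks for the untouched range.
     */
    public static MappedByteBuffer mapWindow(Path path, long fileLength, long offset)
            throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
            if (raf.length() < fileLength) {
                raf.setLength(fileLength); // only ever grow, never truncate
            }
            // The mapping remains valid after the channel is closed.
            return raf.getChannel().map(FileChannel.MapMode.READ_WRITE, offset, WINDOW_SIZE);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("sparse", ".dat"); // hypothetical location
        try {
            long length = 8L << 40; // 8 TiB; the file system must allow files this large
            MappedByteBuffer window = mapWindow(path, length, 0);
            window.putInt(0, 42); // touches only the first page of the window
            System.out.println("logical size = " + Files.size(path) + " bytes");
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

Mapping in fixed-size windows is one way to stay within the standard API's per-mapping limit; the logical file size is still set up front, so no resizing is needed as further windows are mapped.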

Tip: x << y means x × 2^y, therefore:

1L << 10 = 1 KiB (1024 bytes),
1L << 20 = 1 MiB (1024^2 bytes),
1L << 30 = 1 GiB (1024^3 bytes),
1L << 40 = 1 TiB, etc.

Using multiples of 10 for the shift makes them easier to read.

64L << 20 is 64 × 2^20 = 64 × 1024^2 bytes = 64 MiB.
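
The shift arithmetic can be checked with a few lines of Java. This is only an illustrative sketch; the class and constant names are hypothetical. It also shows why the L suffix matters: without it the shift is performed on an int, and the shift distance is taken modulo 32.

```java
public class ShiftSizes {
    public static final long KIB = 1L << 10;
    public static final long MIB = 1L << 20;
    public static final long GIB = 1L << 30;
    public static final long TIB = 1L << 40;

    public static void main(String[] args) {
        System.out.println(64L << 20);        // 64 MiB in bytes: 67108864
        System.out.println((8L << 40) / GIB); // 8 TiB expressed in GiB: 8192
        // Without the L suffix this is an int shift; 40 mod 32 = 8, so it equals 1 << 8.
        System.out.println(1 << 40);          // 256
    }
}
```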

The virtual memory size of the above process is 8,200.7 GiB, just over the 8,192 GiB reserved, but the RSS (Resident Set Size) is only 122,060 KB, or about 122 MB.

Figure 2. RES for Test 1

On disk, the extents reported are 8 TiB; however, the amount of disk (and memory) actually used is just 20 KiB.


Figure 3. Disk usage for Test 1

The following test shows the main point of this article more clearly. The reserved virtual memory is again 8 TiB, but the data is written sparsely: 1,000 integers are written, with a skip of 16L << 10 bytes (16 KiB, or four pages) after each write.


Figure 4. Test 2: Sparse file with skipped pages

The RSS (Resident Set Size) is only 129,272 KB, or about 129 MB, and the disk usage is only 4.0 MiB, which indicates that only touched pages use memory. The writes are spread over 16 KiB × 1000 ≈ 16 MiB, but only one page in every four has been touched, so the actual disk usage is 4 KiB × 1000 ≈ 4.0 MiB.
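
The page arithmetic can be reproduced on a small scale. The sketch below is an illustration only, not the program from Figure 4: it uses the standard java.nio API, maps just the 1000 × 16 KiB range being written rather than 8 TiB, and assumes a 4 KiB page size. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

public class SparseWritesSketch {
    public static final int PAGE_SIZE = 4 << 10; // 4 KiB page size (assumed)
    public static final int STRIDE = 16 << 10;   // 16 KiB between writes = four pages
    public static final int WRITES = 1000;

    /** Sets the file length (sparsely, where supported) and maps the whole file. */
    public static MappedByteBuffer mapFile(Path path, long length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
            raf.setLength(length);
            return raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, length);
        }
    }

    /** Writes one int every stride bytes; returns how many distinct pages were touched. */
    public static int writeSparsely(MappedByteBuffer buffer, int writes, int stride) {
        Set<Integer> touchedPages = new HashSet<>();
        for (int i = 0; i < writes; i++) {
            int position = i * stride;
            buffer.putInt(position, i);
            touchedPages.add(position / PAGE_SIZE);                       // page holding the first byte
            touchedPages.add((position + Integer.BYTES - 1) / PAGE_SIZE); // page holding the last byte
        }
        return touchedPages.size();
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("sparse-writes", ".dat"); // hypothetical location
        try {
            MappedByteBuffer buffer = mapFile(path, (long) WRITES * STRIDE);
            int pages = writeSparsely(buffer, WRITES, STRIDE);
            System.out.println("range written over: " + (long) WRITES * STRIDE / 1024 + " KiB"); // 16000 KiB
            System.out.println("pages touched:      " + pages);                                  // 1000
            System.out.println("touched data:       " + (long) pages * PAGE_SIZE / 1024 + " KiB"); // 4000 KiB
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

The 4,000 KiB of touched pages is roughly 3.9 MiB, which is in line with the 4.0 MiB of disk usage reported for the test.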


Figure 5. RES for Test 2


Figure 6. Disk usage for Test 2

Conclusion

Mapping large regions of virtual memory means we neither have to know in advance how much memory we need nor have to resize the mappings while they are in use, and the data can be accessed as direct memory without the overhead of system calls.

In short, using virtual memory, instead of real memory, gives greater flexibility to how we tune our systems.

Files that can be pruned lazily make it clear the files won’t be extended.

In memory mapped files, only the touched pages use disk space.

On the system used for the tests in this article, each page holds 4 KiB of data. Writing data sparsely, so that some pages were skipped, therefore did not increase disk usage for the skipped pages; in other words, only the touched pages contributed to memory demand.
