Efficient Memory Mapping for Terabyte Sparse Files in Java

  • March 15, 2023

On Linux, you can create sparse files, where only the pages (of 4 KiB) that are touched utilise either memory or disk space.

This allows you to memory map large virtual regions without worrying about wasted memory or disk space.

In this program, you can see it reserves 8 TiB (8,192 GiB).


Figure 1. Test 1: Sparse file
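As a rough illustration of the idea, here is a minimal sketch that uses only the standard JDK. It is not the article's original program: the class name `SparseFileSketch` and the method `reserveAndMap` are hypothetical, and because a standard `MappedByteBuffer` cannot exceed 2 GiB per mapping, the sketch extends the file to the full virtual size but maps only a window at the start. It also assumes a file system that supports sparse files and files of this size.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

final class SparseFileSketch {

    /**
     * Extends the file to virtualSize bytes without writing any data, then
     * maps the first windowSize bytes for reading and writing.
     * (Hypothetical helper, not taken from the article.)
     */
    static MappedByteBuffer reserveAndMap(Path file, long virtualSize, int windowSize)
            throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            // On file systems with sparse-file support this should only record
            // the new length; no blocks are allocated for the untouched range.
            raf.setLength(virtualSize);
            // The mapping stays valid after the channel is closed.
            return raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, windowSize);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sparse", ".dat");
        file.toFile().deleteOnExit();

        // Reserve 8 TiB of file length, map a 64 MiB window at offset 0.
        MappedByteBuffer window = reserveAndMap(file, 8L << 40, 64 << 20);
        window.putInt(0, 42); // touches only the first page

        System.out.println("apparent size: " + Files.size(file) + " bytes");
    }
}
```

Comparing the apparent size with the output of a tool such as `du` on the same file should show the gap between reserved and used space.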

Tip: x << y means x × 2^y, therefore:

1L << 10 = 1 KiB (1024 bytes),
1L << 20 = 1 MiB (1024^2 bytes),
1L << 30 = 1 GiB (1024^3 bytes),
1L << 40 = 1 TiB, etc.

Using multiples of 10 for the shift makes them easier to read.

64L << 20 is 64 × 2^20 = 64 × 1024^2 = 64 MiB.
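The same arithmetic can be checked in a few lines of Java. This is only an illustrative sketch; the class and constant names are made up for the example. It also shows why the `L` suffix matters: without it the shift is done on an `int`, and Java uses only the low five bits of the shift distance.

```java
final class ShiftSizes {
    static final long KIB = 1L << 10; // 1,024 bytes
    static final long MIB = 1L << 20; // 1,024^2 bytes
    static final long GIB = 1L << 30; // 1,024^3 bytes
    static final long TIB = 1L << 40; // 1,024^4 bytes

    public static void main(String[] args) {
        System.out.println(64L << 20);         // 67108864 (64 MiB in bytes)
        System.out.println((8L << 40) / GIB);  // 8192 (8 TiB expressed in GiB)
        // int shift: the distance 40 is reduced to 40 & 31 = 8
        System.out.println(1 << 40);           // 256
    }
}
```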

The virtual memory size of the above process is just over 8192 GiB at 8200.7 GiB, but the RSS (Resident Set Size) is only 122,060 KB, or 122 MB.

Figure 2. RES for Test 1

On disk, the extents reported are 8 TiB; however, the amount of disk (and memory) actually used is just 20 KiB.


Figure 3. Disk usage for Test 1

The following test shows the main point of this article more clearly. The reserved virtual memory is again 8 TiB, but the data is written sparsely: 1000 integers are written, with a skip of 16L << 10 bytes (16 KiB = four pages) after each write.


Figure 4. Test 2: Sparse file with skipped pages

The RSS (Resident Set Size) is only 129,272 KB, or about 129 MB, and the disk usage is only 4.0 MiB, which indicates that only touched pages use memory. The data spans about 16 KiB × 1000 ≈ 16 MiB, but only one page in four has been touched, so the actual disk usage is about 4 KiB × 1000 ≈ 4 MiB.
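The page arithmetic can be reproduced with a small sketch. It is not the article's test program: `SparseWriteSketch` and `writeSparsely` are hypothetical names, the 4 KiB page size is assumed, and "skip after each write" is read as advancing the position by the 4-byte integer plus 16 KiB. A heap `ByteBuffer` stands in for the mapped region so the example runs anywhere.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

final class SparseWriteSketch {
    static final int PAGE_SIZE = 4 << 10; // assumed page size: 4 KiB

    /**
     * Writes count ints into buf, skipping skipBytes after each one, and
     * returns how many distinct pages received at least one byte.
     */
    static int writeSparsely(ByteBuffer buf, int count, int skipBytes) {
        Set<Integer> touchedPages = new HashSet<>();
        int pos = 0;
        for (int i = 0; i < count; i++) {
            buf.putInt(pos, i);
            // Record the pages holding the first and last byte of the int.
            touchedPages.add(pos / PAGE_SIZE);
            touchedPages.add((pos + Integer.BYTES - 1) / PAGE_SIZE);
            pos += Integer.BYTES + skipBytes;
        }
        return touchedPages.size();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(17 << 20); // a bit more than the ~16 MiB span
        int pages = writeSparsely(buf, 1000, 16 << 10);
        long kib = (long) pages * PAGE_SIZE / 1024;
        System.out.println(pages + " pages touched, about " + kib + " KiB");
        // prints "1000 pages touched, about 4000 KiB"
    }
}
```

Since `MappedByteBuffer` is a `ByteBuffer`, the same method could be pointed at a mapped region of a sparse file, where the touched-page count is what would be expected to show up as disk usage.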


Figure 5. RES for Test 2


Figure 6. Disk usage for Test 2

Conclusion

Mapping large areas of memory avoids having to know in advance how much memory we need, or having to resize the memory mappings while they are in use, while still letting us access the data as direct memory without the overhead of system calls.

In short, using virtual memory, instead of real memory, gives greater flexibility to how we tune our systems.

Files that can be pruned lazily make it clear the files won’t be extended.

In memory-mapped files, only the touched pages use disk space.

On the system used for the tests in this article, each page holds 4 KiB of data. Writing data sparsely, so that some pages were skipped, therefore did not increase disk usage; in other words, only the touched pages contributed to memory demand.
