Friends of OpenJDK Today

Project Panama for Newbies (Part 2)

August 17, 2021

Author(s)

  • Carl Dea

    Carl Dea is a Senior Developer Advocate at Azul. He has authored Java books and has been developing software for 20+ years with many clients, from Fortune 500 companies to ... Learn more

Updated! (June 26, 2022) The article has been updated to use JEP 424 build 19-ea+25 (2022-09-20).

Introduction

Welcome back to Part 2 of Project Panama for Newbies! If you are new to this series, check out Part 1 first.

If you remember from Part 1, we learned how to create C language primitive data types and arrays. We also got a chance to iterate through the data outside of the Java heap and later display items via Java’s printf() method from the System.out object.

In Part 2, below, we will look at C language’s concept of pointers and structs. Later on in this article, we will use Panama to mimic these concepts. The goal is to call C function signatures that are often defined to accept pointers and structs.

For the impatient, check out the source code of Part 2 on GitHub. To see the prior code examples of this article using JEP 412 go to the branch here.

What is a C pointer?

Pointers explained according to the C Programming Language book by Brian W. Kernighan & Dennis M. Ritchie:

C supports the use of pointers, a type of reference that records the address or location of an object or function in memory. Pointers can be dereferenced to access data stored at the address pointed to, or to invoke a pointed-to function. Pointers can be manipulated using assignment or pointer arithmetic.

The C Programming Language book by Brian W. Kernighan & Dennis M. Ritchie

Before we look at the advantages of using pointers in the C language, let's look at how a Java primitive value is stored and used in memory inside the JVM (Java Virtual Machine).

For example: int x = 5;

public static void main(String[] args) {   
   int x = 5;
   x = doubleIt(x); // x = 10
}

public static int doubleIt(int a) {
   return 2 * a;
}

In the Java language, there are two places to store things in memory: the JVM heap and the JVM stack. The heap is responsible for holding objects along with their primitive values. Inside a method, however, variables declared as primitive data types are stored in stack memory.

Note: The stack is also capable of storing references (memory addresses) to objects on the heap.

At first glance, the above example seems simple and straightforward. However, did you know that passing primitives by value takes up additional memory? When the variable x is declared and assigned in the main() method and subsequently passed into the doubleIt() method, the value 5 is copied, which means it is stored twice: once for x in main() and once for the parameter a in doubleIt(). In other words, two separate 32-bit storage locations are allocated, each at its own (64-bit) address. Wouldn't it be nice to get the address (reference) of the variable x and allow the doubleIt() method to access the value at that same location without copying it?
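To make the copy visible, here is a minimal sketch in plain Java. The class and method names (PassByValueDemo, doubleCopy, doubleInPlace) are hypothetical and not part of the article's repository. It shows that a method cannot change the caller's primitive, while a one-element array can act as a rough stand-in for "a location both sides can reach".

```java
final class PassByValueDemo {

    // The parameter a is a copy of the caller's value; reassigning it
    // inside the method has no effect on the caller's variable.
    static void doubleCopy(int a) {
        a = 2 * a;
    }

    // The array reference is copied too, but both copies refer to the
    // same array on the heap, so the write is visible to the caller.
    static void doubleInPlace(int[] holder) {
        holder[0] = 2 * holder[0];
    }

    public static void main(String[] args) {
        int x = 5;
        doubleCopy(x);
        System.out.println("after doubleCopy:    x = " + x);               // x is still 5

        int[] boxed = {5};
        doubleInPlace(boxed);
        System.out.println("after doubleInPlace: boxed[0] = " + boxed[0]); // now 10
    }
}
```

The array trick is only an analogy: Java still passes the array reference by value, and there is no way to take the address of the local variable x itself.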

In the C language, you can declare variables that allow you to pass primitive data types by reference. A function such as doubleIt() would not have to copy the value; it would instead obtain the value at the location referenced in memory. Let's look at a C program rewritten to pass pointers into the C function doubleIt().

Note: This section is optional. It demonstrates the concept of C pointers, which will help us use Panama to call functions that take parameters by reference.

A file pointers.c contains the code below:

#include <stdio.h>

int doubleIt(int *a);

int main () {
   int x = 5;
   int *ptr; // 1. Declare a pointer of type int.
   ptr = &x; // 2. Assign a pointer variable to the address of x.

   // Display locations in memory
   printf("                                    Address of x variable: %x\n", &x );
   printf("                           Address stored in ptr variable: %x\n", ptr );

   // Call doubleIt() by reference
   printf("               Address of the variable x. Call doubleIt(): %d\n", doubleIt(&x) );
   printf("Pointer to the address of the variable x. Call doubleIt(): %d\n", doubleIt(ptr) );

}

/**
 * Returns a value doubled.
 * @param *a pointer to an int
 * @return int doubling of a value.
 */
int doubleIt(int *a) {
   return 2 * (*a); // two times the value at address (of pointer a).
}

To compile pointers.c file use the following:

$ clang -o pointers_exe pointers.c

To run the executable file type the following:

$ ./pointers_exe

The output is the following:

                                    Address of x variable: e36584dc
                           Address stored in ptr variable: e36584dc
               Address of the variable x. Call doubleIt(): 10
Pointer to the address of the variable x. Call doubleIt(): 10

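In the example above, the first two output lines show the actual address in memory for &x and ptr, which are identical. (The %x format treats its argument as an unsigned int, so on a typical 64-bit platform only the low 32 bits of the address appear; %p with a (void *) cast is the portable way to print a pointer.) The last two output lines show the result of passing parameters to C functions by reference as opposed to by value.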

How does it work?

The table below shows the detailed steps of the C program file pointers.c.

Line | Code | Description
7 | int *ptr; | Declare a variable ptr, a pointer of type int. To declare pointers the format is: <type> *<variable_name>;
8 | ptr = &x; | Assign ptr to the address of the variable x. To assign pointers the format is: <pointer_var_name> = &<other_variable>; Think of '&' as 'get address of'.
11 | printf("%x", &x); | Output address: show the memory address location in hex.
12 | printf("%x", ptr); | Same, but without &, because ptr already contains the address.
15 | printf("%d", doubleIt(&x)); | Call doubleIt() by reference (address).
16 | printf("%d", doubleIt(ptr)); | Same, but without &, because ptr already contains the address.
25 | int doubleIt(int *a) | Function signature. Like line 7, the parameter is declared as a pointer.
26 | return 2 * (*a); | Get the value from the address. The expression in parentheses obtains the actual value. Think of '*' as 'get value from'.

To keep confusion to a minimum, keep the following in mind:

  • Declaring pointers - Prefix the variable name with an asterisk (*).
  • Assigning pointers - Obtain an address by prefixing the & symbol to a variable of the same data type.
  • Defining function parameters - Prefix the parameter name with an asterisk (*).
  • Accessing values from pointers - Obtain the value by prefixing the pointer with an asterisk (*). Use parentheses to make it clear. This is known as 'dereferencing a pointer'.

Now that we know how to talk to a C function that accepts a variable by reference, let's look at how to do this in Panama.

C Pointers Panama-fied

Whenever you think of a C pointer, think of it as just the address of a location in memory that stores data (in bytes). Since pointers point to data in memory, how do you know how much data to retrieve?

At its core, Panama models primitives and complex data types using the classes ValueLayout and MemoryLayout, respectively. Remember that in Part 1 we used a MemorySession to allocate a JAVA_INT (a ValueLayout), which gave us a MemorySegment instance. To mimic the concept of a C pointer, MemorySegment has an address() method that returns a MemoryAddress instance. The listing below shows how to mimic C's concept of pointers in Java.

// int x = 5;
MemorySegment x = memorySession.allocate(C_INT, 5);

// int *ptr;
MemoryAddress address = x.address();

To dereference a pointer (access the value it points to) as in C, you need to know the offset and the size of the data so that the right bytes are retrieved at a given address. New in JEP 419 are get and set methods for accessing value types. Also new are predefined C primitive ValueLayout types that know their own byte sizes. For example, if the variable x is of type int, the code will read 4 bytes starting at x's location in memory. If it's of type long, it will read 8 bytes (on 64-bit Linux and macOS). Below is an example of how to reference and dereference a pointer:

// ptr = &x; represents a pointer of type int
MemoryAddress ptr = address;

// (*ptr) retrieve the value from the address.
ptr.get(C_INT, 0);
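To see why the layout (and therefore the byte size) matters when dereferencing, here is a small sketch that reads the same off-heap bytes as two different types. It deliberately avoids the preview Panama API and uses a direct java.nio.ByteBuffer as a stand-in for native memory. The class DerefWidthDemo and its methods are hypothetical names, and the byte order is fixed to little-endian only to keep the result predictable.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class DerefWidthDemo {

    // Reads a 4-byte int starting at the given byte offset.
    static int readInt(ByteBuffer memory, int offset) {
        return memory.getInt(offset);
    }

    // Reads an 8-byte long starting at the given byte offset.
    static long readLong(ByteBuffer memory, int offset) {
        return memory.getLong(offset);
    }

    public static void main(String[] args) {
        // 8 bytes of off-heap memory; little-endian chosen for a predictable result.
        ByteBuffer memory = ByteBuffer.allocateDirect(8).order(ByteOrder.LITTLE_ENDIAN);
        memory.putInt(0, 5); // occupies bytes 0..3
        memory.putInt(4, 1); // occupies bytes 4..7

        // Same starting offset, different widths, different values.
        System.out.println("as int  @0: " + readInt(memory, 0));  // 5
        System.out.println("as long @0: " + readLong(memory, 0)); // 4294967301 = 5 + (1L << 32)
    }
}
```

The address alone is not enough: reading 4 bytes yields 5, while reading 8 bytes from the same spot also pulls in the neighboring int. This is the job C_INT does in the Panama calls above.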

The listing below shows a full example of mimicking C's concept of pointers. Similar to the code snippets above, we create a variable and a pointer reference. The code also changes the value of the variable x and outputs the value that ptr is pointing to (the address location of x).

try (var memorySession = MemorySession.newConfined()) {
  System.out.println("\nCreating Pointers:");

  // int x = 5;
  var x = memorySession.allocate(C_INT, 5);

  // int *ptr;
  MemoryAddress address = x.address();             // obtain address

  // ptr = &x;
  MemoryAddress ptr = address;

  // Output value: x = 5 and ptr's value = 5
  System.out.printf("           x = %d    address = %x %n", x.get(C_INT, 0), x.address().toRawLongValue());
  System.out.printf(" ptr's value = %d    address = %x %n", ptr.get(C_INT, 0), ptr.address().toRawLongValue());

  // Change x = 10;
  x.set(C_INT, 0, 10);
  System.out.printf(" Changing x's value to: %d %n", x.get(C_INT, 0));

  // Output after change
  System.out.printf("           x = %d    address = %x %n", x.get(C_INT, 0), x.address().toRawLongValue());
  System.out.printf(" ptr's value = %d    address = %x %n", ptr.get(C_INT, 0), ptr.address().toRawLongValue());
}

The output of the listing above:

           x = 5    address = 7fedece135e0 
 ptr's value = 5    address = 7fedece135e0 
 Changing x's value to: 10 
           x = 10    address = 7fedece135e0 
 ptr's value = 10    address = 7fedece135e0 

Now that you know how to deal with pointers to primitive types, let's look at complex data types, better known as C's concept of structs.

What is a C struct?

To put it simply, this is the ancestor to Java's concept of classes or records. If you would like to go deeper into a detailed explanation such as the history of C structs, etc... head over to Wikipedia.

Let's explore C language's struct. Below is a simple example of a struct Point containing x and y coordinates.

#include <stdio.h>

struct Point {
  int x;
  int y;
};

int main () {
   struct Point pt;
   pt.x = 100;
   pt.y = 50;
   printf("Point pt = (%d, %d) \n",  pt.x, pt.y);
}

The output is the following:

Point pt = (100, 50)

In the above example, you will notice that the keyword struct is used to define complex data types. In this scenario, a Point is defined as two int variables named x and y. To declare a variable of type Point, the struct keyword is again prefixed, i.e. struct Point pt;

To assign values to a struct instance, it is similar to Java, where the dot is used to access the attribute.

pt.x = 100;
pt.y = 50;

An interesting thing to note is that C has no "new" keyword like Java does. The new keyword was introduced later, in C++.

C Structs Panama-fied

Now that we know how things work in the C world, let's look at how to mimic C's concept of structs in Java Panama. To create C language's struct using Panama, we'll be invoking the static method MemoryLayout.structLayout(). This method creates an object of type GroupLayout. A GroupLayout object will describe a memory layout similar to the Point struct defined in C above. The method accepts ValueLayout and other MemoryLayout instances such as C_INT variables used for x and y coordinates of the Point struct. Shown below is how to create one C Point struct.

GroupLayout pointStruct = MemoryLayout.structLayout(
   C_INT.withName("x"),
   C_INT.withName("y")
);

var cPoint = memorySession.allocate(pointStruct);

Next, we need to set and get values from the cPoint instance. Below we use the method varHandle() to describe the path to the bytes in memory. I will describe it in more detail later, but for now think of it as a way to walk through memory to set and get data based on a memory layout.

VarHandle VHx = pointStruct.varHandle(MemoryLayout.PathElement.groupElement("x"));
VarHandle VHy = pointStruct.varHandle(MemoryLayout.PathElement.groupElement("y"));

VHx.set(cPoint, 100); // MemorySegment, int
VHy.set(cPoint, 200); 

System.out.printf("cPoint = (%d, %d) \n",  VHx.get(cPoint), VHy.get(cPoint));

This will output the following:

cPoint = (100, 200)
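Under the hood, a struct layout boils down to field offsets within a block of bytes. The sketch below illustrates that idea with stable JDK classes only (a direct ByteBuffer plus a byte-buffer view VarHandle) rather than the preview Panama API. The class PointOffsetsDemo, its constants and its helper methods are hypothetical names. It assumes the two int fields are laid out back to back with no padding, as in the Point struct above.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PointOffsetsDemo {
    // Hypothetical constants mirroring: struct Point { int x; int y; };
    static final int X_OFFSET = 0;
    static final int Y_OFFSET = Integer.BYTES;       // y starts right after x (byte 4)
    static final int POINT_SIZE = 2 * Integer.BYTES; // 8 bytes in total

    // Views a ByteBuffer as ints; the coordinates are (ByteBuffer, byte index).
    static final VarHandle INT_AT =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Allocates off-heap space for one Point and writes both fields.
    static ByteBuffer newPoint(int x, int y) {
        ByteBuffer point = ByteBuffer.allocateDirect(POINT_SIZE);
        INT_AT.set(point, X_OFFSET, x);
        INT_AT.set(point, Y_OFFSET, y);
        return point;
    }

    static int x(ByteBuffer point) { return (int) INT_AT.get(point, X_OFFSET); }

    static int y(ByteBuffer point) { return (int) INT_AT.get(point, Y_OFFSET); }

    public static void main(String[] args) {
        ByteBuffer cPoint = newPoint(100, 200);
        System.out.printf("cPoint = (%d, %d)%n", x(cPoint), y(cPoint)); // cPoint = (100, 200)
    }
}
```

With Panama you don't compute these offsets by hand: pointStruct.varHandle(groupElement("x")) derives them from the layout, which is less error-prone once structs contain padding or nested members.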

What's a java.lang.invoke.VarHandle?

According to the Javadoc documentation:

A VarHandle is a dynamically strongly typed reference to a variable, or to a parametrically-defined family of variables, including static fields, non-static fields, array elements, or components of an off-heap data structure. Access to such variables is supported under various access modes, including plain read/write access, volatile read/write access, and compare-and-set.

Javadoc documentation

In layman's (newbie) terms, a VarHandle can reference and access sequences, structs and fields in memory. This API has been available since Java 9. The goal of VarHandle was to define a standard way to invoke the equivalents of various java.util.concurrent.atomic and sun.misc.Unsafe operations.
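To give a feel for the access modes mentioned in the Javadoc quote, here is a small sketch that uses a VarHandle on ordinary int[] elements, with no Panama involved. The class VarHandleModesDemo and the method doubleAtomically are hypothetical names chosen for illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class VarHandleModesDemo {
    // A VarHandle for int[] elements; the coordinates are (int[] array, int index).
    static final VarHandle ELEMENT = MethodHandles.arrayElementVarHandle(int[].class);

    // Atomically replaces values[index] with its double. The loop retries if
    // another thread changed the element between the read and the compare-and-set.
    static int doubleAtomically(int[] values, int index) {
        while (true) {
            int current = (int) ELEMENT.getVolatile(values, index); // volatile read
            int next = 2 * current;
            if (ELEMENT.compareAndSet(values, index, current, next)) {
                return next;
            }
        }
    }

    public static void main(String[] args) {
        int[] values = {5, 7};
        ELEMENT.set(values, 1, 8);                      // plain write
        int doubled = doubleAtomically(values, 0);      // volatile read + compare-and-set
        System.out.println(doubled + ", " + values[1]); // prints "10, 8"
    }
}
```

The same set/get call shape is what the Panama examples use; only the coordinates differ (a MemorySegment, plus an index for sequences, instead of an array and an index).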

Getting back to structs, let's learn how to create a sequence of them!

Sequence of Structs

Before we look at how to create an array of structs let's look at how to define them in C. Below is a sequence or an array of 5 Point structs. The array variable is named points.

struct Point {
  int x;
  int y;
} points[5];

To iterate over an array of structs the following code sets and gets data.

// sets data
for (int i=0; i<5; i++) {
  points[i].x = 100 + i;
  points[i].y = 200 + i;
}

// gets data
for (int i=0; i<5; i++) {
  printf("Point pt = (%3d, %3d) \n",  points[i].x, points[i].y);
}

Output is the following:

Point pt = (100, 200) 
Point pt = (101, 201) 
Point pt = (102, 202) 
Point pt = (103, 203) 
Point pt = (104, 204) 

Now that you know how to declare, create and access an array of structs in C, let's look at how to create a sequence of struct instances in Java Panama. To create a sequence of structs in Panama you will need the handy method MemoryLayout.sequenceLayout().

Again, these methods help you create MemoryLayout objects responsible for describing how space should be allocated in memory. The code snippet below creates a memory layout (SequenceLayout) ready for the allocator.

SequenceLayout seqStruct = MemoryLayout.sequenceLayout(5, pointStruct);

The seqStruct describes a memory layout as a sequence of 5 Point structs. Notice that the code reuses the already defined pointStruct instance defined earlier (of type GroupLayout). Now, let's allocate the space in memory.

MemorySegment points = memorySession.allocate(seqStruct);

Similar to using VarHandle and PathElements to access variables in memory (getters/setters) the code below creates a VarHandle instance that is able to access the sequence of structs and their x and y fields in memory:

var VHSeq_x = seqStruct.varHandle(
                MemoryLayout.PathElement.sequenceElement(),
                MemoryLayout.PathElement.groupElement("x"));
var VHSeq_y = seqStruct.varHandle(
                MemoryLayout.PathElement.sequenceElement(),
                MemoryLayout.PathElement.groupElement("y"));

Now we can iterate through the sequence to set point instances and their x and y coordinates. The code listing below uses a random number generator to supply values to be set for coordinates (x, y).

Random random = new Random();
for(long i=0; i<seqStruct.elementCount().getAsLong(); i++) {
  VHSeq_x.set(points, i, random.nextInt(100)); // MemorySegment, index i, int
  VHSeq_y.set(points, i, random.nextInt(100));
}

To output the contents of the sequence of Point structs the following code will invoke the VarHandle's get method as shown below:

for(long i=0; i<seqStruct.elementCount().getAsLong(); i++) {
  System.out.printf(" points[%d] = (%2d, %3d) \n", i, VHSeq_x.get(points, i), VHSeq_y.get(points, i));
}

The output is the following:

 points[0] = (30,  50) 
 points[1] = (92,  59) 
 points[2] = (44,  31) 
 points[3] = (43,  80) 
 points[4] = (55,  12) 
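What sequenceElement() and groupElement("x") compute together is essentially "index times struct size, plus field offset". The sketch below spells out that arithmetic by hand using stable JDK classes (a direct ByteBuffer and a byte-buffer view VarHandle) as a stand-in for the preview Panama API. PointSequenceDemo and its members are hypothetical names, and the layout assumes two unpadded 4-byte ints per Point.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PointSequenceDemo {
    static final int POINT_SIZE = 2 * Integer.BYTES; // one struct Point: 8 bytes
    static final int Y_OFFSET = Integer.BYTES;       // y follows x within a Point

    // Views a ByteBuffer as ints; the coordinates are (ByteBuffer, byte index).
    static final VarHandle INT_AT =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Allocates off-heap space for count consecutive Point structs.
    static ByteBuffer allocatePoints(int count) {
        return ByteBuffer.allocateDirect(count * POINT_SIZE);
    }

    // Element i starts at byte i * POINT_SIZE (the stride of the sequence).
    static void setPoint(ByteBuffer points, int i, int x, int y) {
        int base = i * POINT_SIZE;
        INT_AT.set(points, base, x);
        INT_AT.set(points, base + Y_OFFSET, y);
    }

    static int getX(ByteBuffer points, int i) {
        return (int) INT_AT.get(points, i * POINT_SIZE);
    }

    static int getY(ByteBuffer points, int i) {
        return (int) INT_AT.get(points, i * POINT_SIZE + Y_OFFSET);
    }

    public static void main(String[] args) {
        int count = 5;
        ByteBuffer points = allocatePoints(count);
        for (int i = 0; i < count; i++) {
            setPoint(points, i, 100 + i, 200 + i);
        }
        for (int i = 0; i < count; i++) {
            System.out.printf(" points[%d] = (%3d, %3d)%n", i, getX(points, i), getY(points, i));
        }
    }
}
```

The Panama VarHandle hides this bookkeeping: you pass the segment and the index i, and the layout path supplies the stride and the field offset.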

There you have it, C pointers and C structs in Java!

Conclusion

In Part 2, above, we got a chance to create (mimic) C's concept of pointers.

Next, we learned about memory layouts and how they define a struct using MemoryLayout.structLayout().

After an example of accessing a struct, we examined the important VarHandle class, available since Java 9.

Lastly, we were able to create and access a sequence of structs using the method MemoryLayout.sequenceLayout().

While this may be elementary to some, it's important to know how to model C's concepts to call into library functions that require variables of type pointers and structs.

The next installment (Part 3) is about using the knowledge we've gained so far to call functions in third-party libraries.


Comments (2)


Joao

In the ‘C Pointers Panama-fied’ section in the second code snipped. I think you wanted to write:

MemoryAdress ptr = address;

instead of:

Memorysegment ptr = address

?

Carl Dea

Good catch. Thank you!
Will fix.
