Updated! (June 26, 2022) The article has been updated to use JEP 424 build 19-ea+25 (2022-09-20).
Introduction
Welcome back to Part 2 of Project Panama for Newbies! If you are new to this series, check out Part 1 first.
If you remember from Part 1, we learned how to create C language primitive data types and arrays. We also got a chance to iterate through the data outside of the Java heap and later display items via Java’s printf() method from the System.out object.
In Part 2, below, we will look at C language’s concept of pointers and structs. Later on in this article, we will use Panama to mimic these concepts. The goal is to call C function signatures that are often defined to accept pointers and structs.
For the impatient, check out the source code of Part 2 on GitHub. To see the prior code examples of this article using JEP 412 go to the branch here.
What is a C pointer?
Pointers explained according to the C Programming Language book by Brian W. Kernighan & Dennis M. Ritchie:
C supports the use of pointers, a type of reference that records the address or location of an object or function in memory. Pointers can be dereferenced to access data stored at the address pointed to, or to invoke a pointed-to function. Pointers can be manipulated using assignment or pointer arithmetic.
The C Programming Language book by Brian W. Kernighan & Dennis M. Ritchie
Before we look at the advantages of using pointers in the C language, let's look at how a Java primitive value is stored and used in memory inside the JVM (Java Virtual Machine).
For example: int x = 5;
    public static void main(String[] args) {
        int x = 5;
        x = doubleIt(x); // x = 10
    }

    public static int doubleIt(int a) {
        return 2 * a;
    }
In the Java language, there are two places to store things in memory: the JVM heap and the JVM stack. The heap holds objects along with their primitive fields, while variables declared as primitive data types inside a method are stored in stack memory.
Note: The stack is also capable of storing references (memory addresses) to objects on the heap.
At first glance, the above example seems simple and straightforward. However, did you know that passing primitives by value takes up more memory? When the variable x is declared and assigned in the main() method and subsequently passed into the doubleIt() method, the value 5 is copied, so it exists twice in memory: once for x in main() and once for the parameter a in doubleIt(). Each copy lives at its own location and occupies its own 32 bits. Wouldn't it be nice to get the address (reference) of the variable x and allow the doubleIt() method to access the value at that same location, without the copy that passing by value requires?
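To make the copy concrete, here is a minimal Java sketch. It is not part of the original example, and the class and method names are made up for illustration. It shows that a method receiving an int only ever works on its own copy, whereas a method receiving an array reference can change what the caller sees, because both copies of the reference point at the same array on the heap.

```java
final class PassByValueDemo {

    // Receives a copy of the caller's int; reassigning it is invisible to the caller.
    static void tryToDouble(int a) {
        a = 2 * a;
    }

    // Receives a copy of the array reference; the array itself is shared with the caller.
    static void doubleFirstElement(int[] holder) {
        holder[0] = 2 * holder[0];
    }

    public static void main(String[] args) {
        int x = 5;
        tryToDouble(x);
        System.out.println("x after tryToDouble(x): " + x); // still 5

        int[] box = {5};
        doubleFirstElement(box);
        System.out.println("box[0] after doubleFirstElement(box): " + box[0]); // now 10
    }
}
```

The array version is the closest plain Java gets to the pass-by-reference behaviour that C pointers provide.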
In the C language you can declare variables that allow you to pass primitive data types by reference. A function such as doubleIt() would not have to copy the value; it would instead obtain the value at the location referenced in memory. Let's look at a C program rewritten to pass pointers into the C function doubleIt().
Note: This section is optional. It demonstrates the concepts of C pointers to help us use Panama to call functions whose parameters are passed by reference.
A file pointers.c contains the code below:
    #include <stdio.h>

    int doubleIt(int *a);

    int main () {
        int x = 5;
        int *ptr;  // 1. Declare a pointer of type int.
        ptr = &x;  // 2. Assign a pointer variable to the address of x.

        // Display locations in memory (portable code would use %p with a (void *) cast)
        printf(" Address of x variable: %x\n", &x );
        printf(" Address stored in ptr variable: %x\n", ptr );

        // Call doubleIt() by reference
        printf(" Address of the variable x. Call doubleIt(): %d\n", doubleIt(&x) );
        printf("Pointer to the address of the variable x. Call doubleIt(): %d\n", doubleIt(ptr) );
        return 0;
    }

    /**
     * Returns a value doubled.
     * @param *a pointer to an int
     * @return int doubling of a value.
     */
    int doubleIt(int *a) {
        return 2 * (*a); // two times the value at address (of pointer a).
    }
To compile the pointers.c file, use the following:
$ clang -o pointers_exe pointers.c
To run the executable file type the following:
$ ./pointers_exe
The output is the following:
    Address of x variable: e36584dc
    Address stored in ptr variable: e36584dc
    Address of the variable x. Call doubleIt(): 10
    Pointer to the address of the variable x. Call doubleIt(): 10
In the example above, you will notice the output showing the actual address in memory for &x and ptr. The last two output lines show how to pass parameters to C functions by reference as opposed to by value.
How does it work?
The table below shows the detailed steps of the C program file pointers.c.
Line | Code | Description |
---|---|---|
7 | int *ptr; | Declare a variable ptr as a pointer of type int. To declare pointers the format is: <type> *<variable_name>; |
8 | ptr = &x; | Assign ptr the address of the variable x. To assign pointers the format is: <pointer_var_name> = &<other_variable>; Think of ' & ' as meaning 'get address of'. |
11-12 | printf("%x", &x); printf("%x", ptr); | Output the address. The first call shows the memory address location in hex. The second does the same without &, because ptr already contains the address. |
15-16 | printf("%d", doubleIt(&x)); printf("%d", doubleIt(ptr)); | Call doubleIt(). The first call passes by reference (address). The second does the same without &, because ptr already contains the address. |
25 | int doubleIt(int *a) | Function signature. Like line 7, the declaration of a pointer. |
26 | return 2 * (*a); | Get the value from the address. The parentheses show how to obtain the actual value. Think of ' * ' as meaning 'get value from'. |
To keep confusion to a minimum, keep the following in mind:
- Declaring pointers - Prefix an asterisk * symbol to the variable.
- Assigning pointers - Obtain the address by prefixing the & symbol to a variable of the same data type.
- Defining function parameters - Prefix an asterisk * symbol to the variable.
- Accessing values from pointers - Obtain a value by prefixing an asterisk * symbol. Use parentheses to make it clear. This is known as 'dereferencing a pointer'.
Now that we know how to talk to a C function that accepts a variable by reference let's look at how to perform this in Panama.
C Pointers Panama-fied
Whenever you think of a C pointer, think of it as just an address location in memory that stores data (in bytes). Since pointers point to data in memory, how do you know how much data to retrieve?
At its core, Panama is capable of modeling primitives and complex data types using the classes ValueLayout or MemoryLayout respectively. Remember in Part 1 we used the MemorySession to allocate a JAVA_INT (a ValueLayout), which creates a MemorySegment instance. To mimic or simulate the concept of a C pointer, the MemorySegment has an address() method that returns a MemoryAddress instance. The listing below shows how to mimic C's concept of pointers in Java.
    // int x = 5;
    MemorySegment x = memorySession.allocate(C_INT, 5);

    // int *ptr;
    MemoryAddress address = x.address();
To dereference pointers (access values from pointers) as in C, you need to know the offset and the size of the data in order to retrieve the right bytes at a given address location. New in JEP 419 are get and set methods to access value types. Also new are predefined C primitive ValueLayout types that automatically know their byte sizes. For example, if the variable x is of type int, the code will grab 4 bytes from x's location in memory. If it's of type long, it'll grab 8 bytes. Below is an example of how to reference and dereference a pointer:
    // ptr = &x; represents a pointer of type int
    MemoryAddress ptr = address;

    // (*ptr) retrieve value from address.
    int value = ptr.get(C_INT, 0);
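The idea that the type decides how many bytes are read can be tried without Panama. The sketch below is an illustration only: it uses java.nio.ByteBuffer.allocateDirect() as a stand-in for off-heap memory rather than the MemorySegment API used in this article, and the class name DereferenceSizeDemo and its twoInts() helper are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class DereferenceSizeDemo {

    // Allocates 8 bytes outside the Java heap and writes two ints side by side.
    static ByteBuffer twoInts(int first, int second) {
        ByteBuffer memory = ByteBuffer.allocateDirect(2 * Integer.BYTES)
                                      .order(ByteOrder.nativeOrder());
        memory.putInt(0, first);              // bytes 0..3
        memory.putInt(Integer.BYTES, second); // bytes 4..7
        return memory;
    }

    public static void main(String[] args) {
        ByteBuffer memory = twoInts(5, 7);

        // Reading as an int consumes 4 bytes starting at the given offset.
        System.out.println("int at offset 0  = " + memory.getInt(0));
        System.out.println("int at offset 4  = " + memory.getInt(Integer.BYTES));

        // Reading as a long consumes all 8 bytes, so both ints end up in one value.
        System.out.println("long at offset 0 = " + memory.getLong(0));
    }
}
```

Reading the same starting address with the wrong width gives a different value, which is why a layout such as C_INT has to accompany every get and set.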
The listing below shows a full example of mimicking C's concept of pointers. Similar to the code snippets above, we create a variable and a pointer reference to it. The code also changes the value of the variable x and outputs the value that ptr is pointing to (the address location of x).
    try (var memorySession = MemorySession.newConfined()) {
        System.out.println("\nCreating Pointers:");

        // int x = 5;
        var x = memorySession.allocate(C_INT, 5);

        // int *ptr;
        MemoryAddress address = x.address(); // obtain address

        // ptr = &x;
        MemoryAddress ptr = address;

        // Output value: x = 5 and ptr's value = 5
        System.out.printf(" x = %d address = %x %n", x.get(C_INT, 0), x.address().toRawLongValue());
        System.out.printf(" ptr's value = %d address = %x %n", ptr.get(C_INT, 0), ptr.address().toRawLongValue());

        // Change x = 10;
        x.set(C_INT, 0, 10);
        System.out.printf(" Changing x's value to: %d %n", x.get(C_INT, 0));

        // Output after change
        System.out.printf(" x = %d address = %x %n", x.get(C_INT, 0), x.address().toRawLongValue());
        System.out.printf(" ptr's value = %d address = %x %n", ptr.get(C_INT, 0), ptr.address().toRawLongValue());
    }
The output of listing above:
    x = 5 address = 7fedece135e0
    ptr's value = 5 address = 7fedece135e0
    Changing x's value to: 10
    x = 10 address = 7fedece135e0
    ptr's value = 10 address = 7fedece135e0
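The output shows x and ptr reporting the same address, so a write through one is visible through the other. The same aliasing can be sketched with plain java.nio as a rough analogy. This is an assumption-based illustration, not the Panama API: ByteBuffer.duplicate() creates a second view over the same bytes, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

final class SharedMemoryDemo {

    // Writes through one view of the memory and reads through the other.
    static int writeViaXReadViaPtr(int newValue) {
        ByteBuffer x = ByteBuffer.allocateDirect(Integer.BYTES); // stands in for: int x
        ByteBuffer ptr = x.duplicate();                          // stands in for: int *ptr = &x
        x.putInt(0, newValue);                                   // x = newValue
        return ptr.getInt(0);                                    // (*ptr)
    }

    public static void main(String[] args) {
        System.out.println("ptr sees " + writeViaXReadViaPtr(10));
    }
}
```

Both buffers keep the default byte order, so the value read back is the value that was written.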
Now that you know how to deal with pointers to primitive types let's look at complex datatypes better known as C's concept of structs.
What is a C struct?
To put it simply, this is the ancestor to Java's concept of classes or records. If you would like to go deeper into a detailed explanation such as the history of C structs, etc... head over to Wikipedia.
Let's explore the C language's struct. Below is a simple example of a struct Point containing x and y coordinates.
    #include <stdio.h>

    struct Point {
        int x;
        int y;
    };

    int main () {
        struct Point pt;
        pt.x = 100;
        pt.y = 50;
        printf("Point pt = (%d, %d) \n", pt.x, pt.y);
    }
The output is the following:
Point pt = (100, 50)
In the above example you will notice the keyword struct is used to define complex data types. In this scenario a Point is defined as two int variables named x and y. To declare a variable of type Point, the keyword is also prefixed, i.e. struct Point pt;
To assign values to a struct instance, it is similar to Java, where the dot is used to access the attribute.
    pt.x = 100;
    pt.y = 50;
An interesting thing to note is that C has no keyword new like Java does. The new keyword was introduced in C++.
C Structs Panama-fied
Now that we know how things work in the C world, let's look at how to mimic C's concept of structs in Java Panama. To create a C struct using Panama, we'll be invoking the static method MemoryLayout.structLayout(). This method creates an object of type GroupLayout. A GroupLayout object will describe a memory layout similar to the Point struct defined in C above. The method accepts ValueLayout and other MemoryLayout instances, such as the C_INT layouts used for the x and y coordinates of the Point struct. Shown below is how to create one C Point struct.
    GroupLayout pointStruct = MemoryLayout.structLayout(
        C_INT.withName("x"),
        C_INT.withName("y")
    );

    var cPoint = memorySession.allocate(pointStruct);
Next, we need to set and get values from the cPoint instance. Below we use the method varHandle() to describe the path to the bytes in memory. I will describe it in more detail later, but for now think of it as a way to walk through memory to set and get data based on a memory layout.
    VarHandle VHx = pointStruct.varHandle(MemoryLayout.PathElement.groupElement("x"));
    VarHandle VHy = pointStruct.varHandle(MemoryLayout.PathElement.groupElement("y"));

    VHx.set(cPoint, 100); // MemorySegment, int
    VHy.set(cPoint, 200);

    System.out.printf("cPoint = (%d, %d) \n", VHx.get(cPoint), VHy.get(cPoint));
This will output the following:
cPoint = (100, 200)
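Behind the scenes, a path such as "x" or "y" resolves to a byte offset inside the struct. The following sketch makes those offsets explicit. Treat it as an illustration under stated assumptions rather than what Panama does internally: ByteBuffer and MethodHandles.byteBufferViewVarHandle() (both part of the JDK since Java 9) stand in for MemorySegment and GroupLayout, the offsets 0 and 4 assume two 4-byte ints with no padding, and the class and helper names are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PointOffsetsDemo {
    static final int X_OFFSET = 0;                   // x is the first field
    static final int Y_OFFSET = Integer.BYTES;       // y follows the 4 bytes of x
    static final int POINT_SIZE = 2 * Integer.BYTES; // 8 bytes in total

    // Views the bytes of a ByteBuffer as ints; coordinates are (ByteBuffer, byte offset).
    static final VarHandle INT_AT =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Allocates one off-heap "struct Point" and writes both fields.
    static ByteBuffer newPoint(int x, int y) {
        ByteBuffer point = ByteBuffer.allocateDirect(POINT_SIZE);
        INT_AT.set(point, X_OFFSET, x);
        INT_AT.set(point, Y_OFFSET, y);
        return point;
    }

    static int x(ByteBuffer point) {
        return (int) INT_AT.get(point, X_OFFSET);
    }

    static int y(ByteBuffer point) {
        return (int) INT_AT.get(point, Y_OFFSET);
    }

    public static void main(String[] args) {
        ByteBuffer cPoint = newPoint(100, 200);
        System.out.printf("cPoint = (%d, %d)%n", x(cPoint), y(cPoint));
    }
}
```

With a GroupLayout the offsets are derived from the layout for you, which is the main convenience of pointStruct.varHandle(...) over hand-written constants like these.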
What's a java.lang.invoke.VarHandle?
According to the Javadoc documentation:
A VarHandle is a dynamically strongly typed reference to a variable, or to a parametrically-defined family of variables, including static fields, non-static fields, array elements, or components of an off-heap data structure. Access to such variables is supported under various access modes, including plain read/write access, volatile read/write access, and compare-and-set.
Javadoc documentation
In layman's (newbie) terms, a VarHandle can reference and access sequences, structs and fields in memory. This API has been available since Java 9. The goal of VarHandle was to define a standard way to invoke the equivalents of various java.util.concurrent.atomic and sun.misc.Unsafe operations.
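To see the access modes from the Javadoc quote outside of Panama, here is a small sketch of a VarHandle bound to an ordinary instance field. It uses only MethodHandles.lookup().findVarHandle(), getVolatile() and compareAndSet() from java.lang.invoke; the class CounterVarHandleDemo and its count field are invented for this example.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class CounterVarHandleDemo {
    private volatile int count; // the variable the VarHandle refers to

    private static final VarHandle COUNT;
    static {
        try {
            // Look up a handle to the int field named "count" on this class.
            COUNT = MethodHandles.lookup()
                    .findVarHandle(CounterVarHandleDemo.class, "count", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Volatile read access mode.
    int get() {
        return (int) COUNT.getVolatile(this);
    }

    // Compare-and-set access mode: updates only if the current value equals expected.
    boolean compareAndSet(int expected, int updated) {
        return COUNT.compareAndSet(this, expected, updated);
    }

    public static void main(String[] args) {
        CounterVarHandleDemo counter = new CounterVarHandleDemo();
        System.out.println(counter.compareAndSet(0, 5)); // true: count was 0
        System.out.println(counter.compareAndSet(0, 9)); // false: count is now 5
        System.out.println(counter.get());               // 5
    }
}
```

The handles returned by a memory layout work the same way, except that the coordinates are a MemorySegment (plus indexes) instead of an object instance.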
Getting back to structs, let's learn how to create a sequence of them!
Sequence of Structs
Before we look at how to create an array of structs, let's look at how to define them in C. Below is a sequence, or array, of 5 Point structs. The array variable is named points.
    struct Point {
        int x;
        int y;
    } points[5];
To iterate over an array of structs the following code sets and gets data.
    // sets data
    for (int i=0; i<5; i++) {
        points[i].x = 100 + i;
        points[i].y = 200 + i;
    }

    // gets data
    for (int i=0; i<5; i++) {
        printf("Point pt = (%3d, %3d) \n", points[i].x, points[i].y);
    }
Output is the following (the first line is printed by the single Point example shown earlier):
    Point pt = (100, 50)
    Point pt = (100, 200)
    Point pt = (101, 201)
    Point pt = (102, 202)
    Point pt = (103, 203)
    Point pt = (104, 204)
Now that you know how to declare, create and access an array of structs in C, let's look at how to create a sequence of struct instances in Java Panama. To create a sequence of structs in Panama you will need the handy method MemoryLayout.sequenceLayout().
Again, these methods help you create MemoryLayout objects responsible for describing how space should be allocated in memory. The code snippet below creates a memory layout (SequenceLayout) ready for the allocator.
SequenceLayout seqStruct = MemoryLayout.sequenceLayout(5, pointStruct);
The seqStruct describes a memory layout as a sequence of 5 Point structs. Notice that the code reuses the pointStruct instance (of type GroupLayout) defined earlier. Now, let's allocate the space in memory.
MemorySegment points = memorySession.allocate(seqStruct);
Similar to using VarHandle and PathElements to access variables in memory (getters/setters), the code below creates VarHandle instances that are able to access the sequence of structs and their x and y fields in memory:
    var VHSeq_x = seqStruct.varHandle(
        MemoryLayout.PathElement.sequenceElement(),
        MemoryLayout.PathElement.groupElement("x"));

    var VHSeq_y = seqStruct.varHandle(
        MemoryLayout.PathElement.sequenceElement(),
        MemoryLayout.PathElement.groupElement("y"));
Now we can iterate through the sequence to set point instances and their x and y coordinates. The code listing below uses a random number generator to supply values to be set for coordinates (x, y).
    Random random = new Random();

    for(long i=0; i<seqStruct.elementCount().getAsLong(); i++) {
        VHSeq_x.set(points, i, random.nextInt(100)); // MemorySegment, index i, int
        VHSeq_y.set(points, i, random.nextInt(100));
    }
To output the contents of the sequence of Point structs, the following code invokes the VarHandle's get method as shown below:
    for(long i=0; i<seqStruct.elementCount().getAsLong(); i++) {
        System.out.printf(" points[%d] = (%2d, %3d) \n", i, VHSeq_x.get(points, i), VHSeq_y.get(points, i));
    }
The output is the following:
    points[0] = (30, 50)
    points[1] = (92, 59)
    points[2] = (44, 31)
    points[3] = (43, 80)
    points[4] = (55, 12)
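Combining a sequence element with a group element boils down to simple arithmetic: the index times the struct size, plus the field's offset inside the struct. The sketch below spells that out by hand. It is an illustration built on java.nio.ByteBuffer rather than SequenceLayout, it assumes an 8-byte Point with no padding, and the names PointSequenceDemo, offsetOf() and fill() are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PointSequenceDemo {
    static final int POINT_SIZE = 2 * Integer.BYTES; // one struct Point: two 4-byte ints
    static final int X_OFFSET = 0;
    static final int Y_OFFSET = Integer.BYTES;

    // Byte offset of a field of points[index]: index * struct size + field offset.
    static int offsetOf(int index, int fieldOffset) {
        return index * POINT_SIZE + fieldOffset;
    }

    // Allocates count points off-heap and fills them like the C loop: (100 + i, 200 + i).
    static ByteBuffer fill(int count) {
        ByteBuffer points = ByteBuffer.allocateDirect(count * POINT_SIZE)
                                      .order(ByteOrder.nativeOrder());
        for (int i = 0; i < count; i++) {
            points.putInt(offsetOf(i, X_OFFSET), 100 + i);
            points.putInt(offsetOf(i, Y_OFFSET), 200 + i);
        }
        return points;
    }

    public static void main(String[] args) {
        ByteBuffer points = fill(5);
        for (int i = 0; i < 5; i++) {
            System.out.printf(" points[%d] = (%3d, %3d)%n", i,
                    points.getInt(offsetOf(i, X_OFFSET)),
                    points.getInt(offsetOf(i, Y_OFFSET)));
        }
    }
}
```

For example, points[3].y sits at byte 3 * 8 + 4 = 28. A VarHandle built from sequenceElement() and groupElement("y") performs the equivalent lookup when you pass it the index 3.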
There you have it, C pointers and C structs in Java!
Conclusion
In Part 2, above, we got a chance to create (mimic) C's concept of pointers.
Next, we learned about memory layouts and how they define a struct using MemoryLayout.structLayout().
After an example of accessing a struct, we examined the important VarHandle class, available since Java 9.
Lastly, we were able to create and access a sequence of structs using the method MemoryLayout.sequenceLayout().
While this may be elementary to some, it's important to know how to model C's concepts to call into library functions that require variables of type pointers and structs.
The next installment (Part 3) is about using the knowledge we've attained so far to call functions in 3rd party libraries.