Java 11 includes a lot of improvements and changes in the garbage collection (GC) domain. The Z Garbage Collector (ZGC) is a scalable, low-latency collector that was written from scratch. It is designed to handle heaps ranging from a few hundred megabytes to multiple terabytes. As a concurrent garbage collector, ZGC promises pause times that do not exceed 10 milliseconds, even for larger heap sizes. It is also easy to tune. It was released with Java 11 as an experimental GC. Work on this GC is in progress in OpenJDK, and more changes can be expected over time.

This article is an excerpt taken from the book, Java 11 and 12 – New Features, written by Mala Gupta. In this book, you will learn the latest developments in Java, right from variable type inference and simplified multithreading through to performance improvements, and much more.

In this article, you will learn why ZGC is needed, what features it offers, and how it works, including the ZGC heap, the ZGC phases, and colored pointers.

Need for Z Garbage Collector

One of the features that contributed to the rise of Java in its early days was automatic memory management with its GCs, which freed developers from manual memory management and reduced memory leaks. However, with unpredictable timings and durations, garbage collection can (at times) do more harm to an application than good. Increased latency directly affects the throughput and performance of an application. With ever-decreasing hardware costs and programs engineered to use large amounts of memory, applications are demanding lower latency and higher throughput from garbage collectors.

ZGC promises pause times of no more than 10 milliseconds, which don’t increase with the heap size or the size of the live set. This is because its stop-the-world pauses are limited to root scanning.

Features of Z Garbage Collector

ZGC brings in a lot of features, which have been instrumental in its proposal, design, and implementation. One of the most outstanding features of ZGC is that it is a concurrent GC. Other features include:

  • It can mark memory and copy and relocate it, all concurrently. It also has a concurrent reference processor.
  • As opposed to the store barriers that are used by other HotSpot GCs, ZGC uses load barriers. A load barrier is a small piece of code that runs whenever a Java thread loads an object reference from the heap.
  • One of the intriguing features of ZGC is the usage of load barriers with colored pointers. This is what enables ZGC to perform concurrent operations when Java threads are running, such as object relocation or relocation set selection.
  • ZGC is more flexible in how it sizes and lays out its heap regions. Compared to G1, ZGC has better ways to deal with very large object allocations.
  • ZGC is a single-generation GC. It also supports partial compaction. ZGC is also highly performant when it comes to reclaiming memory and reallocating it.
  • ZGC is NUMA-aware, which essentially means that it has a NUMA-aware memory allocator.

Getting started with Z Garbage Collector

Working with ZGC involves multiple steps. You need a JDK build that includes ZGC, which is specific to Linux/x64. The following commands can be used to download the JDK sources and build them with ZGC enabled on your system:

$ hg clone http://hg.openjdk.java.net/jdk/jdk
$ cd jdk
$ sh configure --with-jvm-features=zgc
$ make images

After execution of the preceding commands, the JDK root directory can be found in the following location:

./build/linux-x86_64-normal-server-release/images/jdk

Java tools, such as java, javac, and others can be found in the /bin subdirectory of the preceding path (its usual location).

Let’s create a basic HelloZGC class, as follows:

class HelloZGC { 
    public static void main(String[] args) { 
        System.out.println("Say hello to new low pause GC - ZGC!"); 
    } 
}

The following command can be used to enable ZGC and use it:

java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC HelloZGC

Since ZGC is an experimental GC, the user needs to unlock it using the runtime option -XX:+UnlockExperimentalVMOptions.

To enable basic GC logging, the user can add the -Xlog:gc option.

Detailed logging is helpful while fine-tuning an application. The user can enable it by using the -Xlog:gc* option as follows:

java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:gc* HelloZGC

The previous command will output all the logs to the console, which could make it difficult to search for specific content. The user can specify the logs to be written to a file as follows:

java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:gc*:mylog.log HelloZGC
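HelloZGC exits almost immediately, so its GC log is likely to contain very little. The following is a minimal sketch of a program that produces garbage and reports which collectors the JVM exposes through the standard java.lang.management API. The class name GcChurn and its methods are made up for this illustration and do not come from the book.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class (not from the book): creates garbage so that
// the GC log has something to report.
class GcChurn {

    // Each new array replaces the previous one here, turning the old one into garbage.
    static byte[] sink;

    // Names of the collectors exposed by the running JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    // Allocates `rounds` short-lived 1 MB arrays and returns the total bytes allocated.
    static long churn(int rounds) {
        long allocated = 0;
        for (int i = 0; i < rounds; i++) {
            sink = new byte[1024 * 1024];
            allocated += sink.length;
        }
        return allocated;
    }

    public static void main(String[] args) {
        System.out.println("Collectors: " + collectorNames());
        System.out.println("Allocated bytes: " + churn(10_000));
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(bean.getName() + ": " + bean.getCollectionCount() + " collections");
        }
    }
}
```

Running it with the same options as HelloZGC, for example java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:gc GcChurn, should give the log more to show. The number of collections you see depends on the heap size and the machine.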

Z Garbage Collector heap

ZGC divides memory into regions, also called ZPages. ZPages can be dynamically created and destroyed. Unlike the regions of the G1 GC, they can also be dynamically sized, in multiples of 2 MB. Here are the size groups of heap regions:

  • Small (2 MB)
  • Medium (32 MB)
  • Large (N * 2 MB)
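To make the N * 2 MB rule concrete, here is a small sketch that rounds a requested size up to the next multiple of 2 MB. It assumes that a large region is sized to fit the object it holds, which the list above does not state explicitly. ZPageSizes and largeRegionBytes are hypothetical names, not part of ZGC.

```java
// Hypothetical helper for illustration; this is not ZGC source code.
class ZPageSizes {

    static final long SMALL_REGION_BYTES = 2L * 1024 * 1024;   // 2 MB
    static final long MEDIUM_REGION_BYTES = 32L * 1024 * 1024; // 32 MB

    // Smallest multiple of 2 MB that can hold `bytes` (assumes bytes > 0).
    static long largeRegionBytes(long bytes) {
        long n = (bytes + SMALL_REGION_BYTES - 1) / SMALL_REGION_BYTES; // round up
        return n * SMALL_REGION_BYTES;
    }

    public static void main(String[] args) {
        long fiveMb = 5L * 1024 * 1024;
        // A 5 MB request needs N = 3, that is, a 6 MB region.
        System.out.println(largeRegionBytes(fiveMb) / (1024 * 1024) + " MB");
    }
}
```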

The ZGC heap can contain multiple regions from each of these size groups. The medium and large regions are allocated contiguously, as shown in the following diagram:

Unlike other GCs, the physical heap regions of ZGC can map into a bigger heap address space (which can include virtual memory). This can be crucial to combat memory fragmentation issues. Imagine that the user needs to allocate a really big object in memory, but can’t do so because no contiguous space of that size is available.

This often leads to multiple GC cycles to free up enough contiguous space. If no such space is available, even after (multiple) GC cycle(s), the JVM will shut down with OutOfMemoryError. However, this particular use case is not an issue with ZGC. Since the physical memory maps to a bigger address space, locating a bigger contiguous space is feasible.

Z Garbage Collector phases

A GC cycle of ZGC includes multiple phases:

  • Pause Mark Start
  • Pause Mark End
  • Pause Relocate Start

In the first phase, Pause Mark Start, ZGC marks the objects that are pointed to by roots. Marking then continues concurrently by walking the object graph to find and mark the live set. This is by far one of the most heavy-duty workloads in the ZGC cycle.

Once this completes, the next phase is Pause Mark End, which is used for synchronization and starts with a short pause of 1 ms. In this second phase, ZGC starts with reference processing and moves on to weak-root cleaning. It also includes relocation set selection, in which ZGC marks the regions it wants to compact.

The next step, Pause Relocate Start, triggers the actual region compaction. It begins by scanning the roots that point into the relocation set, followed by the concurrent relocation of objects in the relocation set.

The first phase, that is, Pause Mark Start, also includes remapping the live data. Since marking and remapping live data are the most heavy-duty GC operations, remapping isn’t executed as a separate phase. Remapping starts after Pause Relocate Start but overlaps with the Pause Mark Start phase of the next GC cycle.
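The order of work described above can be summarized in a short sketch. The three pause names come from the list of phases. The names given to the concurrent steps between them are labels chosen for this illustration, and the enum is not taken from ZGC’s source code.

```java
// Illustrative outline only; the enum and its labels are hypothetical.
class ZgcCycleSketch {

    enum Step {
        PAUSE_MARK_START(true, "mark the objects pointed to by roots"),
        CONCURRENT_MARK(false, "walk the object graph and mark the live set"),
        PAUSE_MARK_END(true, "short synchronization pause that ends marking"),
        CONCURRENT_PREPARE(false, "reference processing, weak-root cleaning, relocation set selection"),
        PAUSE_RELOCATE_START(true, "scan the roots pointing into the relocation set"),
        CONCURRENT_RELOCATE(false, "relocate the objects in the relocation set");

        final boolean stopsTheWorld; // true when Java threads are paused
        final String work;

        Step(boolean stopsTheWorld, String work) {
            this.stopsTheWorld = stopsTheWorld;
            this.work = work;
        }
    }

    // Number of steps in one cycle that pause the application threads.
    static int pauseCount() {
        int pauses = 0;
        for (Step step : Step.values()) {
            if (step.stopsTheWorld) {
                pauses++;
            }
        }
        return pauses;
    }

    public static void main(String[] args) {
        for (Step step : Step.values()) {
            String kind = step.stopsTheWorld ? "pause" : "concurrent";
            System.out.println(step + " [" + kind + "]: " + step.work);
        }
    }
}
```

Only the three pause steps stop the Java threads. The remaining work runs alongside the application, which is what keeps the pauses short.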

Colored pointers

Colored pointers are one of the core concepts of ZGC. They enable ZGC to find, mark, locate, and remap objects. They are not supported on 32-bit platforms. The implementation of colored pointers needs virtual address masking, which could be accomplished in the hardware, the operating system, or software. The following diagram shows the 64-bit pointer layout:

As shown in the preceding diagram, the 64-bit object reference is divided as follows:

  • 18 bits: Unused bits
  • 1-bit: Finalizable
  • 1-bit: Remapped
  • 1-bit: Marked1
  • 1-bit: Marked0
  • 42 bits: Object Address

The first 18 bits are reserved for future use. The 42 address bits can address up to 4 TB of address space. Now come the remaining, intriguing, 4 bits. The Marked1 and Marked0 bits are used to mark objects for garbage collection. By setting the single Remapped bit, a reference can be marked as not pointing into the relocation set. The last bit, Finalizable, relates to concurrent reference processing. It marks that an object is only reachable through a finalizer.

When the user runs ZGC on a system, they will notice that it uses a lot of virtual memory space, which is not the same as the physical memory space. This is due to heap multi-mapping, which specifies how the objects with colored pointers are stored in virtual memory.

As an example, for a colorless pointer, say, 0x0000000011111111, its colored pointers would be 0x0000100011111111 (Remapped bit set), 0x0000080011111111 (Marked1 bit set), and 0x0000040011111111 (Marked0 bit set). The same physical heap memory would map to three different locations in the address space, each corresponding to one colored pointer. The implementation would differ if the address masking were handled differently, for example, in hardware.
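The bit arithmetic behind these values can be reproduced with plain long operations. The sketch below derives the bit positions from the layout listed earlier (42 address bits, followed by Marked0, Marked1, Remapped, and Finalizable). The class and method names are hypothetical, and the code only models the pointer values. It does not touch real memory or ZGC internals.

```java
// Hypothetical model of the 64-bit colored pointer layout; not ZGC source code.
class ColoredPointers {

    static final int ADDRESS_BITS = 42;
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1; // low 42 bits

    static final long MARKED0 = 1L << 42;
    static final long MARKED1 = 1L << 43;
    static final long REMAPPED = 1L << 44;
    static final long FINALIZABLE = 1L << 45;

    // Returns the address with exactly one metadata ("color") bit set.
    static long color(long address, long colorBit) {
        return (address & ADDRESS_MASK) | colorBit;
    }

    // Strips the metadata bits and returns the plain object address.
    static long address(long coloredPointer) {
        return coloredPointer & ADDRESS_MASK;
    }

    static String hex(long value) {
        return String.format("0x%016x", value);
    }

    public static void main(String[] args) {
        long colorless = 0x0000000011111111L;
        System.out.println(hex(color(colorless, REMAPPED))); // 0x0000100011111111
        System.out.println(hex(color(colorless, MARKED1)));  // 0x0000080011111111
        System.out.println(hex(color(colorless, MARKED0)));  // 0x0000040011111111
        // All three views lead back to the same address.
        System.out.println(hex(address(color(colorless, MARKED1)))); // 0x0000000011111111
    }
}
```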

Tuning Z Garbage Collector

To get optimal performance, set a heap size that can not only store the live set of your application but also leaves enough space to service allocations while the GC is running.
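One rough way to check how much room the current heap setting leaves is to compare the used heap with the maximum heap from inside the application. The sketch below uses only the standard Runtime API. HeapHeadroom and its methods are hypothetical names. Note that the used figure also counts garbage that has not been collected yet, so it is an upper bound on the live set rather than an exact measurement.

```java
// Hypothetical helper for illustration; uses only java.lang.Runtime.
class HeapHeadroom {

    // Bytes currently occupied in the heap (live objects plus uncollected garbage).
    static long usedBytes() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    // Maximum heap the JVM will attempt to use (set with -Xmx).
    static long maxBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Fraction of the maximum heap that is still available for allocations.
    static double headroomFraction(long used, long max) {
        return 1.0 - (double) used / max;
    }

    public static void main(String[] args) {
        long used = usedBytes();
        long max = maxBytes();
        System.out.printf("used=%d MB, max=%d MB, headroom=%.0f%%%n",
                used / (1024 * 1024), max / (1024 * 1024),
                headroomFraction(used, max) * 100);
    }
}
```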

ZGC is a concurrent garbage collector. By setting the number of concurrent GC threads, the user controls how much CPU time is assigned to ZGC and, in turn, how well it keeps up with the application. This can be done by using the following option:

-XX:ConcGCThreads=<number> 

A higher value for the ConcGCThreads option leaves less CPU time for your application. On the other hand, a lower value may result in your application struggling for memory, because it might generate garbage faster than ZGC can collect it. If the option is not set, ZGC picks a default value for ConcGCThreads. To fine-tune your application on this parameter, you might prefer to run it against a range of test values.

For advanced ZGC tuning, the user can also enable large pages to enhance the performance of the application. This can be done by using the following option:

-XX:+UseLargePages

Instead of enabling large pages, the user can also enable transparent huge pages by using the following option:

-XX:+UseTransparentHugePages

The preceding option also requires additional settings and configuration, which are described on ZGC’s official wiki page.

ZGC is a NUMA-aware GC. Applications executing on a NUMA machine can see a noticeable performance gain. By default, NUMA support is enabled for ZGC. However, if the JVM detects that it is bound to a subset of the CPUs in the system, this feature can be disabled. To override the JVM’s decision, the following option can be used:

-XX:+UseNUMA
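To confirm which values the JVM actually ended up with for the options discussed in this section, you can query them at runtime. The following sketch assumes a HotSpot-based JDK, where the com.sun.management.HotSpotDiagnosticMXBean interface is available. The class name ZgcFlagReport and the describe method are made up for this example. Options that the running JVM does not know (for instance, UseTransparentHugePages outside Linux) are reported as unavailable instead of causing a failure.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper; assumes a HotSpot-based JDK.
class ZgcFlagReport {

    // Returns a one-line description of the given VM option.
    static String describe(String flag) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption option = bean.getVMOption(flag);
            return flag + " = " + option.getValue() + " (origin: " + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            // Thrown when the running JVM has no option with this name.
            return flag + " is not available on this JVM";
        }
    }

    public static void main(String[] args) {
        String[] flags = {"ConcGCThreads", "UseLargePages", "UseTransparentHugePages", "UseNUMA"};
        for (String flag : flags) {
            System.out.println(describe(flag));
        }
    }
}
```

The origin part of each line indicates whether a value is a default or was set explicitly, for example, on the command line.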

Summary

We have briefly discussed ZGC, the scalable, low-latency GC for OpenJDK. It is an experimental GC, which has been written from scratch. As a concurrent GC, it promises to keep pause times below 10 milliseconds, and these don’t increase with the heap size or the amount of live data. At present, it only works with Linux/x64. More platforms can be supported in the future if there is considerable demand for it.

To know more about the applicability of Java’s new features, head over to the book, Java 11 and 12 – New Features.
