Working with JRockit Runtime Analyzer- A Sequel

Code

The Code tab group contains information from the code generator and the method sampler. It consists of three tabs —the Overview, Hot Methods, and Optimizations tab.

Overview

This tab aggregates information from the code generator with sample information from the code optimizer. This allows us to see which methods the Java program spends the most time executing. Again, this information is available virtually "for free", as the code generation system needs it anyway.

For CPU-bound applications, this tab is a good place to start looking for opportunities to optimize your application code. By CPU-bound, we mean an application for which the CPU is the limiting factor; with a faster CPU, the application would have a higher throughput.

working-jrockit-runtime-analyzer-sequel-img-0

In the first section, the amount of exceptions thrown per second is shown. This number depends both on the hardware and on the application—faster hardware may execute an application more quickly, and consequently throw more exceptions. However, a higher value is always worse than a lower one on identical setups. Recall that exceptions are just that, rare corner cases. As we have explained, theJVM typically gambles that they aren't occurring too frequently. If an application throws hundreds of thousands exceptions per second, you should investigate why. Someone may be using exceptions for control flow, or there may be a configuration error. Either way, performance will suffer.

In JRockit Mission Control 3.1, the recording will only provide information about how many exceptions were thrown. The only way to find out where the exceptions originated is unfortunately by changing the verbosity of the log.

An overview of where the JVM spends most of the time executing Java code can be found in the Hot Packages and Hot Classes sections. The only difference between them is the way the sample data from the JVM code optimizer is aggregated. In Hot Packages, hot executing code is sorted on a per-package basis and in Hot Classes on a per-class basis. For more fine-grained information, use the Hot Methods tab.

As shown in the example screenshot, most of the time is spent executing code in the weblogic.servlet.internal package. There is also a fair amount of exceptions being thrown.

Hot Methods

This tab provides a detailed view of the information provided by the JVM code optimizer. If the objective is to find a good candidate method for optimizing the application, this is the place to look. If a lot of the method samples are from one particular method, and a lot of the method traces through that method share the same origin, much can potentially be gained by either manually optimizing that method or by reducing the amount of calls along that call chain.

In the following example, much of the time is spent in the method com.bea.wlrt.adapter.defaultprovider.internal.CSVPacketReceiver.parseL2Packet(). It seems likely that the best way to improve the performance of this particular application would be to optimize a method internal to the application container (WebLogic Event Server) and not the code in the application itself, running inside the container. This illustrates both the power of the JRockit Mission Control tools and a dilemma that the resulting analysis may reveal—the answers provided sometimes require solutions beyond your immediate control.

working-jrockit-runtime-analyzer-sequel-img-1

Sometimes, the information provided may cause us to reconsider the way we use data structures. In the next example, the program frequently checks if an object is in a java. util.LinkedList. This is a rather slow operation that is proportional to the size of the list (time complexity O(n)), as it potentially involves traversing the entire list, looking for the element. Changing to another data structure, such as a HashSet would most certainly speed up the check, making the time complexity constant (O(1)) on average, given that the hash function is good enough and the set large enough.

working-jrockit-runtime-analyzer-sequel-img-2

Optimizations

This tab shows various statistics from the JIT-compiler. The information in this tab is mostly of interest when hunting down optimization-related bugs in JRockit. It shows how much time was spent doing optimizations as well as how much time was spent JIT-compiling code at the beginning and at the end of the recording. For each method optimized during the recording, native code size before and after optimization is shown, as well as how long it took to optimize the particular method.

working-jrockit-runtime-analyzer-sequel-img-3

Thread/Locks

The Thread/Locks tab group contains tabs that visualize thread- and lock-related data. There are five such tabs in JRA—the Overview, Thread, Java Locks, JVM Locks, and Thread Dumps tab.

Overview

The Overview tab shows fundamental thread and hardware-related information, such as the number of hardware threads available on the system and the number of context switches per second.

working-jrockit-runtime-analyzer-sequel-img-4

A dual-core CPU has two hardware threads, and a hyperthreaded core also counts as two hardware threads. That is, a dual-core CPU with hyperthreading will be displayed as having four hardware threads.

A high amount of context switches per second may not be a real problem, but better synchronization behavior may lead to better total throughput in the system.

There is a CPU graph showing both the total CPU load on the system, as well as the CPU load generated by the JVM. A saturated CPU is usually a good thing — you are fully utilizing the hardware on which you spent a lot of money! As previously mentioned, in some CPU-bound applications, for example batch jobs, it is normally a good thing for the system to be completely saturated during the run. However, for a standard server-side application it is probably more beneficial if the system is able to handle some extra load in addition to the expected one.

The hardware provisioning problem is not simple, but normally server-side systems should have some spare computational power for when things get hairy. This is usually referred to as overprovisioning, and has traditionally just involved buying faster hardware. Virtualization has given us exciting new ways to handle the provisioning problem.

Threads

This tab shows a table where each row corresponds to a thread. The tab has more to offer than first meets the eye. By default, only the start time, the thread duration, and the Java thread ID are shown for each thread. More columns can be made visible by changing the table properties. This can be done either by clicking on the Table Settings icon, or by using the context menu in the table.

As can be seen in the example screenshot, information such as the thread group that the thread belongs to, allocation-related information, and the platform thread ID can also be displayed. The platform thread ID is the ID assigned to the thread by the operating system, in case we are working with native threads. This information can be useful if you are using operating system-specific tools together with JRA.

working-jrockit-runtime-analyzer-sequel-img-5

Java Locks

This tab displays information on how Java locks have been used during the recording. The information is aggregated per type (class) of monitor object.

working-jrockit-runtime-analyzer-sequel-img-6

This tab is normally empty. You need to start JRockit with the system property jrockit.lockprofiling set to true, for the lock profiling information to be recorded. This is because lock profiling may cause anything from a small to a considerable overhead, especially if there is a lot of synchronization.

With recent changes to the JRockit thread and locking model, it would be possible to dynamically enable lock profiling. This is unfortunately not the case yet, not even in JRockit Flight Recorder.
For R28, the system property jrockit.lockprofiling has been deprecated and replaced with the flag -XX:UseLockProfiling.

JVM Locks

This tab contains information on JVM internal native locks. This is normally useful for the JRockit JVM developers and for JRockit support.

An example of a native lock would be the code buffer lock that the JVM acquires in order to emit compiled methods into a native code buffer. This is done to ensure that no other code generation threads interfere with that particular code emission.

Thread Dumps

The JRA recordings normally contain thread dumps from the beginning and the end of the recording. By changing the Thread dump interval parameter in the JRA recording wizard, more thread dumps can be made available at regular intervals throughout the recording.