IBM has published a useful whitepaper, Java Performance on POWER7 Best Practice.
Here are some highlights from a system administrator’s perspective.
The Power Architecture® does not require running applications in 64-bit mode to achieve best
performance since 32-bit and 64-bit modes have the same number of processor registers.
For best performance, use 32-bit Java unless the application's memory requirement forces it
to run in 64-bit mode. Interesting!
Use 32-bit Java for workloads that have a heap size requirement of less than 2.5GB and when
the application spawns a few hundred threads.
- Medium and Large Pages for Java Heap and Code Cache
Configure large pages (the example dynamically configures 1GB of 16MB pages; the -r option makes the setting permanent as well):
# vmo -r -o lgpg_regions=64 -o lgpg_size=16777216
# bosboot -a
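The lgpg_regions value is the amount of memory to reserve divided by the large page size (1GB / 16MB = 64). A minimal sketch of that arithmetic follows; lgpg_regions_for is a hypothetical helper name, not an AIX command, and the script only prints the vmo command rather than running it:

```shell
#!/bin/sh
# Hypothetical helper (not part of AIX): compute how many 16MB large pages
# are needed to back a given amount of memory, expressed in MB.
LGPG_SIZE=16777216   # 16MB in bytes, the lgpg_size used in the vmo example

lgpg_regions_for() {
    want_mb=$1
    page_mb=$((LGPG_SIZE / 1024 / 1024))
    # Round up so the requested amount is fully covered.
    echo $(( (want_mb + page_mb - 1) / page_mb ))
}

# Print (do not run) the matching vmo command for 1GB of large pages.
echo "vmo -r -o lgpg_regions=$(lgpg_regions_for 1024) -o lgpg_size=$LGPG_SIZE"
```

Sizing the large page pool a little above the Java heap leaves room for the code cache, but the exact margin depends on the workload.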
Non-root users must have the CAP_BYPASS_RAC_VMM capability enabled to use large pages. The system administrator can add this capability using the chuser command like in the example below:
# chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE <user_id>
Scale JVM/WAS instances with one instance for every 2 cores
Choosing the right SMT mode:
Most applications benefit from SMT. However, some applications do not scale with an increased number of logical CPUs on an SMT enabled system.
Using Resource Sets:
Resource sets (rsets) allow specifying on which logical CPUs an application can run. This is
useful when an application that doesn’t scale beyond a certain number of logical CPUs has to
run on a large LPAR, for example when an application that scales well up to 8 logical CPUs
has to run on an LPAR that has 64 logical CPUs.
The following example demonstrates
how to use execrset to create an rset with CPUs 4 to 7 and start the application attached to it:
execrset -c 4-7 -e <application>
In addition to running the application attached to an rset, the MEMORY_AFFINITY
environment variable should be set to MCM to ensure that the application's private and shared
memory is allocated from memory that is local to the logical CPUs of the rset.
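As a sketch of how the two settings combine, the variable can be set just for the command that execrset launches. The wrapper name run_in_rset is hypothetical; MEMORY_AFFINITY=MCM and the execrset -c/-e options are the ones described above:

```shell
#!/bin/sh
# Hypothetical wrapper: start a command attached to an rset with
# MEMORY_AFFINITY=MCM set in its environment.
# Usage: run_in_rset <cpu-list> <command> [args...]
run_in_rset() {
    cpus=$1
    shift
    # The variable assignment applies to the execrset invocation and is
    # inherited by the application it starts.
    MEMORY_AFFINITY=MCM execrset -c "$cpus" -e "$@"
}

# Example on AIX (not run here): run_in_rset 4-7 <application>
```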
In general, rsets should be created on core boundaries. For example, a system with four virtual
processors (cores) running in SMT4 mode will have 16 logical CPUs. An rset with four
logical CPUs should be created by selecting four SMT threads that belong to one core. An rset
with eight logical CPUs should be created by selecting eight SMT threads that belong to two cores.
The smtctl command can be used to determine which logical CPUs belong to which core:
smtctl
This system is SMT capable.
This system supports up to 4 SMT threads per processor.
SMT is currently enabled.
SMT boot mode is not set.
SMT threads are bound to the same physical processor.

proc0 has 4 SMT threads.
Bind processor 0 is bound with proc0
Bind processor 1 is bound with proc0
Bind processor 2 is bound with proc0
Bind processor 3 is bound with proc0

proc4 has 4 SMT threads.
Bind processor 4 is bound with proc4
Bind processor 5 is bound with proc4
Bind processor 6 is bound with proc4
Bind processor 7 is bound with proc4
The smtctl output above shows that the system is running in SMT4 mode with bind processors
(logical CPUs) 0 to 3 belonging to proc0 and bind processors 4 to 7 to proc4. An rset with four
logical CPUs should be created either for CPUs 0 to 3 or for CPUs 4 to 7.
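On a system with many cores it can be easier to group the bind processor lines per core than to read them by eye. A minimal sketch, assuming smtctl prints one "Bind processor N is bound with procM" statement per line; cpus_per_core is a hypothetical helper name, not an AIX command:

```shell
#!/bin/sh
# Hypothetical helper: read smtctl output on stdin and print one line per
# core listing the logical CPUs (bind processors) that belong to it.
cpus_per_core() {
    awk '
        /^Bind processor [0-9]+ is bound with proc[0-9]+/ {
            core = $NF                      # e.g. "proc0"
            if (!(core in cpus)) order[++n] = core
            cpus[core] = (cpus[core] == "" ? $3 : cpus[core] "," $3)
        }
        END {
            # Print cores in the order they first appeared.
            for (i = 1; i <= n; i++) print order[i] ": " cpus[order[i]]
        }
    '
}

# Example on AIX (not run here): smtctl | cpus_per_core
```

Each output line then corresponds to one core-aligned CPU list that can be passed to execrset -c.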
To achieve best performance with rsets that are created across multiple cores, all cores of the rset
should be in the same scheduler resource allocation domain (SRAD). The lssrad command can
be used to determine which logical CPUs belong to which SRAD:
lssrad -av
REF1   SRAD        MEM      CPU
0
          0   22397.25      0-31
1
          1   29801.75      32-63
The example output above shows a system that has two SRADs. CPUs 0 to 31 belong to the first
SRAD while CPUs 32 to 63 belong to the second SRAD. In this example, an rset with multiple
cores should be created using the CPUs of either the first or the second SRAD.
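The CPU range of each SRAD can also be pulled out of the lssrad output with a small filter. This is a sketch under the assumption that lssrad -av prints each REF1 value on its own line and each SRAD on a three-column line (SRAD, MEM, CPU); srad_cpus is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read `lssrad -av` output on stdin and print the CPU
# range of every SRAD. Lines with exactly three fields and a numeric first
# field are treated as SRAD rows; the header and REF1 lines are skipped.
srad_cpus() {
    awk 'NF == 3 && $1 ~ /^[0-9]+$/ { print "SRAD " $1 ": " $3 }'
}

# Example on AIX (not run here): lssrad -av | srad_cpus
```

An rset that spans several cores would then take all of its CPUs from a single line of this output.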
Note: A user must have root authority or the CAP_NUMA_ATTACH capability to use rsets.
- Data Pre-fetching
For most Java applications, it is recommended to turn off HW data prefetch with the AIX command
"dscrctl -n -s 1". Java 6 SR7 exploits HW transient prefetch on POWER7.
Full whitepaper can be downloaded here: