
Improve cpu awareness on zos #23589

Open
VermaSh wants to merge 1 commit into eclipse-openj9:master from VermaSh:improve_cpu_awareness_on_zos

@VermaSh VermaSh commented Mar 26, 2026

Enhance z/OS CPU monitoring by utilizing system control structures to
directly retrieve CPU load metrics. This provides more accurate CPU
utilization data for JIT compilation decisions and verbose logging.

Key changes:

  • Add support for j9sysinfo_get_CPU_load() and j9sysinfo_get_CPU_capacity()
    port library APIs on z/OS to retrieve CPU metrics from control structures
  • Implement detailed CPU statistics logging via -Xjit:verbose={cpuStats}
    including zIIP count, capacity, GCP count, utilization percentages,
    and process timing information
  • Add cpuStatsPrintInterval option to control logging frequency (default: 1000ms)
  • Enable isJVMStarved flag support on z/OS using the new CPU load helpers
  • Print ASID in hexadecimal format for consistency with z/OS conventions
  • Update CpuUtilization to use direct CPU load retrieval on z/OS

This implementation leverages z/OS-specific control structures (CVT, CSD)
to provide accurate CPU metrics that account for zIIP processors, SMT
configurations, and the SVT normalization factor.
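
Two of the smaller items in the list above, the cpuStatsPrintInterval gating and the hexadecimal ASID output, can be illustrated with a minimal sketch. The class and function names here are hypothetical and are not taken from the PR; the 4-digit upper-case hex width is likewise an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: decides whether enough time has passed to emit another
// cpuStats line. The 1000 ms default mirrors the description above.
class CpuStatsLogGate {
public:
    explicit CpuStatsLogGate(uint64_t intervalMs = 1000)
        : _intervalMs(intervalMs), _lastPrintMs(0), _printedOnce(false) {
        assert(intervalMs > 0);
    }

    // Returns true (and records the time) if a line should be printed now.
    // Assumes nowMs comes from a monotonic clock.
    bool shouldPrint(uint64_t nowMs) {
        if (_printedOnce && (nowMs - _lastPrintMs) < _intervalMs) {
            return false;
        }
        _lastPrintMs = nowMs;
        _printedOnce = true;
        return true;
    }

private:
    uint64_t _intervalMs;
    uint64_t _lastPrintMs;
    bool _printedOnce;
};

// Formats an address space ID as hexadecimal. The width and case are
// assumptions for illustration, not necessarily the format the PR prints.
std::string formatAsidHex(uint16_t asid) {
    char buf[8];
    std::snprintf(buf, sizeof(buf), "%04X", static_cast<unsigned>(asid));
    return std::string(buf);
}
```

The gate always lets the first line through and then suppresses output until a full interval has elapsed since the last printed line.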

Signed-off-by: Shubham Verma <shubhamv.sv@gmail.com>

@VermaSh VermaSh force-pushed the improve_cpu_awareness_on_zos branch 5 times, most recently from f8bba00 to 4849a4d Compare March 30, 2026 23:00
@VermaSh VermaSh force-pushed the improve_cpu_awareness_on_zos branch 2 times, most recently from 192f8c3 to 1311f37 Compare April 9, 2026 13:27
VermaSh commented Apr 9, 2026

Depends on eclipse-omr/omr#8190

VermaSh commented Apr 9, 2026

Almost ready for review. We’re seeing the same performance gains as with the previous prototype build; however, I still need to launch test buckets for these changes. I'm currently unable to launch z/OS builds due to infra issues, but I have launched test builds for other platforms. I’ll remove the WIP label once I’ve confirmed that this PR doesn’t introduce any failures.

VermaSh commented Apr 13, 2026

My personal builds for non-z/OS platforms don't show any failures due to these changes. I tested sanity.functional, sanity.system, extended.functional and extended.system. There were 3 failures, 2 of which are already being tracked.

MiniMix_extended_3h_0 failure:

[2026-04-09T18:06:48.825Z] LT  testStarted : testMultipleConnectSequential(net.adoptopenjdk.test.nio2.asyncio.client.MultipleConnectFutureTest)
[2026-04-09T18:06:48.825Z] LT  MultipleConnectFutureTest.testMultipleConnectSequential() creating 10 connections
[2026-04-09T18:06:48.825Z] LT  testFailure: testMultipleConnectSequential(net.adoptopenjdk.test.nio2.asyncio.client.MultipleConnectFutureTest): java.net.BindException: Can't assign requested address
[2026-04-09T18:06:48.825Z] LT  java.util.concurrent.ExecutionException: java.net.BindException: Can't assign requested address
[2026-04-09T18:06:48.825Z] LT  	at java.base/sun.nio.ch.CompletedFuture.get(CompletedFuture.java:78)
[2026-04-09T18:06:48.825Z] LT  	at net.adoptopenjdk.test.nio2.asyncio.client.MultipleConnectFutureTest.testMultipleConnectSequential(MultipleConnectFutureTest.java:83)
[2026-04-09T18:06:48.825Z] LT  	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
[2026-04-09T18:06:48.825Z] LT  	at java.base/java.lang.reflect.Method.invoke(Method.java:586)
[2026-04-09T18:06:48.825Z] LT  	at junit.framework.TestCase.runTest(TestCase.java:176)
[2026-04-09T18:06:48.825Z] LT  	at junit.framework.TestCase.runBare(TestCase.java:141)
[2026-04-09T18:06:48.825Z] LT  	at junit.framework.TestResult$1.protect(TestResult.java:122)
[2026-04-09T18:06:48.825Z] LT  	at junit.framework.TestResult.runProtected(TestResult.java:142)
[2026-04-09T18:06:48.825Z] LT  	at junit.framework.TestResult.run(TestResult.java:125)
[2026-04-09T18:06:48.825Z] LT  	at junit.framework.TestCase.run(TestCase.java:129)
[2026-04-09T18:06:48.825Z] LT  	at junit.framework.TestSuite.runTest(TestSuite.java:252)
[2026-04-09T18:06:48.825Z] LT  	at junit.framework.TestSuite.run(TestSuite.java:247)
[2026-04-09T18:06:48.825Z] LT  	at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:86)
[2026-04-09T18:06:48.825Z] LT  	at org.junit.runners.Suite.runChild(Suite.java:128)
[2026-04-09T18:06:48.825Z] LT  	at org.junit.runners.Suite.runChild(Suite.java:27)
[2026-04-09T18:06:48.825Z] LT  	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
[2026-04-09T18:06:48.825Z] LT  	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
[2026-04-09T18:06:48.825Z] LT  	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
[2026-04-09T18:06:48.825Z] LT  	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
[2026-04-09T18:06:48.825Z] LT  	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
[2026-04-09T18:06:48.825Z] LT  	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
[2026-04-09T18:06:48.825Z] LT  	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
[2026-04-09T18:06:48.825Z] LT  	at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
[2026-04-09T18:06:48.825Z] LT  	at net.adoptopenjdk.loadTest.adaptors.JUnitAdaptor.executeTest(JUnitAdaptor.java:130)
[2026-04-09T18:06:48.825Z] LT  	at net.adoptopenjdk.loadTest.LoadTestRunner$2.run(LoadTestRunner.java:182)
[2026-04-09T18:06:48.825Z] LT  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
[2026-04-09T18:06:48.825Z] LT  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
[2026-04-09T18:06:48.825Z] LT  	at java.base/java.lang.Thread.run(Thread.java:1600)
[2026-04-09T18:06:48.825Z] LT  Caused by: java.net.BindException: Can't assign requested address
[2026-04-09T18:06:48.825Z] LT  	at java.base/sun.nio.ch.Net.connect0(Native Method)
[2026-04-09T18:06:48.825Z] LT  	at java.base/sun.nio.ch.Net.connect(Net.java:601)
[2026-04-09T18:06:48.825Z] LT  	at java.base/sun.nio.ch.Net.connect(Net.java:590)
[2026-04-09T18:06:48.825Z] LT  	at java.base/sun.nio.ch.UnixAsynchronousSocketChannelImpl.implConnect(UnixAsynchronousSocketChannelImpl.java:336)
[2026-04-09T18:06:48.825Z] LT  	at java.base/sun.nio.ch.AsynchronousSocketChannelImpl.connect(AsynchronousSocketChannelImpl.java:200)
[2026-04-09T18:06:48.825Z] LT  	at net.adoptopenjdk.test.nio2.asyncio.client.MultipleConnectFutureTest.testMultipleConnectSequential(MultipleConnectFutureTest.java:82)

VermaSh commented Apr 13, 2026

My z/OS test buckets didn't complete due to infra issues, so I have relaunched them. Since I didn’t see any z/OS failures in my earlier testing, I’ll mark this PR as ready for review while the tests finish running.

@VermaSh VermaSh marked this pull request as ready for review April 13, 2026 22:29
@VermaSh VermaSh requested a review from dsouzai as a code owner April 13, 2026 22:29
VermaSh commented Apr 13, 2026

@joransiu, @r30shah, @mpirvu, can I please get a review for these changes?

VermaSh commented Apr 13, 2026

Looking into the Clang Format Check failure.

@VermaSh VermaSh force-pushed the improve_cpu_awareness_on_zos branch 2 times, most recently from ab5ef00 to caff59e Compare April 13, 2026 23:13
@VermaSh VermaSh changed the title from "WIP: Improve cpu awareness on zos" to "Improve cpu awareness on zos" Apr 13, 2026
@VermaSh VermaSh force-pushed the improve_cpu_awareness_on_zos branch 2 times, most recently from e0819e0 to 883d0e5 Compare April 14, 2026 03:47
VermaSh commented Apr 14, 2026

Format checks are passing; this is ready for review.

@r30shah r30shah left a comment

I had a quick look through. The changes overall look OK to me, though I will review this together with the OMR counterpart. One small question:

Comment thread runtime/compiler/env/CpuUtilization.cpp Outdated
double cpuCapacity = 0.0;
if (0 == portLibraryStatusSys) {
portLibraryStatusSys = j9sysinfo_get_CPU_capacity(&cpuCapacity);
machineCpuStats->numberOfCpus = cpuCapacity;

There is an implicit cast here. I assume numberOfCpus is used later on to calculate the load. Can you confirm that truncating this value has no impact? As written, this throws away the factor for SMT gain.
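
To make the concern concrete, here is a small sketch of what truncation does to a fractional capacity. It assumes the field is an integer type, as the comment above implies, and borrows the 1.3 gain factor from the commit message; the function names are hypothetical.

```cpp
#include <cassert>

// Assumed value, taken from the SMT_2_GAIN_FACTOR (1.3) in the commit message.
const double kSmt2GainFactor = 1.3;

// Hypothetical capacity: core count scaled by the SMT-2 gain.
double smtAdjustedCapacity(int cores) {
    assert(cores >= 0);
    return cores * kSmt2GainFactor;
}

// Mimics storing a double capacity into an integer field.
int storeInIntegerField(double capacity) {
    return static_cast<int>(capacity); // truncates toward zero
}

// Load as a fraction of capacity; busyCpus is CPU time consumed per unit of
// wall-clock time.
double loadFraction(double busyCpus, double capacity) {
    assert(capacity > 0.0);
    return busyCpus / capacity;
}
```

With 2 cores the adjusted capacity is about 2.6, but the integer field holds 2, so a workload keeping 2 CPUs busy would be reported as fully loaded rather than at roughly 77% of capacity.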

Contributor Author

I have updated the code to query CPU capacity every time instead of caching it in machineCpuStats->numberOfCpus. We set machineCpuStats->numberOfCpus to -1 as an indicator that it isn't valid. I chose this approach over changing the member’s type to avoid the added complexity of having platform-dependent types.
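
A rough sketch of that approach, for illustration only: the struct, the constant, and queryCpuCapacity() are hypothetical stand-ins rather than the PR's actual types or port library calls, and the 2.6 value is made up.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the stats structure; only the field discussed here.
struct MachineCpuStats {
    int32_t numberOfCpus;
};

// Sentinel meaning "no valid cached count; query the capacity instead".
const int32_t kCpuCountNotCached = -1;

// Hypothetical stand-in for a capacity query; returns 0 on success.
// A real implementation would go through the port library.
int queryCpuCapacity(double *capacity) {
    assert(nullptr != capacity);
    *capacity = 2.6; // illustrative value only
    return 0;
}

// Returns the capacity to divide by when computing load, or a negative value
// if the query fails. The integer field is used only when it holds a real count.
double effectiveCapacity(const MachineCpuStats &stats) {
    if (kCpuCountNotCached != stats.numberOfCpus) {
        return static_cast<double>(stats.numberOfCpus);
    }
    double capacity = 0.0;
    if (0 != queryCpuCapacity(&capacity)) {
        return -1.0;
    }
    return capacity;
}
```

Keeping the member's integer type and flagging it with a sentinel leaves the shared structure unchanged, at the cost of one capacity query per load calculation on the platform that sets the sentinel.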

@mpirvu mpirvu left a comment

LGTM

@VermaSh VermaSh force-pushed the improve_cpu_awareness_on_zos branch 2 times, most recently from 0fbe483 to ff521e6 Compare April 17, 2026 15:53
This change enhances CPU awareness on z/OS by adding support for zIIP
processors, SMT-2 detection, and more accurate CPU capacity
calculations.

Key changes:
- Add helpers to retrieve zIIP and GCP counts separately
- Extend z/OS control block mappings (J9CSD, J9CVTSVT, J9IRARMCTZ,
  J9CCT) to access CPU configuration data
- Implement SMT-2 detection (isSMTEnabled()) with capacity adjustment
  using SMT_2_GAIN_FACTOR (1.3)
- Add retrieveZOSCPUCapacity() that scales GCP contribution by 0.1
  when zIIPs are available, reflecting Java's preference for zIIP
  execution
- Include GCP capacity only when IIPHONORPRIORITY is enabled
- Replace direct CCT field access with helper functions for CPU load
  retrieval
- Add new port library APIs omrsysinfo_get_CPU_usage_stats() and
  omrsysinfo_get_CPU_capacity()
- Add CPUUsageStats structure containing z/OS-specific metrics including
  per-process utilization, GCP/zIIP loads, capacities, and
  IIPHONORPRIORITY status
- Update calculateProcessCpuLoad() to use adjusted CPU capacity
  instead of raw CPU count
- Add TR_VerboseCPUStats verbose option for z/OS CPU statistics
  logging

Signed-off-by: Shubham Verma <shubhamv.sv@gmail.com>
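
The capacity rules listed in the commit message can be sketched roughly as follows. The constants 1.3 and 0.1 come from the message, but how they combine is an assumption here: in particular, applying the SMT gain to zIIPs only and counting GCPs in full when there are no zIIPs are guesses, and the struct and function names are hypothetical (the real code reads these values from z/OS control blocks).

```cpp
#include <cassert>

// Constants named in the commit message; their combination below is assumed.
const double SMT_2_GAIN_FACTOR = 1.3;
const double GCP_SCALE_WITH_ZIIPS = 0.1; // hypothetical name for the 0.1 scale

// Hypothetical inputs standing in for values read from control blocks.
struct CpuConfig {
    int ziipCount;
    int gcpCount;
    bool smtEnabled;
    bool iipHonorPriority;
};

// Sketch of an adjusted capacity in "CPU equivalents".
double sketchCpuCapacity(const CpuConfig &c) {
    assert(c.ziipCount >= 0 && c.gcpCount >= 0);
    if (0 == c.ziipCount) {
        return c.gcpCount; // assumption: no zIIPs, so GCPs count in full
    }
    double capacity = c.ziipCount;
    if (c.smtEnabled) {
        capacity *= SMT_2_GAIN_FACTOR; // assumption: gain applies to zIIPs only
    }
    if (c.iipHonorPriority) {
        capacity += GCP_SCALE_WITH_ZIIPS * c.gcpCount; // GCPs as overflow only
    }
    return capacity;
}

// Process load relative to the adjusted capacity instead of a raw CPU count.
double sketchProcessCpuLoad(double cpuTimeDeltaNs, double wallTimeDeltaNs, double capacity) {
    assert(wallTimeDeltaNs > 0.0 && capacity > 0.0);
    return cpuTimeDeltaNs / (wallTimeDeltaNs * capacity);
}
```

Under these assumptions, 2 zIIPs with SMT-2 and 4 GCPs with IIPHONORPRIORITY enabled give a capacity of about 3.0 (2.6 from the zIIPs plus 0.4 from the GCPs), and a process that used 1 s of CPU over 1 s of wall time on a capacity of 2.0 has a load of 0.5.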
@VermaSh VermaSh force-pushed the improve_cpu_awareness_on_zos branch from ff521e6 to 3171085 Compare April 17, 2026 16:36