Set minimum NUM_MACHINES=6 for special.functional on AIX to prevent OOM #6990

Draft
annaibm wants to merge 1 commit into adoptium:master from annaibm:addMinimumList_AIX

Conversation

annaibm (Contributor) commented Apr 1, 2026

AIX machines (e.g., paix820) with limited RAM (8GB) run into OOM errors in resultsSummary when special.functional runs as a single unsplit job. Adding a minimum of 6 parallel lists ensures the job is always split, preventing Perl OOM failures during test result processing.

related: https://github.ibm.com/runtimes/automation/issues/921

pshipton (Member) commented Apr 1, 2026

Is this tested?

annaibm (Contributor, Author) commented Apr 8, 2026

@pshipton, yes, this was tested via a Grinder run (https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder/59332/) on ppc64_aix, and the fix correctly split the job into 6 parallel lists. However, testList_5 still failed with OOM, which suggests NUM_MACHINES=6 may not be sufficient.
Would you suggest increasing the minimum to 7 or 8, or is there a better approach you would recommend to handle this?

pshipton (Member) commented Apr 8, 2026

Perhaps we should be looking at the tests which produce so much output and see about reducing it.

pshipton (Member) commented Apr 8, 2026

Or am I misunderstanding why the OOM occurs?

annaibm (Contributor, Author) commented Apr 8, 2026

My understanding is that the OOM occurs in resultsSum.pl, which reads the entire TestTargetResult file and builds the TAP output string in memory before writing it to disk. Because the MBCS tests produce very verbose output, this in-memory string grows very large, which is a problem on AIX machines with limited RAM (8GB).
So, as you suggest, reducing the output from the MBCS tests would reduce the memory pressure. The NUM_MACHINES increase could still help when initial test time from TRSS is not available. I will investigate which MBCS tests produce the most verbose output and see if it can be reduced.
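
The difference between building the whole TAP output in memory and writing it out incrementally can be sketched as follows. This is a minimal Python illustration of the general technique, not the logic of resultsSum.pl; the function names and the one-result-per-line input format (`<name> PASSED` or `<name> FAILED`) are assumptions made for the example.

```python
import io
from typing import Iterable, Iterator, TextIO


def tap_lines(results: Iterable[str]) -> Iterator[str]:
    """Yield one TAP line per result line without retaining earlier lines.

    Assumed (hypothetical) input format: "<test name> PASSED" or
    "<test name> FAILED", one result per line.
    """
    for number, line in enumerate(results, start=1):
        name, _, status = line.strip().rpartition(" ")
        prefix = "ok" if status == "PASSED" else "not ok"
        yield f"{prefix} {number} - {name}\n"


def write_tap_streaming(results: Iterable[str], out: TextIO) -> int:
    """Write each TAP line as soon as it is produced; return the test count."""
    count = 0
    for tap_line in tap_lines(results):
        out.write(tap_line)
        count += 1
    out.write(f"1..{count}\n")  # TAP permits the plan line at the end
    return count


if __name__ == "__main__":
    buffer = io.StringIO()
    write_tap_streaming(["testA PASSED", "testB FAILED"], buffer)
    print(buffer.getvalue(), end="")
```

When `results` is an open file handle, Python iterates it line by line, so peak memory in this sketch depends on the longest line rather than on the size of the whole result file.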

smlambert (Contributor) commented

> I will investigate which MBCS tests produce the most verbose output and see if it can be reduced.

This is a very good initiative. There are several other ways to improve MBCS tests, and we should likely make a plan to address all of them. Related: #5161

Copilot AI left a comment

Pull request overview

Updates the Jenkins dynamic parallelization logic to ensure special.functional jobs on AIX are split into multiple parallel lists, avoiding Perl OOM during resultsSummary processing on lower-memory AIX workers.

Changes:

  • Enforce a minimum NUM_MACHINES=6 when generating the parallel list for TARGET == "special.functional" on AIX.
  • Regenerate parallelList.mk with the higher minimum when the initial computed NUM_LIST is below 6.
  • Add a targeted log message referencing the motivating issue for traceability.
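
The minimum-enforcement rule described in the list above can be sketched as follows. This is an illustrative Python sketch, not the Groovy code in JenkinsfileBase; the helper name `effective_num_machines`, its parameters, and the substring check on the platform name are assumptions made for the example.

```python
MIN_AIX_SPECIAL_FUNCTIONAL = 6  # minimum number of parallel lists proposed in this PR


def effective_num_machines(computed: int, target: str, platform: str) -> int:
    """Return the number of parallel lists to generate.

    Hypothetical helper: the computed value is raised to the minimum only
    for special.functional on AIX; all other combinations pass through.
    """
    if target == "special.functional" and "aix" in platform.lower():
        return max(computed, MIN_AIX_SPECIAL_FUNCTIONAL)
    return computed
```

Using `max` means a computed value that is already above the minimum is left untouched, so the rule only affects jobs that would otherwise run with fewer than 6 lists.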

Comment thread buildenv/jenkins/JenkinsfileBase
AIX machines (e.g., paix820) with limited RAM (8GB) run into OOM errors
in resultsSummary when special.functional runs as a single unsplit job.
Adding a minimum of 6 parallel lists ensures the job is always split,
preventing Perl OOM failures during test result processing.

related: https://github.ibm.com/runtimes/automation/issues/921
Signed-off-by: Anna Babu Palathingal <anna.bp@ibm.com>
annaibm force-pushed the addMinimumList_AIX branch from ae9ab71 to 313119a on April 9, 2026 14:29
annaibm requested a review from sophia-guo on April 9, 2026 21:53

pshipton (Member) commented

Not sure we'll still need this change, so I'm setting it to draft.

pshipton marked this pull request as draft on April 13, 2026 01:29