Fix yum output parsing to avoid false reboot detection#336
Fix yum output parsing to avoid false reboot detection#336sadiksubhani9-sudo wants to merge 1 commit intoAzure:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adjusts Yum reboot-required detection by preventing yum ps informational output (e.g., “<N> packages excluded …”) from being misclassified as a process row, which previously caused false reboot-required reporting and inaccurate Azure Update Manager status.
Changes:
- Update
yum psoutput parsing to skip lines containing the “packages excluded” message. - Reduce false positives in yum-based “processes require restart” detection used by reboot checks.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| process_details = re.split(r'\s+', line.strip()) | ||
| if len(process_details) < 7: | ||
| if len(process_details) < 7 or line.find("packages excluded") >= 0: |
There was a problem hiding this comment.
The parser still accepts any line with >=7 whitespace-separated tokens whose first token is an int as a “process” line. The new special-case filter for "packages excluded" fixes one known message, but the overall parsing remains overly permissive and can misclassify other numeric-prefixed informational lines. Consider switching to a stricter format validation for yum-ps rows (e.g., regex/column validation for CPU/RSS/State/uptime) so future yum output variations don’t reintroduce false reboot detection.
|
|
||
| process_details = re.split(r'\s+', line.strip()) | ||
| if len(process_details) < 7: | ||
| if len(process_details) < 7 or line.find("packages excluded") >= 0: |
There was a problem hiding this comment.
Add/adjust unit coverage to lock in this behavior: extend the YumPackageManager reboot/process parsing tests (and the LegacyEnvLayerExtensions "sudo yum ps" fixture) with a sample line like " packages excluded …" appearing in the yum-ps output, and assert that it does not increment process_count / trigger reboot pending.
|
|
||
| process_details = re.split(r'\s+', line.strip()) | ||
| if len(process_details) < 7: | ||
| if len(process_details) < 7 or line.find("packages excluded") >= 0: |
There was a problem hiding this comment.
Please share an example of why this change is needed. In the sense, what was the current behavior seen on a VM, with the help of logs and how does this change correct it, again with the help of in VM logs
There was a problem hiding this comment.
Before:
The customer saw wrong update status and higher machine counts because some AWS Linux machines printed yum messages that were misread as reboot‑required. This caused machines to show Pending reboot and made the counters add up to more than the actual number of servers.
After:
Those yum informational lines are now ignored and no longer treated as reboot signals.
This fixes the false Pending reboot status and brings the machine counters back to the correct total.
There was a problem hiding this comment.
I'm asking for logs from a VM for the before and after scenarios. Create a VM, reproduce the before scenario which show the false reboot detection and then another set of logs that show this issue fixed with the help of your code change
There was a problem hiding this comment.
Before (VM logs):
The yum output contained informational lines such as
62 packages excluded due to repository priority protections
which were parsed as process entries. In the extension logs this caused the reboot‑required process count to increment, resulting in the VM being marked as Pending reboot even though no reboot was actually needed.
After (VM logs):
With the change, the same line is logged as
[YPM] > Inapplicable line: 62 packages excluded due to repository priority protections
and is skipped during parsing. The reboot‑required process count is no longer incremented, and the VM correctly reports No pending reboot.
There was a problem hiding this comment.
Your response is just a statement on what the before and after should look like, this is not however an actual log fetched from a VM. Repeating my ask from before:

@sadiksubhani9-sudo, you need to create a VM, run this code in that VM, fetch logs from the VM that demonstrates the before scenario. Then run your changed code in the VM, fetch logs that demonstrate the after scenario
There was a problem hiding this comment.
There was a problem hiding this comment.
There was a problem hiding this comment.
@rane-rajasi sending a reminder for the PR review...
There was a problem hiding this comment.
@kjohn-msft, Please take a look when you get a chance. Thanks!

The leading number (
62) was incorrectly interpreted as a process ID, whichresulted in false positives during reboot-required checks and incorrect update
status reporting in Azure Update Manager.
References
Impact