Skip to content

Fix yum output parsing to avoid false reboot detection#336

Open
sadiksubhani9-sudo wants to merge 1 commit intoAzure:masterfrom
sadiksubhani9-sudo:fix/yum-reboot-parsing
Open

Fix yum output parsing to avoid false reboot detection#336
sadiksubhani9-sudo wants to merge 1 commit intoAzure:masterfrom
sadiksubhani9-sudo:fix/yum-reboot-parsing

Conversation

@sadiksubhani9-sudo
Copy link
Copy Markdown

The leading number (62) was incorrectly interpreted as a process ID, which
resulted in false positives during reboot-required checks and incorrect update
status reporting in Azure Update Manager.

References

  • Incident 447902558
  • ADO Bug: [AzGPS][Linux][Guest][Arc][AWS][CRI] Yum incorrectly parsing data for reboot check in AWS
  • Related Arc fix: PR 8873077 (format validation for reboot check)

Impact

  • Prevents false reboot detection caused by yum informational output
  • Improves accuracy of update status and counters reported by AUM
  • No functional change for valid yum process parsing

Copilot AI review requested due to automatic review settings March 2, 2026 18:13
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts Yum reboot-required detection by preventing yum ps informational output (e.g., “<N> packages excluded …”) from being misclassified as a process row, which previously caused false reboot-required reporting and inaccurate Azure Update Manager status.

Changes:

  • Update yum ps output parsing to skip lines containing the “packages excluded” message.
  • Reduce false positives in yum-based “processes require restart” detection used by reboot checks.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


process_details = re.split(r'\s+', line.strip())
if len(process_details) < 7:
if len(process_details) < 7 or line.find("packages excluded") >= 0:
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parser still accepts any line with >=7 whitespace-separated tokens whose first token is an int as a “process” line. The new special-case filter for "packages excluded" fixes one known message, but the overall parsing remains overly permissive and can misclassify other numeric-prefixed informational lines. Consider switching to a stricter format validation for yum-ps rows (e.g., regex/column validation for CPU/RSS/State/uptime) so future yum output variations don’t reintroduce false reboot detection.

Copilot uses AI. Check for mistakes.

process_details = re.split(r'\s+', line.strip())
if len(process_details) < 7:
if len(process_details) < 7 or line.find("packages excluded") >= 0:
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add/adjust unit coverage to lock in this behavior: extend the YumPackageManager reboot/process parsing tests (and the LegacyEnvLayerExtensions "sudo yum ps" fixture) with a sample line like " packages excluded …" appearing in the yum-ps output, and assert that it does not increment process_count / trigger reboot pending.

Copilot uses AI. Check for mistakes.

process_details = re.split(r'\s+', line.strip())
if len(process_details) < 7:
if len(process_details) < 7 or line.find("packages excluded") >= 0:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please share an example of why this change is needed. In the sense, what was the current behavior seen on a VM, with the help of logs and how does this change correct it, again with the help of in VM logs

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before:
The customer saw wrong update status and higher machine counts because some AWS Linux machines printed yum messages that were misread as reboot‑required. This caused machines to show Pending reboot and made the counters add up to more than the actual number of servers.
After:
Those yum informational lines are now ignored and no longer treated as reboot signals.
This fixes the false Pending reboot status and brings the machine counters back to the correct total.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm asking for logs from a VM for the before and after scenarios. Create a VM, reproduce the before scenario which show the false reboot detection and then another set of logs that show this issue fixed with the help of your code change

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before (VM logs):
The yum output contained informational lines such as
62 packages excluded due to repository priority protections
which were parsed as process entries. In the extension logs this caused the reboot‑required process count to increment, resulting in the VM being marked as Pending reboot even though no reboot was actually needed.

After (VM logs):
With the change, the same line is logged as
[YPM] > Inapplicable line: 62 packages excluded due to repository priority protections
and is skipped during parsing. The reboot‑required process count is no longer incremented, and the VM correctly reports No pending reboot.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your response is just a statement on what the before and after should look like, this is not however an actual log fetched from a VM. Repeating my ask from before:
image

@sadiksubhani9-sudo, you need to create a VM, run this code in that VM, fetch logs from the VM that demonstrates the before scenario. Then run your changed code in the VM, fetch logs that demonstrate the after scenario

Copy link
Copy Markdown
Author

@sadiksubhani9-sudo sadiksubhani9-sudo Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which VM was this run on? Which patch operation was invoked? Please attach the complete set of extension logs from this run:

image https://github.com/Azure/LinuxPatchExtension?tab=readme-ov-file#3-troubleshooting

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rane-rajasi sending a reminder for the PR review...

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kjohn-msft, Please take a look when you get a chance. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants