phd: collect core when killing non-booting guest #1079

Merged
iximeow merged 1 commit into master from ixi/core-on-stuck-phd
Mar 12, 2026

Conversation

@iximeow
Member

@iximeow iximeow commented Mar 12, 2026

this may or may not prove useful in practice; if we're lucky something got funky in device emulation and we can see a stuck thread. on the other hand, if we're unlucky the guest is stuck in a loop and all we see is one vCPU was running while everything else was idle. in this case, at least, hopefully the serial console says something about the condition (it usually does, from experience)

this does the immediate thing in #1034. theoretically I've put the core in the right spot to get slurped up when we tar up the rest of the phd run artifacts so I'll rerun the phd job 'til we get a core...?

@iximeow added the testing label Mar 12, 2026
Member

@hawkw hawkw left a comment

this is great! thank you!

);
let proc = self.server.as_ref().unwrap();
proc.core();
anyhow::bail!("timed out while waiting to boot")
Member

maybe we ought to stuff the core's path in this error so that it gets printed in the test failure as well as in its logs?

Member Author

i think i've got it so that the warn!("core written to {}", core_path); would be right above this in the rendered buildomat output, so it should be pretty easy to notice if you've gotta look at the logs.. it stubbornly does not want to do the thing though so i guess we'll see?

anyway i'm mostly thinking that until very recently most "timed out while waiting to boot" failures really meant that i did something funky with the guest test image or adapter. there the core wouldn't have been nearly as useful as looking at the guest's serial history, so i don't wanna nudge in a misleading direction.
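
For reference, the suggestion in this thread (carrying the core's path in the timeout error itself) could look roughly like the sketch below. It is not what was merged. `BootTimeout` and `boot_timeout_error` are hypothetical names, and it uses only the standard library rather than the `anyhow::bail!` call in the diff, on the assumption that the core path is available as an `Option` at the point of failure.

```rust
use std::fmt;
use std::path::{Path, PathBuf};

/// Hypothetical error type for the boot-timeout path. The real code
/// bails with `anyhow::bail!`; this stand-in only shows the message shape.
#[derive(Debug)]
struct BootTimeout {
    /// Where the core was written, if collecting it succeeded (assumption).
    core_path: Option<PathBuf>,
}

impl fmt::Display for BootTimeout {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Same base message as the existing error in the diff.
        write!(f, "timed out while waiting to boot")?;
        // Append the core location only when there is one to report.
        if let Some(path) = &self.core_path {
            write!(f, " (core written to {})", path.display())?;
        }
        Ok(())
    }
}

impl std::error::Error for BootTimeout {}

/// Hypothetical helper: build the timeout error from an optional core path.
fn boot_timeout_error(core_path: Option<&Path>) -> BootTimeout {
    BootTimeout {
        core_path: core_path.map(Path::to_path_buf),
    }
}
```

With this shape the path would show up in the test failure text as well as in the logs, at the cost of pointing readers at the core even when the guest's serial history is the more useful place to look, which is the concern raised above.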

@iximeow iximeow merged commit a54b7de into master Mar 12, 2026
12 checks passed
@iximeow iximeow deleted the ixi/core-on-stuck-phd branch March 12, 2026 23:16
@iximeow
Member Author

iximeow commented Mar 12, 2026

ixi/core-on-stuck-phd-demo didn't cough up a flake after four runs. merging this so we have it in place the next time it does happen..

Labels

testing Related to testing and/or the PHD test framework.

2 participants