Skip to content

vcpu: Handle Intel VMX limitation for intr_shadow import#1014

Open
glitzflitz wants to merge 1 commit intooxidecomputer:masterfrom
glitzflitz:snapshot
Open

vcpu: Handle Intel VMX limitation for intr_shadow import#1014
glitzflitz wants to merge 1 commit intooxidecomputer:masterfrom
glitzflitz:snapshot

Conversation

@glitzflitz
Copy link

Intel VMX rejects setting VM_REG_GUEST_INTR_SHADOW to non-zero values while AMD SVM accepts any value. As per [1], the interrupt shadow is transient state, cleared after one instruction.

Ignore EINVAL when attempting to set intr_shadow=true on Intel, as the state loss is benign. In the worst case an interrupt arrives one instruction early. This fixes #1013

[1]: Intel SDM Volume 3C, Section 26.6.1 and 27.7.1 around
"Interruptibility State"
https://cdrdv2-public.intel.com/825750/326019-sdm-vol-3c.pdf

Intel VMX rejects setting VM_REG_GUEST_INTR_SHADOW to non-zero values
while AMD SVM accepts any value. As per [1], the interrupt shadow is
transient state, cleared after one instruction.

Ignore EINVAL when attempting to set intr_shadow=true on Intel, as the
state loss is benign. In the worst case an interrupt arrives one
instruction early. This fixes oxidecomputer#1013

[1]: Intel SDM Volume 3C, Section 26.6.1 and 27.7.1 around
     "Interruptibility State"
     https://cdrdv2-public.intel.com/825750/326019-sdm-vol-3c.pdf

Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
@glitzflitz glitzflitz marked this pull request as ready for review January 9, 2026 21:39
Copy link
Member

@iximeow iximeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really curious if you did anything interesting to get a snapshot with a guest in an interrupt shadow. I imagine you didn't, and you got here because we exported a vCPU in a linux native_safe_halt() or __sti_mwait() where the instruction in the shadow was either a hlt or mwait?

I think we shouldn't handle the circumstance like this.. an interrupt arriving one instruction early in a pop ss; mov esp, ... sequence would be disastrous if a guest expected that shadow to protect restoring a different context's ss:esp, and (not that I've seen software like this) an sti; wrmsr could be quite surprised if we could suddenly drop an interrupt between those two.

it seems like the most correct thing to do here is emulate through the one-instruction shadow before restoring the vCPU on Intel, so that we can always resume it with these interrupts disabled. if we're actually going to resume with the vCPU at hlt/mwait then I wonder if we should just step past those as if the cpu woke up anyway. I don't know if hlt or mwait admit spurious wakeups, so maybe that's not correct! alternatively, if we knew the vCPU was stopped at a halt-like, we might not want to actually set the guest CPU running on resume anyway. I haven't looked at the CPU migration bits in Propolis super closely but it feels like we should figure out a good answer for these cases before just ignoring the EINVAL.

looking at qemu/Linux, it seems to me like they'd just try setting the bits and on Intel get a one-instruction-early interrupt (oof), so clearly this isn't going to be Obviously Broken Everywhere, but it does feel like asking for surprises (in a bad way) :D

(if you want to leave this open, iterate on it, close it for the time being, any of the above are fine. this clearly is an issue that we should fix in Propolis or illumos or both in some way or another)

Comment on lines +975 to +976
// Intel VMX rejects setting intr_shadow to non-zero (AMD is fine).
// This state is transient anyway, cleared after one instruction.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is overly-strong, right? from the SDM:

If the VM entry is injecting, there is no blocking by STI or by MOV SS following the VM entry, regardless of the contents of the interruptability-state field.

my read is that VMX would allow the bits to be set, but they would not be heeded, hence the conservative approach of EINVAL in bhyve since the VMM will probably be (I would be for sure) surprised about the behavior in this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Snapshot restore fails on Intel hosts when intr_shadow is set

2 participants