vcpu: Handle Intel VMX limitation for intr_shadow import#1014
vcpu: Handle Intel VMX limitation for intr_shadow import#1014glitzflitz wants to merge 1 commit intooxidecomputer:masterfrom
Conversation
Intel VMX rejects setting VM_REG_GUEST_INTR_SHADOW to non-zero values while AMD SVM accepts any value. As per [1], the interrupt shadow is transient state, cleared after one instruction. Ignore EINVAL when attempting to set intr_shadow=true on Intel, as the state loss is benign. In the worst case an interrupt arrives one instruction early. This fixes oxidecomputer#1013 [1]: Intel SDM Volume 3C, Section 26.6.1 and 27.7.1 around "Interruptibility State" https://cdrdv2-public.intel.com/825750/326019-sdm-vol-3c.pdf Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
iximeow
left a comment
There was a problem hiding this comment.
I'm really curious if you did anything interesting to get a snapshot with a guest in an interrupt shadow. I imagine you didn't, and you got here because we exported a vCPU in a linux native_safe_halt() or __sti_mwait() where the instruction in the shadow was either a hlt or mwait?
I think we shouldn't handle the circumstance like this.. an interrupt arriving one instruction early in a pop ss; mov esp, ... sequence would be disastrous if a guest expected that shadow to protect restoring a different context's ss:esp, and (not that I've seen software like this) an sti; wrmsr could be quite surprised if we could suddenly drop an interrupt between those two.
it seems like the most correct thing to do here is emulate through the one-instruction shadow before restoring the vCPU on Intel, so that we can always resume it with these interrupts disabled. if we're actually going to resume with the vCPU at hlt/mwait then I wonder if we should just step past those as if the cpu woke up anyway. I don't know if hlt or mwait admit spurious wakeups, so maybe that's not correct! alternatively, if we knew the vCPU was stopped at a halt-like, we might not want to actually set the guest CPU running on resume anyway. I haven't looked at the CPU migration bits in Propolis super closely but it feels like we should figure out a good answer for these cases before just ignoring the EINVAL.
looking at qemu/Linux, it seems to me like they'd just try setting the bits and on Intel get a one-instruction-early interrupt (oof), so clearly this isn't going to be Obviously Broken Everywhere, but it does feel like asking for surprises (in a bad way) :D
(if you want to leave this open, iterate on it, close it for the time being, any of the above are fine. this clearly is an issue that we should fix in Propolis or illumos or both in some way or another)
| // Intel VMX rejects setting intr_shadow to non-zero (AMD is fine). | ||
| // This state is transient anyway, cleared after one instruction. |
There was a problem hiding this comment.
this is overly-strong, right? from the SDM:
If the VM entry is injecting, there is no blocking by STI or by MOV SS following the VM entry, regardless of the contents of the interruptability-state field.
my read is that VMX would allow the bits to be set, but they would not be heeded, hence the conservative approach of EINVAL in bhyve since the VMM will probably be (I would be for sure) surprised about the behavior in this case.
Intel VMX rejects setting
VM_REG_GUEST_INTR_SHADOWto non-zero values while AMD SVM accepts any value. As per [1], the interrupt shadow is transient state, cleared after one instruction.Ignore
EINVALwhen attempting to setintr_shadow=trueon Intel, as the state loss is benign. In the worst case an interrupt arrives one instruction early. This fixes #1013[1]: Intel SDM Volume 3C, Section 26.6.1 and 27.7.1 around
"Interruptibility State"
https://cdrdv2-public.intel.com/825750/326019-sdm-vol-3c.pdf