MSR aware CPU profiles #103
olivereanderson wants to merge 28 commits into cyberus-technology:gardenlinux from
I will fix the failing pipelines this morning, but you can probably still review this now.
Force-pushed 8c2e865 to 1a15adf
phip1611 left a comment
Awesome PR! I only have minor remarks. Generally, this work is awesome (given my limited domain knowledge about CPU profiles).
I didn't review some of the changes in depth (especially the logging), as they are quite mechanical and seem to do their job.
Looking forward to getting this merged!
hypervisor/src/hypervisor.rs (outdated)
///
#[cfg(target_arch = "x86_64")]
fn get_msr_index_list(&self) -> Result<Vec<u32>>;
nit: I think this newline could be moved into a prior commit.
Perhaps `git rebase -i HEAD~23 --exec "cargo +nightly fmt --all"` already does the trick.
I think I will just leave this for now, but I will take proper care of such things when we start upstreaming.
I'd kindly ask you to give it a try:
- the suggested command/workflow is very easy
- if the next person bumps our patchset to v52, we'll run `cargo check && cargo +nightly fmt --all && cargo nextest run && cargo clippy` for every single commit anyway. It would be cool if that person then doesn't have to clean up such stuff on top of potential rebase conflicts :)
That didn't work, but I should have fixed it now regardless.
Ah, sorry for the confusion. The command doesn't take care of everything automatically. It will stop at each commit where the checks fail, and then you need to:
- resolve the conflict
- run `git commit -a --amend`
- run `git rebase --continue`
Apologies, I assumed this workflow was already familiar.
Thanks for looking into this!
IIRC the command did nothing at all in my case. Don't recall the exact details now though.
.flush()
.context("CPU profile generation failed: Unable to flush cpuid profile data license file")
.with_context(|| {
    format!("CPU profile generation failed: Unable to write to {data_type} profile data license file")
nit: couldn't you also use anyhow!() here?
I don't think so, because I want to keep the original error from flush in the chain of errors.
I meant anyhow!() instead of format!(). No need to change anything here.
anyhow! evaluates to anyhow::Error, which implements Display, so that would work, but I don't see what value that brings over using format! here.
Maybe I am not understanding what you have in mind?
Force-pushed 546fb2f to 86bbb57
Force-pushed 819513e to ee9d05f
Coffeeri left a comment
Kudos to you for going so deep into MSRs. Fantastic work!
However, profile_msr_based_features are applied during vCPU setup but are not merged into the persistent MSR list used to build CpuState.msrs for snapshot/save. So, if I am not mistaken, profile-adjusted MSRs can be lost across snapshot/restore or migration?
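A minimal sketch of the kind of merge this comment suggests, assuming the persistent list is a plain list of MSR indices whose values are read back from the vCPU at save time, and that the profile adjustments can be reduced to their indices. `msr_indices_to_save` is a hypothetical helper, not cloud-hypervisor code:

```rust
/// Returns the MSR indices to read back when saving vCPU state: the base
/// list extended by every MSR index the CPU profile adjusts.
/// The result is sorted and free of duplicates.
pub fn msr_indices_to_save(base: &[u32], profile_adjusted: &[u32]) -> Vec<u32> {
    let mut indices: Vec<u32> = base.iter().chain(profile_adjusted).copied().collect();
    indices.sort_unstable();
    indices.dedup();
    indices
}
```

With this, an MSR such as IA32_ARCH_CAPABILITIES (0x10A) that only the profile touches still ends up in the saved state.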
Force-pushed ee9d05f to 8d14dab
arch/src/x86_64/msr_definitions/intel/non_architectural_msrs.rs (outdated)
Force-pushed 8d14dab to 9f34c63
phip1611 left a comment
Thanks for addressing my feedback! LGTM now! The only open remark I have is
Force-pushed 18d7c8f to fd74845
Unfortunately, we somehow broke nested-vmx:
Force-pushed e89e12d to 1fb9a45
Machine Check Architecture (MCA)
By setting the profile policy to Static(0) for the MCA bit, we indicate to guests that the MCG_CAP MSR and other machine-check-related MSRs are not available.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
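As a rough illustration of what Static(0) does: the MCA flag is bit 14 of CPUID.01H:EDX, and a static policy replaces that bit regardless of what the host reports. `apply_static_bit` is a hypothetical helper, not the actual profile code:

```rust
/// Bit position of the MCA flag in CPUID.01H:EDX.
pub const CPUID_1_EDX_MCA_BIT: u32 = 14;

/// Replaces the single bit at `bit` in `host_value` with `static_value`
/// (0 or 1), mirroring the effect of a Static(..) policy on a one-bit field.
/// Hypothetical helper for illustration only.
pub fn apply_static_bit(host_value: u32, bit: u32, static_value: u32) -> u32 {
    assert!(bit < 32 && static_value <= 1);
    // Clear the bit, then OR in the static value at the same position.
    (host_value & !(1 << bit)) | (static_value << bit)
}
```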
We change the CPUID policy for WAITPKG because we encountered problems with it when testing CPU profiles with MSRs. This feature is also off by default for CPU models in QEMU, but we may want to revisit this decision in the future.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
MD_CLEAR (bit 10), STIBP (bit 27) and L1D_FLUSH (bit 28) all advertise certain processor features. The Passthrough policy is usually not appropriate for such bits, because it no longer guarantees that the live migration CPUID compatibility checks succeed whenever the CPU profile can be applied on both the source and the destination. We fix this issue by using the Inherit policy for these bits instead.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
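To make the Passthrough/Inherit argument concrete, here is a toy model for a single feature bit such as MD_CLEAR. The `Policy` enum and `guest_bit` function are hypothetical, and the Inherit semantics assumed here (the guest sees the value recorded in the profile, which the host must be able to provide) is our reading of the commit message, not the actual implementation:

```rust
/// Hypothetical, simplified model of two profile policies for one feature bit.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Policy {
    /// Guest sees whatever the host reports.
    Passthrough,
    /// Guest sees the value recorded in the profile (assumed semantics).
    Inherit,
}

/// Returns the guest-visible bit if the profile can be applied on this host,
/// or None if the host cannot provide what the profile promises.
pub fn guest_bit(policy: Policy, profile_bit: bool, host_bit: bool) -> Option<bool> {
    match policy {
        Policy::Passthrough => Some(host_bit),
        // A feature the profile advertises must be present on the host.
        Policy::Inherit if profile_bit && !host_bit => None,
        Policy::Inherit => Some(profile_bit),
    }
}
```

With Passthrough, a source host with the feature and a destination host without it both accept the profile, yet the guest-visible bit differs, so the problem only surfaces in the CPUID check at migration time. With Inherit, the destination is rejected as soon as the profile is applied.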
Protection keys are not supported for CPU profiles and are thus disabled.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
As a first step towards making CPU profiles MSR aware, we prepare for a CPU profile having two associated pieces of data: CPUID and MSR adjustments. We thus rename the pre-existing CpuProfileData struct to CpuIdProfileData and adapt the CPU profile generation tool accordingly. We also make the CPU profile generation tool write directly to a file and automatically generate the required license file. This makes the profile generation process more convenient.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
Introduce convenience methods for looking up CPUID value definitions. We will later use these methods to assert certain policies at compile time in order to stay consistent with the MSR policies we introduce. Due to current limitations of const generics, we unfortunately need to duplicate a little bit of code.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
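A sketch of the general idea of compile-time lookups and assertions, assuming a simplified, hypothetical `ValueDefinition` type. It does not reproduce the const-generics-related duplication mentioned above; it only shows why such helpers end up hand-written: `const fn` cannot use `for` loops, iterators, or `==` on `&str` on stable Rust.

```rust
/// Hypothetical, simplified CPUID value definition: a named bit field.
pub struct ValueDefinition {
    pub short: &'static str,
    pub bits_start: u8,
    pub bits_end: u8,
}

/// Byte-wise string comparison usable in `const fn`.
const fn str_eq(a: &str, b: &str) -> bool {
    let (a, b) = (a.as_bytes(), b.as_bytes());
    if a.len() != b.len() {
        return false;
    }
    let mut i = 0;
    while i < a.len() {
        if a[i] != b[i] {
            return false;
        }
        i += 1;
    }
    true
}

/// Const lookup by short name using a manual `while` loop.
pub const fn find(defs: &[ValueDefinition], short: &str) -> Option<usize> {
    let mut i = 0;
    while i < defs.len() {
        if str_eq(defs[i].short, short) {
            return Some(i);
        }
        i += 1;
    }
    None
}

pub const LEAF_1_EDX: [ValueDefinition; 2] = [
    ValueDefinition { short: "mce", bits_start: 7, bits_end: 7 },
    ValueDefinition { short: "mca", bits_start: 14, bits_end: 14 },
];

// Compile-time assertion: the build fails if "mca" is missing or misplaced.
const _: () = {
    match find(&LEAF_1_EDX, "mca") {
        Some(i) => assert!(LEAF_1_EDX[i].bits_start == 14),
        None => panic!("mca definition missing"),
    }
};
```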
In order to generate CPU profiles, we also need definitions and policies for MSR-based features, as some CPU features are exposed through MSRs rather than CPUID. This commit introduces the MSR analogues of the data structures we previously introduced for CPUID definitions.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We introduce MSR-based feature definitions for Intel CPUs that will be utilized by the upcoming CPU profile generation tool.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
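An MSR analogue of a CPUID value definition might look like the following sketch. `MsrFieldDefinition` is a hypothetical name; the IA32_ARCH_CAPABILITIES index (0x10A) and the RDCL_NO (bit 0), IBRS_ALL (bit 1) and MDS_NO (bit 5) positions follow the Intel SDM:

```rust
/// Hypothetical definition of a bit field within an MSR.
#[derive(Clone, Copy, Debug)]
pub struct MsrFieldDefinition {
    pub msr: u32,
    pub short: &'static str,
    pub bits_start: u8,
    pub bits_end: u8, // inclusive
}

impl MsrFieldDefinition {
    /// Extracts this field from a raw 64-bit MSR value.
    pub const fn extract(&self, value: u64) -> u64 {
        let width = (self.bits_end - self.bits_start + 1) as u32;
        let mask = if width == 64 { u64::MAX } else { (1u64 << width) - 1 };
        (value >> (self.bits_start as u32)) & mask
    }
}

pub const IA32_ARCH_CAPABILITIES: u32 = 0x10A;

/// A few IA32_ARCH_CAPABILITIES bits.
pub const ARCH_CAPABILITIES_FIELDS: [MsrFieldDefinition; 3] = [
    MsrFieldDefinition { msr: IA32_ARCH_CAPABILITIES, short: "rdcl_no", bits_start: 0, bits_end: 0 },
    MsrFieldDefinition { msr: IA32_ARCH_CAPABILITIES, short: "ibrs_all", bits_start: 1, bits_end: 1 },
    MsrFieldDefinition { msr: IA32_ARCH_CAPABILITIES, short: "mds_no", bits_start: 5, bits_end: 5 },
];
```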
We apply changes suggested in the PR review to the IA32_ARCH_CAPABILITIES MSR policies.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
This is to be consistent with recent changes to CPUID policies.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
KVM already runs compatibility checks for most MSR-based features when userspace sets these MSRs, but when any of these checks fail, we get very little useful information about what exactly the problem is. Hence, to be on the safe side and to ensure good UX for users running into errors when trying to apply a CPU profile, we introduce our own compatibility checks for Intel CPUs that log at the error and debug levels. The error logs aim to provide the minimal amount of information required to investigate the problem further, while the debug logs provide (much) more convenience when debugging. We will incorporate these checks in the context of CPU profiles in a later commit.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
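A minimal sketch of such a check for the common case of MSR bits where 1 advertises a feature. `check_msr_subset` is hypothetical, and `eprintln!` stands in for the `log` crate's `error!`/`debug!` macros so that the example stays self-contained. Numeric fields (such as the VMCS index in IA32_VMX_VMCS_ENUM) and VMX allowed-0/allowed-1 settings need different comparisons:

```rust
/// Every bit the profile advertises in a feature MSR must also be set on the
/// host. Returns Err with the missing bits so callers can react to them.
pub fn check_msr_subset(msr: u32, profile_value: u64, host_value: u64) -> Result<(), u64> {
    let missing = profile_value & !host_value;
    if missing == 0 {
        return Ok(());
    }
    // Short, error-level summary: just enough to start investigating.
    eprintln!("ERROR: MSR {msr:#x}: host lacks bits required by the CPU profile");
    // Detailed, debug-level information.
    eprintln!(
        "DEBUG: MSR {msr:#x}: profile={profile_value:#018x} host={host_value:#018x} missing={missing:#018x}"
    );
    Err(missing)
}
```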
Due to the changes to IA32_ARCH_CAPABILITIES applied after the last code review, we introduce stricter checks.
Signed-off-by: Oliver Anderson
On-behalf-of: SAP oliver.anderson@sap.com
This list will be used to help us detect unknown MSRs when generating CPU profiles. It serves no other purpose.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
TODO: Squash into previous commit if this all works as expected
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We include a list of non-architectural MSRs. This list will only be used to help the CPU profile generation tool rule out MSRs that it does not know how to handle.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We include a list of MSRs defined by KVM that may be approved by CPU profiles and another list of those that may not be approved by CPU profiles. These lists will later be used by the CPU profile generation tool.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
The list of Hyper-V MSRs introduced here will be used during CPU profile generation and also at runtime to filter them out whenever `kvm_hyperv` is set to `false`.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
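The runtime filtering can be pictured as follows. Only three Hyper-V synthetic MSR indices from the Hyper-V TLFS are listed, and `filter_hyperv_msrs` is a hypothetical helper rather than the function added in this commit:

```rust
/// A few Hyper-V synthetic MSRs; the real list is considerably longer.
pub const HYPERV_MSRS: [u32; 3] = [
    0x4000_0000, // HV_X64_MSR_GUEST_OS_ID
    0x4000_0001, // HV_X64_MSR_HYPERCALL
    0x4000_0002, // HV_X64_MSR_VP_INDEX
];

/// Drops Hyper-V MSRs from `msrs` unless Hyper-V enlightenments are enabled.
pub fn filter_hyperv_msrs(msrs: &mut Vec<u32>, kvm_hyperv: bool) {
    if !kvm_hyperv {
        msrs.retain(|index| !HYPERV_MSRS.contains(index));
    }
}
```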
We introduce functionality for computing the MSR updates required by the given CPU profile.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
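One way to model the computed updates, assuming (hypothetically) that the profile stores a mask of the bits it controls plus the desired values for those bits, and that only MSRs whose value actually changes need to be written. `MsrAdjustment` and `compute_msr_updates` are illustrative names:

```rust
/// Hypothetical profile adjustment: bits selected by `mask` are forced to the
/// corresponding bits of `value`; all other bits keep the host value.
#[derive(Clone, Copy, Debug)]
pub struct MsrAdjustment {
    pub index: u32,
    pub mask: u64,
    pub value: u64,
}

/// Computes (index, new_value) pairs that differ from the host's values.
/// `host_value` returns the host's value for a feature MSR, if reported.
pub fn compute_msr_updates(
    adjustments: &[MsrAdjustment],
    host_value: impl Fn(u32) -> Option<u64>,
) -> Vec<(u32, u64)> {
    adjustments
        .iter()
        .filter_map(|adj| {
            // MSRs the host does not report are skipped in this sketch.
            let host = host_value(adj.index)?;
            let adjusted = (host & !adj.mask) | (adj.value & adj.mask);
            (adjusted != host).then_some((adj.index, adjusted))
        })
        .collect()
}
```

Skipping unreported MSRs keeps the sketch short; a real implementation would more likely treat them as an error.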
We introduce functionality to filter out MSRs that we want to deny guests access to.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We record the necessary MSR-based feature modifications in the `CpuManager` and make sure to set these MSR values upon vCPU configuration. We also use the Vm to filter access to MSRs that are incompatible with the chosen CPU profile.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We adapt the CPU profile generation tool to also take the MSR-based features into account.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
Force-pushed 1fb9a45 to 21e6c4e
We regenerate the CPU profiles and include the MSR-related data.
Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
Force-pushed 21e6c4e to 331f5de
This PR adds functionality to modify and restrict MSRs based on CPU profiles.
Hints for reviewers: Please read the description below in its entirety and then review this PR commit by commit.
Description
This PR continues the work merged in #62. If you do not recall the details of that PR, I suggest re-reading its description before continuing.
MSR-based features
Most CPU features can indeed be probed through CPUID, but there are a few that instead require MSR inspection.
We follow KVM and call these MSR-based features (see KVM_GET_MSR_FEATURE_INDEX_LIST).
Currently, a time-of-check-to-time-of-use (TOCTOU) bug similar to the one described for CPUID in #62 can also occur with MSRs. Suppose, for example, that a guest checks IA32_VMX_VMCS_ENUM to find the highest index value used for any VMCS encoding, and that a live migration to a machine with a lower value then occurs. Even though KVM will (presumably) catch this potential error when attempting to set MSRs on the migration destination, it still means that live migration can never take place between these machines.
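For the IA32_VMX_VMCS_ENUM example, bits 9:1 of the MSR (index 0x48A) hold the highest index value used for any VMCS encoding. A hypothetical check in the spirit of this PR would accept a host only if its value is at least the one the guest was shown; the function names below are illustrative:

```rust
/// IA32_VMX_VMCS_ENUM MSR index.
pub const IA32_VMX_VMCS_ENUM: u32 = 0x48A;

/// Bits 9:1 report the highest index value used for any VMCS encoding.
pub const fn vmcs_enum_max_index(value: u64) -> u64 {
    (value >> 1) & 0x1FF
}

/// A guest that has seen `profile_value` can only be served by a host whose
/// highest VMCS encoding index is at least as large.
pub const fn host_satisfies_vmcs_enum(profile_value: u64, host_value: u64) -> bool {
    vmcs_enum_max_index(host_value) >= vmcs_enum_max_index(profile_value)
}
```

Pinning the guest-visible value in the CPU profile to one that all target hosts satisfy is what makes the check pass on both ends of a migration.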
Hence, in order to enforce a higher level of compatibility, we take advantage of the fact that KVM can lie to guests about MSR values, analogous to the CPUID case.
We thus extend the CPU profile generation tool to also record necessary adjustments to the MSR-based feature values, using profile policies similar to those we previously introduced in the CPUID context.
MSR filters
At this point in time, KVM returns fewer than 40 MSR-based features when calling KVM_GET_MSR_FEATURE_INDEX_LIST on any given Intel CPU, but this is only a small fraction of the MSRs that are actually supported by the hardware and/or hypervisor.
While many MSRs are described as only being accessible if certain CPUID bits/values are present, some are available on all CPUs introduced after a certain generation (without CPUID requirements), and some are even only available on specific CPUs. The latter are referred to as non-architectural MSRs. MSRs whose definitions stay the same across processor generations (although their values may differ) are called architectural MSRs (prefixed with IA32).
To combat the problem of MSRs that may only be available on the source VM but not on the destination VM, we take advantage of KVM's KVM_X86_SET_MSR_FILTER, which enables us to deny guests access to entire ranges of MSRs. More precisely, we record a subset of the architectural MSRs supported by the hardware and hypervisor in the CPU profile and, when the CPU profile is applied, set up a filter that denies access to any MSR not listed there.
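KVM_X86_SET_MSR_FILTER takes a list of ranges (struct kvm_msr_filter_range, with `flags`, `nmsrs`, `base` and `bitmap` fields), where bit i of the bitmap covers MSR `base + i`. The sketch below builds such a bitmap from the MSRs recorded in a CPU profile, assuming KVM's convention that a set bit allows the access. `range_bitmap` is a hypothetical helper and does not issue the ioctl:

```rust
/// Builds the allow-bitmap for one filter range starting at `base` and
/// covering `nmsrs` MSRs. Bit i (byte i / 8, bit i % 8) corresponds to MSR
/// `base + i`; every MSR in the range that is not in `allowed` stays 0 (denied).
pub fn range_bitmap(base: u32, nmsrs: u32, allowed: &[u32]) -> Vec<u8> {
    let mut bitmap = vec![0u8; (nmsrs as usize + 7) / 8];
    for &msr in allowed {
        // Ignore MSRs below `base` or beyond the end of the range.
        if let Some(offset) = msr.checked_sub(base) {
            if offset < nmsrs {
                bitmap[(offset / 8) as usize] |= 1 << (offset % 8);
            }
        }
    }
    bitmap
}
```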
Note that it is possible to perform even more fine-grained MSR filtering (only denying individual bits) with KVM_CAP_X86_USER_SPACE_MSR, but we don't think that is necessary in our case.
CPUID adjustments
Some of the commits here adjust CPUID profile policies that turned out to be problematic when testing the CPU profiles after adding MSR adjustments.
Making the CPU profile generation tool future-proof
The whole point of the CPU profile generation tool is to automate the process of creating CPU profiles for new CPUs. Since new CPU generations (or KVM versions) may introduce new architectural MSRs (possibly even MSR-based features) that we are not yet aware of, we make the CPU profile tool emit warnings whenever it encounters MSRs that it is not aware of.
In this way, we get notified when it might be time to update the MSR definitions used by the CPU profile tool. To achieve this, we have introduced somewhat long lists of MSRs that we know about at this point in time. They serve no purpose other than helping us keep the CPU profile generation tool up to date!
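The warning mechanism amounts to a set difference between the MSRs reported by the host/KVM and the known lists. A minimal sketch, with `unknown_msrs` as a hypothetical function and `eprintln!` standing in for the tool's logging:

```rust
use std::collections::BTreeSet;

/// Returns the reported MSR indices that appear in none of the known lists
/// (sorted), emitting one warning per unknown MSR.
pub fn unknown_msrs(reported: &[u32], known_lists: &[&[u32]]) -> Vec<u32> {
    let known: BTreeSet<u32> = known_lists
        .iter()
        .flat_map(|list| list.iter().copied())
        .collect();
    let unknown: BTreeSet<u32> = reported
        .iter()
        .copied()
        .filter(|msr| !known.contains(msr))
        .collect();
    for msr in &unknown {
        eprintln!("WARNING: unknown MSR {msr:#x}; consider updating the MSR definitions");
    }
    unknown.into_iter().collect()
}
```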
Note that bits within already existing MSRs might of course go from being reserved to defined in the future and we do not know of a good way to automatically detect that (especially since the reserved value is not necessarily 0 in every case).
Helping out with testing this
Regardless of your experience with MSRs, you can still help out by starting a VM using any of the custom CPU profiles defined here and running the msr program. If any MSR covered by arch::x86_64::msr_definitions::intel::FORBIDDEN_IA32_MSR_RANGES or arch::x86_64::msr_definitions::NON_ARCHITECTURAL_INTEL_MSRS turns out to be accessible, then there are some MSRs that are not reported by KVM_GET_MSR_INDEX_LIST. Please document all such findings and ideally also check whether this is also the case with QEMU CPU models.
Follow-up work