
MSR aware CPU profiles#103

Open
olivereanderson wants to merge 28 commits into cyberus-technology:gardenlinux from olivereanderson:cpu-profiles-msr-v51

Conversation


@olivereanderson olivereanderson commented Mar 6, 2026

This PR adds functionality to modify and restrict MSRs based on CPU profiles.

Hints for reviewers: Please read the description below in its entirety and then review this PR commit by commit.

Description

This PR continues the work merged in #62. If you do not recall the details from that PR, I suggest rereading its description before continuing.

MSR-based features

Most CPU features can indeed be probed through CPUID, but there are a few that instead require MSR inspection.
We follow KVM and name these MSR-based features (See KVM_GET_MSR_FEATURE_INDEX_LIST).

Until now, a time-of-check-to-time-of-use bug similar to the one described for CPUID in #62 can also occur with MSRs. Suppose, for example, that a guest reads IA32_VMX_VMCS_ENUM to find the highest index value used for any VMCS encoding, and that a live migration to a machine with a lower value then takes place.
Even though KVM will (presumably) catch this potential error when attempting to set MSRs on the migration destination, it still means that live migration can never take place between these machines.
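
To make the example concrete, here is a minimal sketch of the comparison involved. It assumes the architectural layout where bits 9:1 of IA32_VMX_VMCS_ENUM (MSR index 0x48A) hold the highest VMCS index; the function names are hypothetical and not taken from this PR.

```rust
/// MSR index of IA32_VMX_VMCS_ENUM.
const IA32_VMX_VMCS_ENUM: u32 = 0x48A;

/// Bits 9:1 of IA32_VMX_VMCS_ENUM hold the highest index value
/// used for any VMCS encoding.
fn highest_vmcs_index(msr_value: u64) -> u64 {
    (msr_value >> 1) & 0x1FF
}

/// Hypothetical check: the destination can only honour what the guest
/// has already observed if it supports at least the same highest index.
fn destination_is_compatible(seen_by_guest: u64, destination: u64) -> bool {
    highest_vmcs_index(destination) >= highest_vmcs_index(seen_by_guest)
}

/// Hypothetical helper producing a one-line report for logging.
fn describe(seen_by_guest: u64, destination: u64) -> String {
    format!(
        "MSR {:#x}: guest saw index {}, destination offers {}",
        IA32_VMX_VMCS_ENUM,
        highest_vmcs_index(seen_by_guest),
        highest_vmcs_index(destination)
    )
}
```

With the made-up values 0x2E on the source and 0x2A on the destination, `destination_is_compatible(0x2E, 0x2A)` is false, which is exactly the situation a profile is meant to rule out up front.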

Hence, in order to enforce a higher level of compatibility, we take advantage of the fact that KVM can lie to guests about MSR values in a manner analogous to the CPUID case.

We thus extend the CPU profile generation tool to also record necessary adjustments to the MSR-based feature values, using profile policies similar to those we previously introduced in the CPUID context.
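
As a rough illustration of what such an adjustment could look like, consider the sketch below. The policy names Passthrough, Inherit and Static appear in the commit messages of this PR, but the semantics given in the comments, the `BitRangePolicy` type and the `adjusted_value` function are assumptions made for this example only. Bit ranges are assumed to satisfy `lo <= hi <= 63`.

```rust
/// Hypothetical policy for a bit range of an MSR-based feature.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ProfilePolicy {
    /// Assumed meaning: expose whatever the current host reports.
    Passthrough,
    /// Assumed meaning: expose the value recorded on the CPU
    /// the profile was generated from.
    Inherit,
    /// Always expose this fixed value (relative to the range start).
    Static(u64),
}

/// A policy attached to the inclusive bit range `lo..=hi` of a 64-bit MSR.
struct BitRangePolicy {
    lo: u8,
    hi: u8,
    policy: ProfilePolicy,
}

/// Mask covering bits `lo..=hi`.
fn mask(lo: u8, hi: u8) -> u64 {
    let width = u32::from(hi - lo) + 1;
    let ones = if width == 64 { u64::MAX } else { (1u64 << width) - 1 };
    ones << lo
}

/// Compute the value presented to the guest. Bits that are not covered
/// by any policy are left as zero in this sketch.
fn adjusted_value(host: u64, profile_source: u64, policies: &[BitRangePolicy]) -> u64 {
    let mut out = 0u64;
    for p in policies {
        let m = mask(p.lo, p.hi);
        out |= match p.policy {
            ProfilePolicy::Passthrough => host & m,
            ProfilePolicy::Inherit => profile_source & m,
            ProfilePolicy::Static(v) => (v << p.lo) & m,
        };
    }
    out
}
```

Under these assumptions only Passthrough ranges can differ between two hosts that apply the same profile, which is why they need the most care.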

MSR filters

At this point in time there are fewer than 40 MSR-based features that KVM will return when calling KVM_GET_MSR_FEATURE_INDEX_LIST on any given Intel CPU, but this is only a small fraction of the number of MSRs that are actually supported by the hardware and/or hypervisor.

While many MSRs are described as only being accessible if certain CPUID bits/values are present, some are available on all CPUs introduced after a certain generation (without CPUID requirements), and some are only available on specific CPUs. The latter are referred to as non-architectural MSRs. MSRs whose definitions (although possibly not their values) are the same across processor generations are called architectural MSRs (prefixed with IA32).

To combat the problem of MSRs that may be available on the source VM but not on the destination VM,
we take advantage of KVM's KVM_X86_SET_MSR_FILTER, which enables us to deny guests access to entire ranges of MSRs.

More precisely we record a subset of the architectural MSRs supported by the hardware and hypervisor into the CPU profile and set up a filter to deny any MSRs not listed there when the CPU profile is applied.

Note that it is possible to perform even more fine-grained MSR filtering (only denying individual bits) with KVM_CAP_X86_USER_SPACE_MSR, but we don't think that is necessary in our case.
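
The allow-list idea can be sketched without any KVM bindings. The `AllowRange` type below is a simplified, hypothetical stand-in for one range of a default-deny filter, where bit `i` of the bitmap describes MSR `base + i` and a set bit means that access is allowed. It is not the structure used in this PR.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified stand-in for one range of an MSR filter.
/// Bit `i` of `bitmap` describes MSR `base + i`; a set bit means "allowed".
#[derive(Debug, PartialEq, Eq)]
struct AllowRange {
    base: u32,
    nmsrs: u32,
    bitmap: Vec<u8>,
}

/// Build the allow bitmap for `base..base + nmsrs` from the MSRs
/// recorded in a CPU profile.
fn allow_range(base: u32, nmsrs: u32, permitted: &BTreeSet<u32>) -> AllowRange {
    let mut bitmap = vec![0u8; (nmsrs as usize + 7) / 8];
    for &msr in permitted.range(base..base.saturating_add(nmsrs)) {
        let offset = (msr - base) as usize;
        bitmap[offset / 8] |= 1u8 << (offset % 8);
    }
    AllowRange { base, nmsrs, bitmap }
}

/// With a default-deny filter, every MSR whose bit is clear
/// (or that lies outside the range) is rejected.
fn is_allowed(range: &AllowRange, msr: u32) -> bool {
    if msr < range.base || msr - range.base >= range.nmsrs {
        return false;
    }
    let offset = (msr - range.base) as usize;
    (range.bitmap[offset / 8] & (1u8 << (offset % 8))) != 0
}
```

A real filter would need several such ranges to cover the index space, so the ranges recorded in a profile should be chosen with the hypervisor's limits in mind.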

CPUID adjustments

Some of the commits here adjust CPUID profile policies, because the previous policies turned out to be problematic when testing the CPU profiles after adding MSR adjustments.

Making the CPU profile generation tool future proof

The whole point of the CPU profile generation tool is to automate the process of creating CPU profiles for new CPUs. Since new CPU generations (or KVM versions) may introduce new architectural MSRs (possibly even MSR-based features) that we are not yet aware of, we make the CPU profile tool emit warnings whenever it encounters MSRs that it is not aware of.

In this way we get notified when it might be time to update the MSR definitions used by the CPU profile tool. In order to achieve this we have introduced somewhat long lists of MSRs that we know about at this point in time. They have no purpose other than helping us keep the CPU profile generation tool up to date!

Note that bits within already existing MSRs might of course go from being reserved to defined in the future and we do not know of a good way to automatically detect that (especially since the reserved value is not necessarily 0 in every case).
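
The warning mechanism described above boils down to a set lookup. The following is a minimal sketch under the assumption that the known MSRs are kept in two sets; `Classification`, `classify` and `unknown_msr_warnings` are hypothetical names, and the indices used with them are placeholders.

```rust
use std::collections::BTreeSet;

/// How a (hypothetical) generation tool classifies an MSR index
/// reported by the hypervisor.
#[derive(Debug, PartialEq, Eq)]
enum Classification {
    KnownArchitectural,
    KnownNonArchitectural,
    Unknown,
}

fn classify(
    msr: u32,
    architectural: &BTreeSet<u32>,
    non_architectural: &BTreeSet<u32>,
) -> Classification {
    if architectural.contains(&msr) {
        Classification::KnownArchitectural
    } else if non_architectural.contains(&msr) {
        Classification::KnownNonArchitectural
    } else {
        Classification::Unknown
    }
}

/// Return one warning line per reported MSR that appears in neither list.
fn unknown_msr_warnings(
    reported: &[u32],
    architectural: &BTreeSet<u32>,
    non_architectural: &BTreeSet<u32>,
) -> Vec<String> {
    reported
        .iter()
        .filter(|&&msr| classify(msr, architectural, non_architectural) == Classification::Unknown)
        .map(|msr| format!("unknown MSR {msr:#x}: MSR definitions may need an update"))
        .collect()
}
```

An empty result means that every reported MSR is covered by the lists, so the definitions are presumably still up to date for that CPU and KVM version.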

Helping out with testing this

Regardless of your experience with MSRs, you can still help out by starting a VM using any of the custom CPU profiles defined here and running the msr program. If any MSR covered by arch::x86_64::msr_definitions::intel::FORBIDDEN_IA32_MSR_RANGES or arch::x86_64::msr_definitions::NON_ARCHITECTURAL_INTEL_MSRS turns out to be accessible, then that means that there are some MSRs
that are not reported by KVM_GET_MSR_INDEX_LIST.

Please document all such findings and ideally also check if this is also the case with QEMU CPU models.

Follow up work

  1. It would be nice to have a few more compile-time consistency assertions: the MSRs we explicitly forbid, and whose availability one should check via CPUID and/or IA32_ARCH_CAPABILITIES, must have consistent policies.
  2. NixOS integration test that checks the MSR-based features we explicitly set.
  3. NixOS integration test that checks that certain MSRs are indeed unreachable.
  4. Take CHV's default changes to CPUID into account when computing the required MSR updates (only relevant for leaves relating to KVM and Hyper-V at this point in time).
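
For item 1, a compile-time assertion could look roughly like the sketch below. Everything in it is hypothetical: the `Policy` and `Definition` types, the two tables and the idea of linking a CPUID bit and an MSR through a shared `key` are assumptions made for illustration. It relies on plain `const fn` loops, since iterators and derived `PartialEq` are not usable in const contexts.

```rust
/// Hypothetical policy type shared by CPUID and MSR definitions.
#[derive(Clone, Copy, Debug)]
enum Policy {
    Inherit,
    Static(u64),
}

/// Hypothetical definition entry; `key` links a CPUID bit with the MSR it gates.
struct Definition {
    key: u32,
    policy: Policy,
}

const CPUID_DEFINITIONS: [Definition; 2] = [
    Definition { key: 1, policy: Policy::Static(0) },
    Definition { key: 2, policy: Policy::Inherit },
];

const MSR_DEFINITIONS: [Definition; 2] = [
    Definition { key: 1, policy: Policy::Static(0) },
    Definition { key: 2, policy: Policy::Static(0) },
];

/// Linear lookup that can be evaluated at compile time.
const fn lookup(defs: &[Definition], key: u32) -> Option<Policy> {
    let mut i = 0;
    while i < defs.len() {
        if defs[i].key == key {
            return Some(defs[i].policy);
        }
        i += 1;
    }
    None
}

/// Two lookups are consistent if both exist and carry the same policy.
const fn consistent(a: Option<Policy>, b: Option<Policy>) -> bool {
    match (a, b) {
        (Some(Policy::Inherit), Some(Policy::Inherit)) => true,
        (Some(Policy::Static(x)), Some(Policy::Static(y))) => x == y,
        _ => false,
    }
}

// Fails the build if the policies for key 1 ever diverge.
const _: () = assert!(
    consistent(lookup(&CPUID_DEFINITIONS, 1), lookup(&MSR_DEFINITIONS, 1)),
    "CPUID and MSR policies for key 1 must agree"
);
```

Adding the same assertion for key 2 would fail the build with the tables above, which is the kind of drift such checks are meant to catch.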

@olivereanderson olivereanderson changed the title Cpu profiles msr v51 MSR aware CPU profiles Mar 6, 2026
@olivereanderson olivereanderson self-assigned this Mar 6, 2026
@olivereanderson
Author

I will fix the failing pipelines this morning, but you can probably still review this now.

@olivereanderson olivereanderson force-pushed the cpu-profiles-msr-v51 branch 2 times, most recently from 8c2e865 to 1a15adf Compare March 9, 2026 08:50
Member

@phip1611 phip1611 left a comment

Awesome PR! I only have smaller remarks. Generally, this work is awesome (given my little domain knowledge about CPU profiles).

I didn't review some selected changes in depth (especially the logging), as they are quite mechanical and seem to fulfill their job.

Looking forward to getting this merged!

///
#[cfg(target_arch = "x86_64")]
fn get_msr_index_list(&self) -> Result<Vec<u32>>;

Member

nit: this newline could be moved into a prior commit I think

Perhaps git rebase -i HEAD~23 --exec "cargo +nightly fmt --all" does the trick already

Author

I think I will just leave this for now, but take proper care with such things when we start upstreaming.

Member

@phip1611 phip1611 Mar 11, 2026

I'd kindly ask you to give it a try:

  • the suggested command/workflow is very easy
  • If the next person bumps our patchset to v52, we'll run cargo check && cargo +nightly fmt --all && cargo nextest run && cargo clippy for every single commit anyway. Would be cool if that person then doesn't have to clean up such stuff on top of potential rebase conflicts :)

Author

> nit: this newline could be moved into a prior commit I think
>
> Perhaps git rebase -i HEAD~23 --exec "cargo +nightly fmt --all" does the trick already

That didn't work, but I should have fixed it now regardless.

Member

> That didn't work, but I should have fixed it now regardless.

Ah, sorry for the confusion. The command doesn't take care of everything automatically. It will stop at each commit where the checks fail, and then you need to:

  • resolve the conflict
  • run git commit -a --amend
  • run git rebase --continue

Apologies - I assumed this workflow was already familiar

Thanks for looking into this!

Author

@olivereanderson olivereanderson Mar 11, 2026

IIRC the command did nothing at all in my case. Don't recall the exact details now though.

.flush()
.context("CPU profile generation failed: Unable to flush cpuid profile data license file")
.with_context(|| {
format!("CPU profile generation failed: Unable to write to {data_type} profile data license file")
Member

nit: couldn't you also use anyhow!() here?

Author

I don't think so, because I want to keep the original error from flush in the chain of errors.

Member

@phip1611 phip1611 Mar 11, 2026

I meant anyhow!() instead of format!(). No need to change something here

Author

anyhow! evaluates to anyhow::Error which implements Display so that would work, but I don't see what value that brings over using format! here.

Maybe I am not understanding what you have in mind?

@olivereanderson olivereanderson force-pushed the cpu-profiles-msr-v51 branch 4 times, most recently from 546fb2f to 86bbb57 Compare March 9, 2026 11:18
@olivereanderson olivereanderson force-pushed the cpu-profiles-msr-v51 branch 3 times, most recently from 819513e to ee9d05f Compare March 10, 2026 12:04
@Coffeeri Coffeeri left a comment

Kudos to you for going so deep into MSRs. Fantastic work!
However, profile_msr_based_features are applied during vCPU setup, but they are not merged into the persistent MSR list used to build CpuState.msrs for snapshot/save. So, if I am not mistaken, profile-adjusted MSRs can be lost across snapshot/restore or migration?

Member

@phip1611 phip1611 left a comment

Thanks for addressing my feedback! LGTM now! Only open remark I have is

#103 (comment)

@olivereanderson olivereanderson force-pushed the cpu-profiles-msr-v51 branch 3 times, most recently from 18d7c8f to fd74845 Compare March 12, 2026 15:20
@tpressure

Unfortunately, we somehow broke nested-vmx:

[    5.053548] ------------[ cut here ]------------
[    5.053767] VMXON faulted, MSR_IA32_FEAT_CTL (0x3a) = 0x5
[    5.054005] WARNING: CPU: 0 PID: 22 at arch/x86/kvm/vmx/vmx.c:2846 vmx_enable_virtualization_cpu+0x135/0x150 [kvm_intel]
[    5.054434] Modules linked in: kvm_intel(+) evdev isofs cdrom kvm irqbypass button squashfs drm sch_fq_codel fuse loop backlight i2c_core configfs nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmw_vmci ip_tables ext4 crc16 mbcache jbd2 dm_verity dm_bufio dm_mod sha256_ssse3 sha1_ssse3 aesni_intel gf128mul virtio_net libaes crypto_simd net_failover virtio_blk cryptd failover btrfs blake2b_generic xor lzo_compress zstd_compress raid6_pq libcrc32c crc32c_generic crc32c_intel qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod scsi_common br_netfilter bridge stp llc overlay dmi_sysfs qemu_fw_cfg
[    5.056673] CPU: 0 UID: 0 PID: 22 Comm: cpuhp/0 Not tainted 6.12.74-flatcar #1
[    5.056958] Hardware name: Cloud Hypervisor cloud-hypervisor, BIOS 0
[    5.057210] RIP: 0010:vmx_enable_virtualization_cpu+0x135/0x150 [kvm_intel]
[    5.057484] Code: 04 bf 3a 00 00 00 e8 7a 14 35 c3 90 8b 54 24 04 48 c7 c7 68 f3 db c0 85 d2 ba ef be ad de 48 89 d6 48 0f 44 f0 e8 1b 45 37 c3 <0f> 0b eb b2 48 8b 15 a0 28 b0 c5 e9 0f ff ff ff e8 e6 02 d8 c3 66
[    5.058191] RSP: 0018:ff4170db400c7de8 EFLAGS: 00010282
[    5.058397] RAX: 0000000000000000 RBX: 0000000000000206 RCX: 0000000000000000
[    5.058676] RDX: ff258f0c2b22b040 RSI: ff258f0c2b21da40 RDI: ff258f0c2b21da40
[    5.058956] RBP: 0000000000000000 R08: 0000000000000000 R09: ff4170db400c7c78
[    5.059234] R10: ffffffff86918a08 R11: 0000000000000003 R12: 0000000000000006
[    5.059511] R13: ffffffffc0bf2fa0 R14: 0000000000000000 R15: ff258f0c2b21d548
[    5.059788] FS:  0000000000000000(0000) GS:ff258f0c2b200000(0000) knlGS:0000000000000000
[    5.060105] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.060332] CR2: 00007f3fba8c7bb8 CR3: 00000000086fe003 CR4: 0000000000773ef0
[    5.060623] PKRU: 55555554
[    5.060737] Call Trace:
[    5.060841]  <TASK>
[    5.060936]  kvm_arch_enable_virtualization_cpu+0xb1/0x270 [kvm]
[    5.061222]  ? __pfx_kvm_online_cpu+0x10/0x10 [kvm]
[    5.061435]  kvm_online_cpu+0x1f/0x40 [kvm]
[    5.061622]  cpuhp_invoke_callback+0x11f/0x420
[    5.061804]  ? __pfx_smpboot_thread_fn+0x10/0x10
[    5.061993]  cpuhp_thread_fun+0xa2/0x170
[    5.062160]  smpboot_thread_fn+0xda/0x1d0
[    5.062322]  kthread+0xcf/0x100
[    5.062454]  ? __pfx_kthread+0x10/0x10
[    5.062606]  ret_from_fork+0x31/0x50
[    5.062754]  ? __pfx_kthread+0x10/0x10
[    5.062913]  ret_from_fork_asm+0x1a/0x30
[    5.063074]  </TASK>
[    5.063168] ---[ end trace 0000000000000000 ]---

@olivereanderson olivereanderson force-pushed the cpu-profiles-msr-v51 branch 2 times, most recently from e89e12d to 1fb9a45 Compare March 13, 2026 14:09
Machine Check Architecture (MCA)

By setting the profile policy to
Static(0) for the MCA bit we indicate to guests that the MCG_CAP MSR
and other machine check related MSRs are not available.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We change the CPUID policy for WAITPKG because we encountered problems
with it when testing CPU profiles with MSRs.

This is also off by default for CPU models in QEMU, but we may still
potentially want to revisit this decision in the future.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
MD_CLEAR (bit 10), STIBP (bit 27) and L1D_FLUSH (bit 28) all advertise
certain processor features. The passthrough policy is usually not
appropriate in that case, because one no longer has a guarantee that
if the CPU profile can be applied on both the source and destination
then the live migration CPUID compatibility checks must succeed.

We fix this issue by instead utilizing the Inherit policy for these
bits.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
Protection keys are not supported for CPU profiles and are thus disabled.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
In preparation for making CPU profiles MSR-aware, we make room for
two pieces of data associated with a CPU profile: CPUID and MSR
adjustments. We thus rename the pre-existing CpuProfileData struct
to CpuIdProfileData and adapt the CPU profile generation tool
accordingly.

We also make the CPU profile generation tool write directly to file
and automatically introduce the required license file as well. This
makes the profile generation process more convenient.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
Introduce convenience methods for looking up CPUID value definitions.

We will later use these methods to assert certain policies at compile
time in order to stay consistent with MSR policies we introduce.

Due to the current limitations of const generics we unfortunately
need to duplicate a little bit of code.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
In order to generate CPU profiles we also need definitions and policies
for MSR-based features, as some CPU features are exposed through MSRs
rather than CPUID.

This commit introduces the MSR analogues of the data structures we
previously introduced for CPUID definitions.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We introduce MSR-based feature definitions for Intel CPUs that will be
utilized by the upcoming CPU profile generation tool.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We apply changes suggested in the PR review to the
IA32_ARCH_CAPABILITIES MSR policies.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
This is to be consistent with recent changes to CPUID policies

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
While KVM already has compatibility checks for most MSR-based features
which run when these MSRs are set by userspace, we do not get very
much useful information about exactly what the problem is when any
of these checks fail.

Hence to be on the safe side and also to ensure good UX for users
running into errors when trying to apply a CPU profile we introduce our
own compatibility checks for Intel CPUs that log at the error and debug
levels. The error logs aim to provide the minimal amount of information
required to investigate the problem further, while the debug logs
provide (much) more convenience when debugging.

We will incorporate these checks in the context of CPU profiles in
a later commit.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
Due to the changes to IA32_ARCH_CAPABILITIES applied after the last
code review we introduce stricter checks.

Signed-off-by: Oliver Anderson
On-behalf-of: SAP oliver.anderson@sap.com
This list will be used to help us detect unknown MSRs when generating
CPU profiles. It serves no other purpose beyond that.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
TODO: Squash into previous commit if this all works as expected

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We include a list of non-architectural MSRs. This list will only be
used to help the CPU profile generation tool rule out MSRs that it
does not know how to handle.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We include a list of MSRs defined by KVM that may be approved by
CPU profiles and another list of those that may not be approved by
CPU profiles. These lists will later be used by the CPU profile
generation tool.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
The list of Hyper-V MSRs introduced here will be utilized during CPU
profile generation and also at runtime to filter them out whenever
`kvm_hyperv` is set to `false`.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We introduce functionality related to computing necessary MSR updates
in accordance with the given CPU profile.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We introduce functionality to filter out MSRs that we want to deny
guests access to.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We record the necessary MSR-based feature modifications that need to be
set in the `CpuManager` and make sure to set these MSR values upon
vCPU configuration. We also use the Vm to filter access to MSRs that
are incompatible with the chosen CPU profile.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We adapt the CPU profile generation tool to also take the MSR-based
features into account.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We regenerate the CPU profiles and include the MSR-related data.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com