4 changes: 4 additions & 0 deletions Guide/src/SUMMARY.md
@@ -131,6 +131,10 @@
- [IGVM](./reference/architecture/openhcl/igvm.md)
- [Device Architecture](./reference/architecture/devices.md)
- [Storage Pipeline](./reference/architecture/devices/storage.md)
- [Core Concepts]()
- [Virtualized Processors](./reference/architecture/concepts/procs.md)
- [VMBus Architecture]()
- [Channels](./reference/architecture/vmbus/channels.md)

---

53 changes: 53 additions & 0 deletions Guide/src/reference/architecture/concepts/procs.md
@@ -0,0 +1,53 @@
# Processors in the VMM
This page describes how virtual and physical processor identifiers are mapped.

## VP index, CPU number, and APIC ID

Much code in the OpenVMM repo relies on a numeric identifier for a virtual
processor (VP): the VP index, the hypervisor-level identifier assigned to each
virtual processor, starting at 0. Several related identifiers are often
confused:

| Identifier | What it is | Numbering |
|-----------|-----------|-----------|
| **VP index** | Hypervisor-assigned processor number | 0, 1, 2, ... contiguous |
| **Linux CPU number** | The kernel's `cpu` in OpenHCL | Currently equals VP index (see below) |
| **APIC ID** (x86) | Hardware interrupt target | May differ — depends on topology |
| **MPIDR** (aarch64) | ARM processor affinity register | Not the VP index — topology-dependent |

Each platform has its own architectural way of describing CPUs: APIC IDs on
x86 and MPIDR values on AArch64. These values cannot be assumed to map
directly to the VP index, because the physical or virtual topology of the
system determines them.

These identifiers can also differ from the **VTL0 guest's** perspective: the
guest may have its own CPU numbering, which may or may not match the VP index.
Guests must translate their own CPU numbers into hypervisor VP indexes before
passing them to the VMM. For example, the VMBus protocol allows guest drivers
to specify a VP index for a channel.

```text
VTL0 guest sees: Host / VTL2 sees:
┌──────────────┐ ┌──────────────┐
│ CPU 0 ───────┼────────► │ VP index 0 │
│ CPU 1 ───────┼────────► │ VP index 1 │
│ CPU 2 ───────┼────────► │ VP index 2 │
│ ... │ │ ... │
└──────────────┘ └──────────────┘
Guest CPU N maps to VP index N = Linux
VP index N (typical) CPU N (OpenHCL today)
```

In OpenHCL today, the VMM assumes that its view of the VP index matches the
CPU number in the OpenHCL Linux kernel. This is a simplifying assumption, not
an architectural guarantee: it holds because OpenHCL's boot shim validates
that device-tree CPU ordering matches VP index ordering and controls the CPU
online sequence to preserve that mapping. A general-purpose guest provides no
such guarantee.
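
The invariant the boot shim maintains can be sketched as a simple check. This
is a hypothetical illustration only, assuming a flat list of VP indexes read
from the device tree in CPU order; the real boot shim's types and logic differ.

```rust
/// Hypothetical sketch: verify that device-tree CPU entries, in order,
/// correspond 1:1 to contiguous VP indexes starting at 0. The real boot
/// shim performs this validation with its own types; this only
/// illustrates the invariant being checked.
fn cpu_order_matches_vp_index(device_tree_vp_indexes: &[u32]) -> bool {
    device_tree_vp_indexes
        .iter()
        .enumerate()
        .all(|(cpu_number, &vp_index)| cpu_number as u32 == vp_index)
}
```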

The APIC ID is a separate concept. On x86, the APIC ID may not match the VP
index, especially with complex topologies (multiple sockets, SMT). The
hypervisor provides a [`GetVpIndexFromApicId`
hypercall](https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/hypercalls/hvcallgetvpindexfromapicid)
for translation. On aarch64, the device tree `reg` property for each CPU is the
MPIDR, which is also not the VP index.
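
Because the APIC ID need not equal the VP index, a VMM typically keeps a
topology-derived lookup table. The sketch below is an assumption-laden
illustration (the function name and table shape are invented), showing the kind
of mapping the `GetVpIndexFromApicId` hypercall resolves.

```rust
use std::collections::HashMap;

/// Hypothetical topology table mapping APIC IDs to VP indexes. A real VMM
/// would derive this from the virtual topology it presented to the guest,
/// or resolve it via the hypervisor's GetVpIndexFromApicId hypercall.
fn build_apic_to_vp_map(apic_ids_in_vp_order: &[u32]) -> HashMap<u32, u32> {
    apic_ids_in_vp_order
        .iter()
        .enumerate()
        .map(|(vp_index, &apic_id)| (apic_id, vp_index as u32))
        .collect()
}
```

For example, a topology that spaces APIC IDs by core (say, `[0, 2, 4, 6]`)
still yields contiguous VP indexes `0..=3`.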
147 changes: 147 additions & 0 deletions Guide/src/reference/architecture/vmbus/channels.md
@@ -0,0 +1,147 @@
# VMBus Channels

VMBus is the synthetic bus that connects guest drivers to host-side device
backends. Every VMBus device communicates through one or more **channels** —
bidirectional ring-buffer pairs backed by guest memory.

## What is a channel?

A VMBus channel is:

- A **ring buffer pair** — one incoming (guest → host), one outgoing (host →
guest) — backed by a single guest-allocated GPADL (Guest Physical Address
Descriptor List) — a guest-provided description of guest-physical pages
shared with the host. "Incoming" and "outgoing" are always relative to the
local endpoint: each side's incoming ring is the other side's outgoing ring.
In OpenVMM (the host), the incoming ring carries data from the guest and
the outgoing ring carries data to the guest.
- An **interrupt/event signal** for each direction.
- A **target VP** — the guest vCPU targeted for channel notifications. In
OpenVMM's current implementation, this value also selects the host-side
executor used for processing that channel.

Each channel is identified by a unique `channel_id` assigned by the VMBus server
at offer time. The channel's lifecycle is: **offered → opened → closed** (or
**rescinded** by the host). If the host rescinds an offer, the channel is torn
down regardless of guest state.

```text
┌──────────────────────────────────────────────────┐
│ VMBus Channel │
│ │
│ ┌───────────────────┐ ┌───────────────────┐ │
│ │ Incoming Ring │ │ Outgoing Ring │ │
│ │ (guest → host) │ │ (host → guest) │ │
│ └─────────┬─────────┘ └─────────┬─────────┘ │
│ │ │ │
│ ┌─────────┴──────────────────────┴─────────┐ │
│ │ GPADL-backed memory (guest-allocated) │ │
│ └──────────────────────────────────────────┘ │
│ │
│ Signal: guest → host Signal: host → guest │
│ Target VP: set at open time │
└──────────────────────────────────────────────────┘
```
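
The **offered → opened → closed** (or **rescinded**) lifecycle above can be
sketched as a small state machine. This is a simplified illustration with
invented names; the real server-side state machine tracks more states (GPADL
setup, protocol negotiation, and so on).

```rust
/// Hypothetical sketch of the channel lifecycle described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ChannelState {
    Offered,
    Open,
    Closed,
    Rescinded,
}

fn next_state(state: ChannelState, event: &str) -> ChannelState {
    use ChannelState::*;
    match (state, event) {
        (Offered, "open") => Open,
        (Open, "close") => Closed,
        // A host rescind tears the channel down regardless of guest state.
        (_, "rescind") => Rescinded,
        // Any other event leaves the state unchanged.
        (s, _) => s,
    }
}
```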

## Subchannels

A **subchannel** is a full additional VMBus channel offer for the same device
instance. It is not a side-queue or a sub-object of the primary channel — it has
its own ring buffer GPADL, its own open/close lifecycle, its own channel ID, and
its own target VP.

The identity of a channel within a device is the tuple `(interface_id,
instance_id, subchannel_index)`:

| Field | Meaning |
|-------|---------|
| `interface_id` | Device type GUID (e.g., SCSI controller) |
| `instance_id` | Specific device instance |
| `subchannel_index` | `0` for the primary channel, `1..n` for subchannels |
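
The identity tuple can be modeled as a key type. The real type is `OfferKey`
in `vmbus_channel`; the struct below is a hypothetical sketch whose field
names and GUID representation are assumptions, not the crate's actual API.

```rust
/// Sketch of a channel identity key, modeled on the tuple above. The real
/// `OfferKey` in `vmbus_channel` may differ in field names and GUID types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ChannelKey {
    interface_id: u128,    // device type GUID, as a raw 128-bit value
    instance_id: u128,     // specific device instance GUID
    subchannel_index: u16, // 0 = primary, 1..n = subchannels
}

impl ChannelKey {
    fn is_primary(&self) -> bool {
        self.subchannel_index == 0
    }
}
```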

### Primary and subchannel relationship

- The **primary channel** (`subchannel_index == 0`) is always offered first and
handles protocol negotiation.
- **Subchannels** are offered only after the primary is open, when the device
explicitly enables them.
- A subchannel **cannot exist without its primary channel**. If the primary
channel closes, all subchannels are automatically revoked and closed.
- Subchannels are opened and closed independently; closing one subchannel does
not inherently require closing the primary or other subchannels.

```mermaid
stateDiagram-v2
[*] --> PrimaryOffered: VMBus server offers device
PrimaryOffered --> PrimaryOpen: Guest opens primary (subchannel_index=0)
PrimaryOpen --> SubchannelsOffered: Device backend requests N subchannels
SubchannelsOffered --> AllOpen: Guest opens subchannels 1..n
AllOpen --> PrimaryOpen: Guest closes subchannels
PrimaryOpen --> [*]: Guest closes primary → all subchannels revoked
```

### Why subchannels exist

Subchannels enable **I/O parallelism with CPU locality**. Each channel has its
own ring buffer and target VP, so:

- Multiple VPs can issue I/O concurrently without contending on a single ring
buffer.
- Each channel's host-side worker runs on the target VP's thread, keeping cache
lines warm and avoiding cross-VP interrupts.

Without subchannels, all I/O for a device funnels through one ring and one
worker, which can bottleneck on multi-VP VMs.
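
One simple way to realize this locality, sketched here as a hypothetical
policy (not OpenVMM's actual assignment logic), is to spread channel target
VPs round-robin across the guest's VPs so each ring gets its own VP:

```rust
/// Hypothetical sketch: assign a target VP to the primary channel plus its
/// subchannels round-robin across the guest's VPs, so each ring buffer is
/// serviced on a distinct VP where possible.
fn assign_target_vps(channel_count: u16, vp_count: u32) -> Vec<u32> {
    assert!(vp_count > 0, "a guest always has at least one VP");
    (0..u32::from(channel_count)).map(|i| i % vp_count).collect()
}
```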
## Target VP

When a guest opens a channel, it specifies a `target_vp` — the guest vCPU that
will receive channel interrupts and events. In OpenVMM's current implementation,
the VMBus server also uses this value to select the executor that runs the device
worker for that channel.

The guest can change the target VP at runtime via the `ModifyChannel` VMBus
message. This is used when VPs come online/offline (e.g., CPU hot-remove) and
the guest needs to rebalance channel assignments.
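
A device that keeps per-channel state might handle this retargeting as
follows. This is a minimal sketch with invented names; a real device would
also retarget its task driver (see `VmTaskDriver::retarget_vp`).

```rust
/// Hypothetical per-channel bookkeeping for ModifyChannel handling.
struct ChannelInfo {
    target_vp: u32,
}

/// Update the recorded target VP for `channel_idx`; returns false for an
/// unknown channel index (real code would log or fail the message).
fn handle_modify_channel(
    channels: &mut [ChannelInfo],
    channel_idx: usize,
    new_vp: u32,
) -> bool {
    match channels.get_mut(channel_idx) {
        Some(ch) => {
            ch.target_vp = new_vp;
            true
        }
        None => false,
    }
}
```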

For more on how the VMM and guest agree on a `target_vp`, see the
[Processors](../concepts/procs.md) page.

## Ring buffer model

Each ring is a fixed-size circular buffer. The size is determined at channel
open time and cannot change while the channel is open. Key properties:

- **No overflow** — if the ring is full, the sender must wait. The full ring
itself is the only backpressure mechanism; there is no explicit flow-control
protocol.
- **Batched reads** — the host reads packets in batches via
[`poll_read_batch()`](https://openvmm.dev/rustdoc/linux/vmbus_async/queue/struct.ReadHalf.html#method.poll_read_batch)
(interrupt-driven) or
[`try_read_batch()`](https://openvmm.dev/rustdoc/linux/vmbus_async/queue/struct.ReadHalf.html#method.try_read_batch)
(poll mode, no interrupt).
- **Paired** — rings always come in pairs (incoming + outgoing). A channel
without both rings is not usable.

Since ring buffers reside in guest-allocated memory, the host must treat all ring
contents as untrusted input.
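
For instance, the ring's read/write offsets live in guest-controlled memory,
so the host must range-check them before computing how much data is readable.
The sketch below is a hypothetical simplification; the real `vmbus_ring`
implementation is considerably more involved.

```rust
/// Hypothetical sketch: compute bytes available to read from a ring whose
/// `in_offset` (writer) and `out_offset` (reader) control fields come from
/// guest memory and must be validated, not trusted.
fn readable_bytes(ring_size: u32, in_offset: u32, out_offset: u32) -> Option<u32> {
    // Reject out-of-range offsets rather than trusting guest memory.
    // (This also rejects a zero-sized ring, avoiding a divide-by-zero below.)
    if in_offset >= ring_size || out_offset >= ring_size {
        return None;
    }
    // Wrap-aware distance from the reader to the writer.
    Some(in_offset.wrapping_sub(out_offset) % ring_size)
}
```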

For the ring buffer implementation, see the [`vmbus_ring`
rustdoc](https://openvmm.dev/rustdoc/linux/vmbus_ring/index.html).

## Key types

The following Rust types are the primary building blocks in OpenVMM's VMBus
implementation; device backends typically interact with `VmbusDevice`,
`ChannelControl`, and `Queue`.

| Type | Crate | Role |
|------|-------|------|
| `OfferKey` | `vmbus_channel` | Channel identity tuple |
| `OfferParams` | `vmbus_channel` | Full offer metadata |
| `OpenData` | `vmbus_channel` | Guest-provided open parameters (target VP, ring GPADL) |
| `ChannelControl` | `vmbus_channel` | Device-side handle to enable subchannels |
| `VmbusDevice` | `vmbus_channel` | Trait for VMBus device implementations |
| `RawAsyncChannel` | `vmbus_channel` | Async wrapper around a ring buffer pair |
| `IncomingRing` / `OutgoingRing` | `vmbus_ring` | Low-level ring buffer types |
| `Queue` | `vmbus_async` | High-level async packet read/write over a channel |
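
To show how these pieces fit together, here is a deliberately simplified,
hypothetical mirror of the `VmbusDevice` shape: the real trait in
`vmbus_channel` is async and takes richer resource types, so treat this only
as a sketch of the open/close contract.

```rust
/// Hypothetical, simplified stand-in for the `VmbusDevice` trait described
/// above; not the actual `vmbus_channel` API.
trait SimpleVmbusDevice {
    /// Mirrors the real trait's default: no subchannels unless overridden.
    fn max_subchannels(&self) -> u16 {
        0
    }
    fn open(&mut self, channel_idx: u16, target_vp: u32);
    fn close(&mut self, channel_idx: u16);
}

/// Toy device that only records which channels are open, and on which VP.
struct NullDevice {
    open_channels: Vec<(u16, u32)>,
}

impl SimpleVmbusDevice for NullDevice {
    fn open(&mut self, channel_idx: u16, target_vp: u32) {
        self.open_channels.push((channel_idx, target_vp));
    }
    fn close(&mut self, channel_idx: u16) {
        self.open_channels.retain(|&(idx, _)| idx != channel_idx);
    }
}
```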
32 changes: 27 additions & 5 deletions vm/devices/vmbus/vmbus_channel/src/channel.rs
@@ -52,7 +52,14 @@ pub trait VmbusDevice: Send + Any + InspectMut {
/// The offer parameters.
fn offer(&self) -> OfferParams;

/// The maximum number of subchannels supported by this device.
/// The maximum number of subchannels this device will accept.
///
/// This is the device's upper bound — the guest may request fewer (or
/// none). The VMBus framework uses this to allocate resources and to
/// reject [`ChannelControl::enable_subchannels`] calls that exceed
/// this limit.
///
/// Returns 0 by default (no subchannels — primary channel only).
fn max_subchannels(&self) -> u16 {
0
}
@@ -70,7 +77,14 @@ pub trait VmbusDevice: Send + Any + InspectMut {
/// Closes the channel number `channel_idx`.
async fn close(&mut self, channel_idx: u16);

/// Notifies the device that interrupts for channel will now target `target_vp`.
/// Notifies the device that the guest has retargeted interrupts for
/// `channel_idx` to `target_vp`.
///
/// This is called when the guest sends a `ModifyChannel` message to
/// change the VP that handles interrupts and ring processing for a
/// channel. Devices that create VP-targeted workers (e.g., StorVSP)
/// should forward this to their task driver via
/// [`VmTaskDriver::retarget_vp`](vmcore::vm_task::VmTaskDriver::retarget_vp).
async fn retarget_vp(&mut self, channel_idx: u16, target_vp: u32);

/// Start processing of all channels.
@@ -124,6 +138,11 @@ pub struct ChannelResources {
}

/// Control object for enabling subchannels.
///
/// Obtained from [`DeviceResources`] after the device is installed. The
/// device calls [`enable_subchannels`](Self::enable_subchannels) from its
/// protocol handler when the guest requests subchannels — for example,
/// StorVSP calls this when the guest sends `CREATE_SUB_CHANNELS`.
#[derive(Debug, Default, Clone)]
pub struct ChannelControl {
send: Option<mesh::Sender<u16>>,
@@ -138,10 +157,13 @@ pub struct TooManySubchannels;
impl ChannelControl {
/// Enables the first `count` subchannels.
///
/// If more than `count` subchannels are already enabled, this does nothing.
/// If `count` or more subchannels are already enabled, this does
/// nothing (the count only grows, never shrinks).
///
/// Fails if `count` is bigger than the requested maximum returned by
/// [`VmbusDevice::max_subchannels`].
/// Fails with [`TooManySubchannels`] if `count` exceeds the maximum
/// returned by [`VmbusDevice::max_subchannels`]. Callers should map
/// this error to an appropriate protocol response — for example,
/// StorVSP returns `INVALID_PARAMETER` to the guest.
pub fn enable_subchannels(&self, count: u16) -> Result<(), TooManySubchannels> {
if count > self.max {
return Err(TooManySubchannels);