Migration: Aggregate Dirty Log in Thread, Reduce Migration Downtime#82

Draft
phip1611 wants to merge 4 commits intocyberus-technology:gardenlinuxfrom
phip1611:poc-dirty-rate-thread

Conversation

@phip1611 phip1611 commented Feb 12, 2026

This adds the basic plumbing for the DirtyLogWorker, which will fetch the
dirty log asynchronously in the background and aggregate the effective
MemoryRangeTable with dirtied memory.

Context

This wasn't planned. I did it as a side project over the past few weeks and have now finalized the work. I see this as crucial for production-grade live migration.

Motivation

  • Performance: Fetching the dirty log, parsing the dirty bitmap, and
    aggregating the corresponding data structures is fairly costly. I just
    ran a VM with an active working set of 5 GiB (with 4 workers), and the
    measured overhead per iteration was 10-20 ms. Since we want downtimes
    to be as small as possible, this overhead should be close to zero for
    the final iteration.
  • Accurate dirty rate: This way, we get a more fine-grained sampling of
    the dirty rate (dirtied 4 KiB pages per second), which is an interesting
    metric for observing the current workload (regarding memory writes).

Design

I actively decided against Arc<Mutex<Vm>> in the DirtyLogWorker, as
this would be very invasive, make the migration code overly complicated
(many locks and unlocks at exactly the right times), and lastly, be a very
big change just to call vm.dirty_log() in the thread. Note that the latter
is a thin wrapper around calling cpu_manager.dirty_log() and
memory_manager.dirty_log().
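To make the ownership question concrete, here is a minimal sketch of the lock-free alternative: the worker thread owns its dirty-log source outright and is controlled through a small handle, so the migration path never shares a lock on the VM. All names here (WorkerHandle, spawn, the (gpa, len) range tuples) are illustrative placeholders, not the actual types from this PR.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical handle to the background worker; the real DirtyLogWorker
/// API in this PR may differ.
struct WorkerHandle {
    stop: Arc<AtomicBool>,
    join: thread::JoinHandle<Vec<(u64, u64)>>, // aggregated (gpa, len) ranges
}

impl WorkerHandle {
    /// Signals the worker to stop, then joins it and takes the result.
    fn stop(self) -> Vec<(u64, u64)> {
        self.stop.store(true, Ordering::Relaxed);
        self.join.join().expect("worker thread panicked")
    }
}

/// The closure stands in for vm.dirty_log(); the worker owns it, so no
/// lock on the VM is shared with the migration code.
fn spawn(mut fetch_dirty_log: impl FnMut() -> Vec<(u64, u64)> + Send + 'static) -> WorkerHandle {
    let stop = Arc::new(AtomicBool::new(false));
    let stop_flag = Arc::clone(&stop);
    let join = thread::spawn(move || {
        let mut aggregated = Vec::new();
        loop {
            // Fetch at least once per loop so the final result is never stale.
            aggregated.extend(fetch_dirty_log());
            if stop_flag.load(Ordering::Relaxed) {
                break;
            }
            thread::sleep(Duration::from_millis(10));
        }
        aggregated
    });
    WorkerHandle { stop, join }
}

fn main() {
    let handle = spawn(|| vec![(0x1000, 4096)]);
    thread::sleep(Duration::from_millis(30));
    let ranges = handle.stop();
    assert!(!ranges.is_empty());
    println!("aggregated {} ranges", ranges.len());
}
```

The handle-plus-stop-flag shape keeps the migration code free of per-access locking: it only touches the worker twice, once to spawn it and once to stop it and collect the aggregated ranges.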

Bigger Picture / Outlook on KVM's Dirty Ring Interface

The most robust and performant mechanism that Cloud Hypervisor should use
to get dirtied pages in the future is KVM's dirty ring interface [0].
This requires [1] to be merged first in rust-vmm/kvm. Experience has shown
that bumping any of the rust-vmm crates is a major challenge, as all of
them are highly interdependent and developed in individual repositories.
So it will take some time before we can even consider starting work on
that feature in CHV.

Steps to Undraft

  • Investigate and measure overhead costs
  • For example, we currently allocate a vector covering the whole VM memory on each invocation! For 12 TiB (in 4 KiB pages), that is a huge allocation every time.
  • Perhaps a mechanism that avoids allocating over and over again would be the better approach?
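One possible shape of such a mechanism, sketched as an illustration only (DirtyBitmapBuf and fetch_into are made-up names, not code from this PR): keep one bitmap buffer alive across iterations and clear it in place, so the allocation happens exactly once for the lifetime of the worker.

```rust
/// Illustrative reusable dirty-bitmap buffer; one bit per 4 KiB guest page.
struct DirtyBitmapBuf {
    words: Vec<u64>,
}

impl DirtyBitmapBuf {
    /// Allocates once, sized for the whole guest. For 12 TiB of guest
    /// memory this is a ~384 MiB bitmap, so reallocating it on every
    /// precopy iteration would be wasteful.
    fn new(guest_pages: usize) -> Self {
        Self { words: vec![0; (guest_pages + 63) / 64] }
    }

    /// Zeroes the buffer in place and lets the caller fill it. Capacity
    /// is retained, so no allocation happens after construction.
    /// `fill` stands in for the actual dirty-log fetch plumbing.
    fn fetch_into(&mut self, fill: impl FnOnce(&mut [u64])) -> &[u64] {
        self.words.fill(0);
        fill(&mut self.words);
        &self.words
    }
}

fn main() {
    // 130 guest pages -> 3 bitmap words.
    let mut buf = DirtyBitmapBuf::new(130);
    let words = buf.fetch_into(|w| w[0] = 0b101); // pages 0 and 2 dirty
    let dirty: u32 = words.iter().map(|w| w.count_ones()).sum();
    assert_eq!(dirty, 2);
    println!("dirty pages: {dirty}");
}
```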

Hints for Reviewers / Testing

Closes https://github.com/cobaltcore-dev/cobaltcore/issues/280

@phip1611 phip1611 force-pushed the poc-dirty-rate-thread branch from f82765c to 4e9ab64 Compare February 12, 2026 21:04
@phip1611 phip1611 force-pushed the poc-dirty-rate-thread branch from 4e9ab64 to f403533 Compare March 11, 2026 11:50
@phip1611 phip1611 changed the base branch from gardenlinux-v50 to gardenlinux March 11, 2026 11:51
@phip1611 phip1611 changed the title XXX migration: fetch dirty_log in thread Migration: Aggregate Dirty Log in Thread, Reduce Migration Downtime Mar 11, 2026
@phip1611 phip1611 marked this pull request as ready for review March 11, 2026 11:52
@phip1611 phip1611 self-assigned this Mar 11, 2026
@phip1611 phip1611 force-pushed the poc-dirty-rate-thread branch from f403533 to 47a44c8 Compare March 11, 2026 13:06
@phip1611 phip1611 requested a review from blitz March 11, 2026 13:07
@phip1611 phip1611 force-pushed the poc-dirty-rate-thread branch from 47a44c8 to 903e297 Compare March 11, 2026 13:13
@phip1611 phip1611 requested review from Coffeeri and scholzp March 11, 2026 13:13
To aggregate the dirty log in a thread asynchronously, we need to be
able to properly merge MemoryRangeTables into each other to prevent
transmitting the same memory multiple times.

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
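For illustration, a self-contained sketch of what such a merge has to do, using plain (start, length) tuples in place of the real MemoryRangeTable type from Cloud Hypervisor's vm-migration crate: combine the ranges of both tables, sort them, and coalesce overlapping or adjacent entries so no byte is transmitted twice.

```rust
/// Merges two lists of (start, len) byte ranges into one sorted list with
/// overlapping and adjacent ranges coalesced. Illustrative stand-in for
/// merging MemoryRangeTables.
fn merge_ranges(a: &[(u64, u64)], b: &[(u64, u64)]) -> Vec<(u64, u64)> {
    let mut all: Vec<(u64, u64)> = a.iter().chain(b.iter()).copied().collect();
    all.sort_unstable_by_key(|&(start, _)| start);
    let mut merged: Vec<(u64, u64)> = Vec::with_capacity(all.len());
    for (start, len) in all {
        match merged.last_mut() {
            // Overlapping or directly adjacent: extend the previous range.
            Some((prev_start, prev_len)) if start <= *prev_start + *prev_len => {
                *prev_len = (*prev_len).max(start + len - *prev_start);
            }
            _ => merged.push((start, len)),
        }
    }
    merged
}

fn main() {
    let old = [(0x0, 0x1000), (0x2000, 0x1000)];
    let new = [(0x1000, 0x1000)]; // bridges the gap between the two
    let merged = merge_ranges(&old, &new);
    assert_eq!(merged, vec![(0x0, 0x3000)]);
    println!("{merged:?}");
}
```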
@phip1611 phip1611 force-pushed the poc-dirty-rate-thread branch 2 times, most recently from 0df55d4 to 4c033b4 Compare March 11, 2026 13:25
Comment on lines +43 to +56
/// All shared state of [`DirtyLogWorker`] that is behind the same lock.
struct DirtyLogWorkerProtectedState {
/// The dirty rates measured in the past [`DIRTY_RATE_CALC_TIMESLICE`].
///
/// Used to calculate the dirty rate.
dirty_rates_pps: VecDeque<u64>,
/// The constantly updated (and merged) memory range table since the data
/// was moved out of the struct the last time.
table: MemoryRangeTable,
/// The timestamp of the last processing, used to calculate the dirty rate.
last_timestamp: Instant,
/// Set to true to signal the worker thread to stop and exit.
stop: bool,
}

What does "Protected" mean here?

Does being behind the same lock imply that there are no locks within the struct? If yes, I'd prefer to document that.

Member Author

Fair question! I'm looking for a new name

Member Author

better?

Yes thanks!

I'm still unsure about the comment regarding being behind the same lock.

Comment on lines +183 to +186
/// Starts the thread and let it run until [`DirtyLogWorkerHandle::stop`] is called.
pub fn run(self) -> Result<(), MigratableError /* dirty log error */> {
info!("starting thread");

Nit: this doesn't start the thread, it's the method that's called by the thread.

Member Author

good catch!

The doc comment still refers to starting the thread.

Suggested change
/// Starts the thread and let it run until [`DirtyLogWorkerHandle::stop`] is called.
pub fn run(self) -> Result<(), MigratableError /* dirty log error */> {
info!("starting thread");
/// Fetches the dirty log and updates the internal metrics.
///
/// Thread entry function, executed until [`DirtyLogWorkerHandle::stop`] is called.
pub fn run(self) -> Result<(), MigratableError /* dirty log error */> {
info!("thread started");

@phip1611 phip1611 force-pushed the poc-dirty-rate-thread branch from 4c033b4 to 7b7bcda Compare March 11, 2026 15:12
This adds the basic plumbing for the DirtyLogWorker, which will fetch the
dirty log asynchronously in the background and aggregate the effective
MemoryRangeTable with dirtied memory.

# Motivation

- Performance: Fetching the dirty log, parsing the dirty bitmap, and
  aggregating the corresponding data structures is fairly costly. I just
  ran a VM with an active working set of 5 GiB (with 4 workers), and the
  measured overhead per iteration was 10-20 ms. Since we want downtimes
  to be as small as possible, this overhead should be close to zero for
  the final iteration.
- Accurate dirty rate: This way, we get a more fine-grained sampling of
  the dirty rate (dirtied 4 KiB pages per second), which is an interesting
  metric for observing the current workload (regarding memory writes).

# Bigger Picture / Outlook on KVM's Dirty Ring Interface

The most robust and performant mechanism that Cloud Hypervisor should use
to get dirtied pages in the future is KVM's dirty ring interface [0].
This requires [1] to be merged first in rust-vmm/kvm. Experience has shown
that bumping any of the rust-vmm crates is a major challenge, as all of
them are highly interdependent and developed in individual repositories.
So it will take some time before we can even consider starting work on
that feature in CHV.

That being said: This design improves the current situation
significantly without blocking any future refactorings or replacements
with KVM's dirty ring interface.

# Design

I actively decided against Arc<Mutex<Vm>> in the DirtyLogWorker, as
this would be very invasive, make the migration code overly complicated
(many locks and unlocks at exactly the right times), and lastly, be a very
big change just to call `vm.dirty_log()` in the thread. Note that the latter
is a thin wrapper around calling `cpu_manager.dirty_log()` and
`memory_manager.dirty_log()`.

[0] https://lwn.net/Articles/833206/
[1] rust-vmm/kvm#344

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
Now, the overhead per precopy iteration drops to 0 ms. However, there is
a small overhead from joining the thread, which takes <=1 ms in my setup.

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
}

/// Starts the thread and let it run until [`DirtyLogWorkerHandle::stop`] is called.
pub fn run(self) -> Result<(), MigratableError /* dirty log error */> {

As far as I can tell, this is (and should be) only called by the DirtyLogWorker::spawn method.

Suggested change
pub fn run(self) -> Result<(), MigratableError /* dirty log error */> {
fn run(self) -> Result<(), MigratableError /* dirty log error */> {
