Metrics for crucible disks by leftwo · Pull Request #1073 · oxidecomputer/propolis

leftwo · 2026-03-09T18:33:25Z

Crucible disk metrics were missing because set_metric_consumer was called on a DeviceAttachment before Crucible's queues were associated with it. The original implementation only walked the already-associated queues and handed the consumer to each one's QueueMinder. Any queue that arrived late simply never received it.

The fix stores the MetricConsumer in QueueColState (the collection-level state guarded by the collection's mutex), then in queue_associate propagates it to any newly-arriving queue's minder before the minder is installed into the slot. This mirrors exactly how the paused flag is already propagated to late-associating queues.
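A minimal sketch of this pattern follows. The types are simplified, hypothetical stand-ins that only borrow the names from this description (QueueColState, QueueMinder, queue_associate, set_metric_consumer). The real propolis definitions differ: here the slot array is reduced to a Vec, and MetricConsumer is an assumed trait with a single io_completed method. The point it illustrates is that set_metric_consumer records the consumer in collection-level state under the same lock that queue_associate takes, so a queue that associates later still picks it up.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical metrics sink (stand-in for the real MetricConsumer).
pub trait MetricConsumer: Send + Sync {
    fn io_completed(&self, bytes: u64);
}

/// Example consumer that just sums completed bytes.
#[derive(Default)]
pub struct CountingConsumer(AtomicU64);

impl CountingConsumer {
    pub fn total(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

impl MetricConsumer for CountingConsumer {
    fn io_completed(&self, bytes: u64) {
        self.0.fetch_add(bytes, Ordering::Relaxed);
    }
}

/// Simplified per-queue minder (hypothetical).
#[derive(Default)]
pub struct QueueMinder {
    paused: Mutex<bool>,
    consumer: Mutex<Option<Arc<dyn MetricConsumer>>>,
}

impl QueueMinder {
    // Takes &self, so callers only need a shared reference.
    pub fn set_metric_consumer(&self, c: Arc<dyn MetricConsumer>) {
        *self.consumer.lock().unwrap() = Some(c);
    }

    /// Report a completed IO. If no consumer was ever set, the
    /// report is silently dropped.
    pub fn complete_io(&self, bytes: u64) {
        if let Some(c) = self.consumer.lock().unwrap().as_ref() {
            c.io_completed(bytes);
        }
    }

    pub fn is_paused(&self) -> bool {
        *self.paused.lock().unwrap()
    }
}

/// Collection-level state, guarded by the collection's mutex.
#[derive(Default)]
pub struct QueueColState {
    paused: bool,
    metric_consumer: Option<Arc<dyn MetricConsumer>>,
    queues: Vec<Arc<QueueMinder>>,
}

#[derive(Default)]
pub struct Collection {
    state: Mutex<QueueColState>,
}

impl Collection {
    pub fn set_metric_consumer(&self, c: Arc<dyn MetricConsumer>) {
        let mut st = self.state.lock().unwrap();
        // Hand the consumer to queues that are already associated...
        for q in st.queues.iter() {
            q.set_metric_consumer(Arc::clone(&c));
        }
        // ...and keep it so late-arriving queues can receive it too.
        st.metric_consumer = Some(c);
    }

    pub fn queue_associate(&self, minder: Arc<QueueMinder>) {
        let mut st = self.state.lock().unwrap();
        // Propagate collection-level state before installing the minder,
        // the same way the paused flag is handled.
        *minder.paused.lock().unwrap() = st.paused;
        if let Some(c) = st.metric_consumer.as_ref() {
            minder.set_metric_consumer(Arc::clone(c));
        }
        st.queues.push(minder);
    }
}
```

Without the metric_consumer field and the check in queue_associate, a queue associated after set_metric_consumer would drop every complete_io report, which matches the missing-metrics symptom described above.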

Also included is a minor fix: as_mut() -> as_ref() on the minder Option, since set_metric_consumer only takes &self.
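A small, hypothetical illustration of why as_ref() is enough: when the setter takes &self (here via an internal Mutex), a shared borrow of the Option slot suffices, while as_mut() would demand an exclusive &mut borrow the call never uses. Minder, hand_out, and the Arc<String> consumer are made-up stand-ins, not propolis types.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical minder; its setter takes &self and mutates through a Mutex.
#[derive(Default)]
pub struct Minder {
    consumer: Mutex<Option<Arc<String>>>,
}

impl Minder {
    pub fn set_metric_consumer(&self, c: Arc<String>) {
        *self.consumer.lock().unwrap() = Some(c);
    }

    pub fn consumer_name(&self) -> Option<String> {
        self.consumer.lock().unwrap().as_ref().map(|c| c.as_str().to_owned())
    }
}

/// Only a shared reference to the slot is needed: as_ref() turns
/// &Option<Minder> into Option<&Minder>. Using as_mut() here would force
/// callers to hold &mut Option<Minder> for no benefit.
pub fn hand_out(slot: &Option<Minder>, c: &Arc<String>) -> bool {
    if let Some(m) = slot.as_ref() {
        m.set_metric_consumer(Arc::clone(c));
        true
    } else {
        false
    }
}
```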

jmcarp · 2026-03-11T16:31:42Z

I just ran into this myself. Looking forward to this landing so my dashboards will look nicer.

leftwo · 2026-03-11T20:43:00Z

Fix for: #1077

leftwo · 2026-03-11T21:06:09Z

Crucible disk metrics from Dublin running this branch:

leftwo · 2026-03-11T23:40:50Z

To answer a question @iximeow had: does the scrub count toward these metrics?

Here is a propolis-server running a scrub:

23:19:20.856Z INFO propolis-server (vm_state_driver): Scrub check for f04ccd2c-bbc9-4db0-9f3a-9c8bc25ec122
23:19:20.856Z INFO propolis-server (vm_state_driver): Scrub pause 120 seconds before starting
23:21:20.857Z INFO propolis-server (vm_state_driver): Scrub for f04ccd2c-bbc9-4db0-9f3a-9c8bc25ec122 begins
23:21:20.857Z INFO propolis-server (vm_state_driver): Scrub with total_size:85899345920 block_size:512
23:21:20.857Z INFO propolis-server (vm_state_driver): Scrubs from block 0 to 167772160 in (256) 131072 size IOs pm:25
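The numbers in the last two log lines can be cross-checked with a little arithmetic. The helper below is hypothetical (scrub_plan is not a propolis or Crucible function), and it assumes the "(256)" in the log is the number of 512-byte blocks per scrub IO, which is consistent with 256 * 512 = 131072.

```rust
/// Scrub geometry derived from the logged values (hypothetical helper).
pub struct ScrubPlan {
    pub total_blocks: u64,
    pub io_bytes: u64,
    pub io_count: u64,
}

/// Assumes blocks_per_io is the "(256)" value from the log.
pub fn scrub_plan(total_size: u64, block_size: u64, blocks_per_io: u64) -> ScrubPlan {
    let total_blocks = total_size / block_size;
    let io_bytes = blocks_per_io * block_size;
    // Round up so a partial final IO would still be counted.
    let io_count = (total_blocks + blocks_per_io - 1) / blocks_per_io;
    ScrubPlan { total_blocks, io_bytes, io_count }
}
```

With the logged values, scrub_plan(85899345920, 512, 256) yields 167772160 blocks (the end block in the log), 131072-byte IOs, and 655360 IOs to cover the disk, so a counted scrub should stand out clearly in the metrics.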

So we have a read-only parent copying its data to a new, blank disk.
We can see this in the DTrace output, where we read from one volume and write to the other:

  PID     UUID  SESSION DS0 DS1 DS2   NEXT_JOB  DELTA CONN   ELR   ELC   ERR   ERN
16419 1122ef37 de0b7fed ACT ACT ACT      18359     37    3     0     0     0     0
16419 f04ccd2c f6dbfa5c ACT ACT ACT      20817     37    3     0     0     0     0
16419 1122ef37 de0b7fed ACT ACT ACT      18397     38    3     0     0     0     0
16419 f04ccd2c f6dbfa5c ACT ACT ACT      20854     37    3     0     0     0     0

Looking in the console, we can see this:

We see the traffic from the initial boot, then IOs drop to zero.
Given we are doing 1M IOs, we should see something in the metrics if the scrub were counted.

A few minutes later there is a little traffic on the disk, but I suspect it comes from whatever background activity is logging to the boot disk. Nothing here indicates that the scrub traffic is being counted.

That leaves a question for me: why aren't these counted?

leftwo and others added 2 commits March 9, 2026 11:30

Metrics for disks (07faaf8)
fix cargo fmt (8a00267)
Move clone after loop (e42f6e5)

leftwo marked this pull request as ready for review March 11, 2026 21:22

leftwo wants to merge 3 commits into master from alan/cru-wants-oximeter
