Instance boot disk corrupted after sled expungement #1765

@wfchandler

A preview-silo user on colo reported that the boot disk of instance 5d62dadc-09a6-427b-8a4b-c7f1d4585a2a became corrupted at some point between 2025-08-12 and 2025-08-20, while the instance was idle. This appears to have been caused by a region replacement that occurred on 2025-08-18, when two sleds were expunged from the rack.

The instance spontaneously rebooted. fsck reported the root file system clean after clearing an orphaned inode, but the subsequent mount failed:

Begin: Running /scripts/init-premount ... done.
...
Begin: Will now check root file system ... fsck from util-linux 2.40.2
[/usr/sbin/fsck.ext4 (1) -- /dev/nvme0n1p1] fsck.ext4 -a -C0 /dev/nvme0n1p1
cloudimg-rootfs: recovering journal
cloudimg-rootfs: Clearing orphaned inode 86387 (uid=0, gid=0, mode=0100644, size=113)
cloudimg-rootfs: clean, 119390/878080 files, 801009/1806587 blocks
done.
[    2.746978] EXT4-fs (nvme0n1p1): orphan cleanup on readonly fs
[    2.749154] EXT4-fs error (device nvme0n1p1): ext4_orphan_get:1422: comm mount: bad orphan inode 86387
[    2.752630] ext4_test_bit(bit=7986, block=278) = 0
[    2.754377] EXT4-fs (nvme0n1p1): recovery complete
[    2.756155] EXT4-fs error (device nvme0n1p1): ext4_mark_recovery_complete:6236: comm mount: Orphan file not empty on read-only fs.
[    2.759987] EXT4-fs (nvme0n1p1): mount failed
mount: mounting /dev/nvme0n1p1 on /root failed: Structure needs cleaning
done.
Begin: Running /scripts/local-bottom ... done.
Begin: Running /scripts/init-bottom ... mount: mounting /dev on /root/dev failed: No such file or directory
mount: mounting /dev on /root/dev failed: No such file or directory
done.
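
The `ext4_test_bit(bit=7986, block=278) = 0` message can be cross-checked against the fsck summary. The following is a rough sketch, not a definitive analysis: it assumes 32768 blocks per group (the usual default for 4 KiB blocks; the superblock was not captured here, so this is an assumption). Under that assumption the inode count works out to 15680 inodes per group, which puts inode 86387 at bit 7986 of group 5's inode bitmap, the same bit the kernel printed. A test result of 0 would then suggest that the bitmap considers the inode free while the orphan list still references it.

```python
import math

# Values taken from the fsck summary and kernel messages above.
TOTAL_INODES = 878080
TOTAL_BLOCKS = 1806587
ORPHAN_INODE = 86387

# Assumption: 32768 blocks per group, the usual default for 4 KiB blocks.
BLOCKS_PER_GROUP = 32768


def locate_inode(ino, total_inodes, total_blocks, blocks_per_group):
    """Return (groups, inodes_per_group, group, bit) for an inode number."""
    groups = math.ceil(total_blocks / blocks_per_group)
    inodes_per_group, remainder = divmod(total_inodes, groups)
    if remainder:
        raise ValueError("inode count is not a multiple of the group count")
    # Inode numbers start at 1, so subtract 1 before dividing.
    group, bit = divmod(ino - 1, inodes_per_group)
    return groups, inodes_per_group, group, bit


groups, per_group, group, bit = locate_inode(
    ORPHAN_INODE, TOTAL_INODES, TOTAL_BLOCKS, BLOCKS_PER_GROUP
)
print(f"{groups} groups, {per_group} inodes/group, group {group}, bit {bit}")
```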

Instance details:

# omdb db instance info 5d62dadc-09a6-427b-8a4b-c7f1d4585a2a
== INSTANCE ====================================================================
                        ID: 5d62dadc-09a6-427b-8a4b-c7f1d4585a2a
                project ID: d645d5a8-874c-40cd-9a83-052b7dbff39f
                      name: <SNIP>
               description: <SNIP>
                created at: 2025-02-11 15:33:06.826909 UTC
          last modified at: 2025-07-03 12:37:18.954700 UTC

== CONFIGURATION ===============================================================
                     vCPUs: 2
                    memory: 8 GiB
                  hostname: <SNIP>
                 boot disk: Some(0ec602dd-32d3-442d-bc91-f7e6df9e97c4)
              auto-restart:
                  InstanceAutoRestart {
                      policy: None,
                      cooldown: None,
                  }

== RUNTIME STATE ===============================================================
               nexus state: Vmm
(i)     external API state: Running
            intended state: running
           last updated at: 2025-08-13T16:46:21.736487Z (generation 17)
       needs reincarnation: false
             karmic status: saṃsāra (reincarnation enabled)
      last reincarnated at: Some(2025-08-13T16:47:29.620470Z)
             active VMM ID: Some(c79f5363-3a25-48af-87ec-25a6d50066fa)
             target VMM ID: None
              migration ID: None
              updater lock: UNLOCKED at generation: 24

== ACTIVE VMM ==================================================================
                        ID: c79f5363-3a25-48af-87ec-25a6d50066fa
               instance ID: 5d62dadc-09a6-427b-8a4b-c7f1d4585a2a
                created at: 2025-08-13 16:47:29.499499 UTC
                     state: running
                updated at: 2025-08-21T11:42:29.020819Z (generation 7)
          propolis address: fd00:1122:3344:116::1:3e8:12400
                   sled ID: 5d8a45b6-8d43-4d7b-a7fb-853017bdab0a

== ATTACHED DISKS ==============================================================
# ID                                   SIZE  STATE    NAME                                    
0 0ec602dd-32d3-442d-bc91-f7e6df9e97c4 8 GiB attached <SNIP>

Disk details:

# omdb db disks info 0ec602dd-32d3-442d-bc91-f7e6df9e97c4
HOST_SERIAL DISK_NAME                               INSTANCE_NAME PROPOLIS_ZONE                                            VOLUME_ID                            DISK_STATE 
BRM42220077 <SNIP> <SNIP> oxz_propolis-server_c79f5363-3a25-48af-87ec-25a6d50066fa b5e92f40-390e-4d00-a9a4-b25709152250 attached   
HOST_SERIAL REGION                               DATASET                              PHYSICAL_DISK                        
BRM42220028 ca23d966-4846-4192-ae60-c6d5f1b8cc7c 7af9f38b-0c7a-402e-8db3-7c7fb50b4665 cbeb6276-0a5a-4d6d-bbfa-fa607a25abc2 
BRM42220015 4ed7e67a-7fa5-46eb-b312-67c647489808 b990911b-805a-4f9d-bd83-e977f5b19a35 dfcbd177-2ec1-471e-80d5-c5ac14c27486 
BRM42220019 86d27bde-4465-4208-8442-5f752e918fae c723c4b8-3031-4b25-8c16-fe08bc0b5f00 67cf7884-6d3c-48d6-baec-e21f5407a038 
VCR from volume ID b5e92f40-390e-4d00-a9a4-b25709152250
ID                                   BS  SUB_VOLUMES READ_ONLY_PARENT 
0ec602dd-32d3-442d-bc91-f7e6df9e97c4 512 1           false       

Searching for the disk's regions on their corresponding sleds, we see a repair job on BRM42220015 at 18:50 UTC on 2025-08-18. The job shows no obvious errors.

From /pool/ext/7e67cb32-0c00-4090-9647-eb7bae75deeb/crypt/debug/oxz_crucible_b990911b-805a-4f9d-bd83-e977f5b19a35/oxide-crucible-downstairs:downstairs-4ed7e67a-7fa5-46eb-b312-67c647489808.log.1755543004

18:50:02.080Z INFO crucible: Created copy dir "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07C.copy"
18:50:02.080Z INFO crucible: eid:124 Found repair files: ["07C"]
18:50:02.489Z INFO crucible: Verify extent 124 still ready for copy
18:50:02.489Z INFO crucible: 1 repair files downloaded, move directory "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07C.copy" to "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07C.replace"
18:50:02.490Z INFO crucible: Copy files from "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07C.replace" in "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000"
18:50:02.592Z INFO crucible: Move directory  "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07C.replace" to "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07C.completed"
18:50:02.595Z WARN crucible: 5200 job ELiveReopen waiting on 1 deps
    role = work
    upstairs_id = 0ec602dd-32d3-442d-bc91-f7e6df9e97c4
18:50:02.596Z INFO crucible: Created copy dir "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07D.copy"
18:50:02.596Z INFO crucible: eid:125 Found repair files: ["07D"]
18:50:02.993Z INFO crucible: Verify extent 125 still ready for copy
18:50:02.993Z INFO crucible: 1 repair files downloaded, move directory "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07D.copy" to "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07D.replace"
18:50:02.994Z INFO crucible: Copy files from "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07D.replace" in "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000"
18:50:03.095Z INFO crucible: Move directory  "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07D.replace" to "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07D.completed"
18:50:03.098Z WARN crucible: 5204 job ELiveReopen waiting on 1 deps
    role = work
    upstairs_id = 0ec602dd-32d3-442d-bc91-f7e6df9e97c4
18:50:03.099Z INFO crucible: Created copy dir "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07E.copy"
18:50:03.099Z INFO crucible: eid:126 Found repair files: ["07E"]
18:50:03.504Z INFO crucible: Verify extent 126 still ready for copy
18:50:03.504Z INFO crucible: 1 repair files downloaded, move directory "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07E.copy" to "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07E.replace"
18:50:03.504Z INFO crucible: Copy files from "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07E.replace" in "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000"
18:50:03.606Z INFO crucible: Move directory  "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07E.replace" to "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07E.completed"
18:50:03.609Z WARN crucible: 5208 job ELiveReopen waiting on 1 deps
    role = work
    upstairs_id = 0ec602dd-32d3-442d-bc91-f7e6df9e97c4
18:50:03.610Z INFO crucible: Created copy dir "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07F.copy"
18:50:03.611Z INFO crucible: eid:127 Found repair files: ["07F"]
18:50:04.000Z INFO crucible: Verify extent 127 still ready for copy
18:50:04.000Z INFO crucible: 1 repair files downloaded, move directory "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07F.copy" to "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07F.replace"
18:50:04.000Z INFO crucible: Copy files from "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07F.replace" in "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000"
18:50:04.462Z INFO crucible: Move directory  "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07F.replace" to "/data/regions/4ed7e67a-7fa5-46eb-b312-67c647489808/00/000/07F.completed"
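
When reading this log, note that the directory name appears to be the extent ID in hexadecimal (`07C` is 124). A minimal sketch for pulling the repaired extents out of a downstairs log and checking that correspondence is below. The regular expression and the function name `repaired_extents` are hypothetical helpers written for this issue, not part of Crucible; the sample lines are copied from the log above.

```python
import re

# Matches lines such as:
#   18:50:02.080Z INFO crucible: eid:124 Found repair files: ["07C"]
FOUND = re.compile(
    r'^(\d\d:\d\d:\d\d\.\d+)Z INFO crucible: '
    r'eid:(\d+) Found repair files: \["([0-9A-Fa-f]+)"\]'
)


def repaired_extents(log_text):
    """Return (timestamp, extent_id, file_name, name_matches_id) tuples."""
    found = []
    for line in log_text.splitlines():
        match = FOUND.match(line.strip())
        if match:
            stamp, eid, name = match.group(1), int(match.group(2)), match.group(3)
            # The file name is compared to the extent ID as a hex number.
            found.append((stamp, eid, name, int(name, 16) == eid))
    return found


SAMPLE = '''\
18:50:02.080Z INFO crucible: eid:124 Found repair files: ["07C"]
18:50:02.596Z INFO crucible: eid:125 Found repair files: ["07D"]
18:50:03.099Z INFO crucible: eid:126 Found repair files: ["07E"]
18:50:03.611Z INFO crucible: eid:127 Found repair files: ["07F"]
'''

for stamp, eid, name, ok in repaired_extents(SAMPLE):
    print(stamp, eid, name, "ok" if ok else "MISMATCH")
```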

One of the downstairs for this disk appears to have been hosted on BRM42220001 or BRM44220002, which were expunged from the colo rack on 2025-08-18.

We see a corresponding saga in the Nexus logs:

# pilot host exec -c 'grep -h 4ed7e67a-7fa5-46eb-b312-67c647489808 $(/opt/oxide/oxlog/oxlog logs oxz_nexus* nexus --current --archived -A 2025-08-18 -B 2025-08-19) | looker' 8 14 20
INFO 1cfdb5b6-e568-436a-a85f-7fecf1b8eef2 (ServerContext): saga create
    dag = {"end_node":9,"graph":{"edge_property":"directed","edges":[[0,1,null],[1,2,null],[2,3,null],[3,4,null],[4,5,null],[5,6,null],[6,7,null],[8,0,null],[7,9,null]],"node_holes":[],"nodes":[{"Action":{"action_name":"common.uuid_generate","label":"GenerateSagaId","name":"saga_id"}},{"Action":{"action_name":"common.uuid_generate","label":"GenerateJobId","name":"job_id"}},{"Action":{"action_name":"region_replacement_drive.set_saga_id","label":"SetSagaId","name":"unused_1"}},{"Action":{"action_name":"region_replacement_drive.drive_region_replacement_check","label":"DriveRegionReplacementCheck","name":"check"}},{"Action":{"action_name":"region_replacement_drive.drive_region_replacement_prepare","label":"DriveRegionReplacementPrepare","name":"prepare"}},{"Action":{"action_name":"region_replacement_drive.drive_region_replacement_execute","label":"DriveRegionReplacementExecute","name":"execute"}},{"Action":{"action_name":"region_replacement_drive.drive_region_replacement_commit","label":"DriveRegionReplacementCommit","name":"commit"}},{"Action":{"action_name":"region_replacement_drive.finish_saga","label":"FinishSaga","name":"unused_2"}},{"Start":{"params":{"request":{"id":"4342f5e1-0690-4cac-bb30-f7b98a857856","new_region_id":"4ed7e67a-7fa5-46eb-b312-67c647489808","old_region_id":"54462a44-99ea-44e2-8743-ee4a369405e6","old_region_volume_id":"be11ed6d-b780-492f-afda-2bd00a1e219b","operating_saga_id":null,"replacement_state":"Running","request_time":"2025-08-18T18:46:01.776498Z","volume_id":"b5e92f40-390e-4d00-a9a4-b25709152250"},"serialized_authn":{"kind":{"Authenticated":[{"actor":{"UserBuiltin":{"user_builtin_id":"001de000-05e4-4000-8000-000000000002"}}},null]}}}}},"End"]},"saga_name":"region-replacement-drive","start_node":8}
    file = /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/steno-0.4.1/src/sec.rs:1146
    saga_id = 00bd8bcd-fcfc-4054-9acf-2e3e68799c73
    saga_name = region-replacement-drive
    sec_id = 1cfdb5b6-e568-436a-a85f-7fecf1b8eef2

18:50:04.994Z INFO 1cfdb5b6-e568-436a-a85f-7fecf1b8eef2 (ServerContext): saga create
    dag = {"end_node":15,"graph":{"edge_property":"directed","edges":[[0,1,null],[1,2,null],[2,3,null],[4,5,null],[4,6,null],[6,7,null],[5,7,null],[7,8,null],[8,9,null],[9,10,null],[10,11,null],[3,4,null],[11,12,null],[12,13,null],[14,0,null],[13,15,null]],"node_holes":[],"nodes":[{"Action":{"action_name":"common.uuid_generate","label":"GenerateSagaId","name":"saga_id"}},{"Action":{"action_name":"region_replacement_finish.set_saga_id","label":"SetSagaId","name":"unused_1"}},{"Constant":{"name":"params_for_volume_delete_subsaga","value":{"serialized_authn":{"kind":{"Authenticated":[{"actor":{"UserBuiltin":{"user_builtin_id":"001de000-05e4-4000-8000-000000000002"}}},null]}},"volume_id":"be11ed6d-b780-492f-afda-2bd00a1e219b"}}},{"SubsagaStart":{"params_node_name":"params_for_volume_delete_subsaga","saga_name":"volume-delete"}},{"Action":{"action_name":"volume_delete.decrease_crucible_resource_count","label":"DecreaseCrucibleResourceCount","name":"crucible_resources_to_delete"}},{"Action":{"action_name":"volume_delete.delete_crucible_regions","label":"DeleteCrucibleRegions","name":"no_result_1"}},{"Action":{"action_name":"volume_delete.delete_crucible_running_snapshots","label":"DeleteCrucibleRunningSnapshots","name":"no_result_2"}},{"Action":{"action_name":"volume_delete.delete_crucible_snapshots","label":"DeleteCrucibleSnapshots","name":"no_result_3"}},{"Action":{"action_name":"volume_delete.delete_crucible_snapshot_records","label":"DeleteCrucibleSnapshotRecords","name":"no_result_4"}},{"Action":{"action_name":"volume_delete.find_freed_crucible_regions","label":"FindFreedCrucibleRegions","name":"freed_crucible_regions"}},{"Action":{"action_name":"volume_delete.delete_freed_crucible_regions","label":"DeleteFreedCrucibleRegions","name":"no_result_5"}},{"Action":{"action_name":"volume_delete.hard_delete_volume_record","label":"HardDeleteVolumeRecord","name":"volume_hard_deleted"}},{"SubsagaEnd":{"name":"volume_delete_subsaga_no_result"}},{"Action":{"action_name":"region_replacement_finish.update_request_record","label":"UpdateRequestRecord","name":"unused_2"}},{"Start":{"params":{"region_volume_id":"be11ed6d-b780-492f-afda-2bd00a1e219b","request":{"id":"4342f5e1-0690-4cac-bb30-f7b98a857856","new_region_id":"4ed7e67a-7fa5-46eb-b312-67c647489808","old_region_id":"54462a44-99ea-44e2-8743-ee4a369405e6","old_region_volume_id":"be11ed6d-b780-492f-afda-2bd00a1e219b","operating_saga_id":null,"replacement_state":"ReplacementDone","request_time":"2025-08-18T18:46:01.776498Z","volume_id":"b5e92f40-390e-4d00-a9a4-b25709152250"},"serialized_authn":{"kind":{"Authenticated":[{"actor":{"UserBuiltin":{"user_builtin_id":"001de000-05e4-4000-8000-000000000002"}}},null]}}}}},"End"]},"saga_name":"region-replacement-finish","start_node":14}
    file = /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/steno-0.4.1/src/sec.rs:1146
    saga_id = dc873936-3647-4d72-a780-bd1677c3008b
    saga_name = region-replacement-finish
    sec_id = 1cfdb5b6-e568-436a-a85f-7fecf1b8eef2

Both sagas succeeded:

root@oxz_switch1:~# /tmp/omdb-saga db sagas show dc873936-3647-4d72-a780-bd1677c3008b
 id                                   | time_created                   | name                      | state                 
--------------------------------------+--------------------------------+---------------------------+-----------------------
 dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:04.994311 UTC | region-replacement-finish | SagaCachedState(Done) 

                             saga id | event time                     | node id                                              | event type | data
------------------------------------ | ------------------------------ | ---------------------------------------------------- | ---------- | ---
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.038102 UTC |  14: start                                           | started    | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.044057 UTC |  14: start                                           | succeeded  | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.047157 UTC |   0: common.uuid_generate                            | started    | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.050566 UTC |   0: common.uuid_generate                            | succeeded  | "942394bf-0568-41ee-99d9-d11309df61a6"
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.054615 UTC |   1: region_replacement_finish.set_saga_id           | started    | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.139374 UTC |   1: region_replacement_finish.set_saga_id           | succeeded  | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.142133 UTC |   2: params_for_volume_delete_subsaga                | started    | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.145311 UTC |   2: params_for_volume_delete_subsaga                | succeeded  | {"serialized_authn":{"kind":{"Authenticated":[{"actor":{"UserBuiltin":{"user_builtin_id":"001de000-05e4-4000-8000-000000000002"}}},null]}},"volume_id":"be11ed6d-b780-492f-afda-2bd00a1e219b"}
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.147745 UTC |   3: subsaga start volume-delete                     | started    | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.150137 UTC |   3: subsaga start volume-delete                     | succeeded  | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.152391 UTC |   4: volume_delete.decrease_crucible_resource_count  | started    | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.170272 UTC |   4: volume_delete.decrease_crucible_resource_count  | succeeded  | {"V3":{"region_snapshots":[],"regions":["54462a44-99ea-44e2-8743-ee4a369405e6"]}}
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.173694 UTC |   5: volume_delete.delete_crucible_regions           | started    | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.181375 UTC |   6: volume_delete.delete_crucible_running_snapshots | started    | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.186542 UTC |   6: volume_delete.delete_crucible_running_snapshots | succeeded  | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.224149 UTC |   5: volume_delete.delete_crucible_regions           | succeeded  | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.227188 UTC |   7: volume_delete.delete_crucible_snapshots         | started    | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.230686 UTC |   7: volume_delete.delete_crucible_snapshots         | succeeded  | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.233038 UTC |   8: volume_delete.delete_crucible_snapshot_records  | started    | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.236251 UTC |   8: volume_delete.delete_crucible_snapshot_records  | succeeded  | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.239245 UTC |   9: volume_delete.find_freed_crucible_regions       | started    | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.434046 UTC |   9: volume_delete.find_freed_crucible_regions       | succeeded  | {"datasets_and_regions":[],"volumes":[]}
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.444019 UTC |  10: volume_delete.delete_freed_crucible_regions     | started    | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.447401 UTC |  10: volume_delete.delete_freed_crucible_regions     | succeeded  | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.450671 UTC |  11: volume_delete.hard_delete_volume_record         | started    | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.475809 UTC |  11: volume_delete.hard_delete_volume_record         | succeeded  | true
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.479050 UTC |  12: subsaga end volume_delete_subsaga_no_result     | started    | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.482171 UTC |  12: subsaga end volume_delete_subsaga_no_result     | succeeded  | true
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.485097 UTC |  13: region_replacement_finish.update_request_record | started    | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.546421 UTC |  13: region_replacement_finish.update_request_record | succeeded  | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.549478 UTC |  15: end                                             | started    | 
dc873936-3647-4d72-a780-bd1677c3008b | 2025-08-18 18:50:05.552594 UTC |  15: end                                             | succeeded  | 

# /tmp/omdb-saga db sagas show 00bd8bcd-fcfc-4054-9acf-2e3e68799c73
 id                                   | time_created                   | name                     | state                 
--------------------------------------+--------------------------------+--------------------------+-----------------------
 00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.488857 UTC | region-replacement-drive | SagaCachedState(Done) 

                             saga id | event time                     | node id                                                        | event type | data
------------------------------------ | ------------------------------ | -------------------------------------------------------------- | ---------- | ---
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.492300 UTC |   8: start                                                     | started    | 
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.497921 UTC |   8: start                                                     | succeeded  | 
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.501146 UTC |   0: common.uuid_generate                                      | started    | 
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.504141 UTC |   0: common.uuid_generate                                      | succeeded  | "9d2c200d-fae9-46ea-8748-54bf19d6783a"
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.506871 UTC |   1: common.uuid_generate                                      | started    | 
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.510185 UTC |   1: common.uuid_generate                                      | succeeded  | "48cee09b-e137-4418-bfaa-a1dd5164456d"
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.513456 UTC |   2: region_replacement_drive.set_saga_id                      | started    | 
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.522448 UTC |   2: region_replacement_drive.set_saga_id                      | succeeded  | 
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.525429 UTC |   3: region_replacement_drive.drive_region_replacement_check   | started    | 
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.567947 UTC |   3: region_replacement_drive.drive_region_replacement_check   | succeeded  | "LastStepStillRunning"
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.571059 UTC |   4: region_replacement_drive.drive_region_replacement_prepare | started    | 
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.573560 UTC |   4: region_replacement_drive.drive_region_replacement_prepare | succeeded  | {"Noop":{"replacement_done":false}}
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.576529 UTC |   5: region_replacement_drive.drive_region_replacement_execute | started    | 
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.580855 UTC |   5: region_replacement_drive.drive_region_replacement_execute | succeeded  | {"replacement_done":false,"step_to_commit":null}
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.583793 UTC |   6: region_replacement_drive.drive_region_replacement_commit  | started    | 
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.586634 UTC |   6: region_replacement_drive.drive_region_replacement_commit  | succeeded  | 
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.588966 UTC |   7: region_replacement_drive.finish_saga                      | started    | 
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.598597 UTC |   7: region_replacement_drive.finish_saga                      | succeeded  | 
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.601617 UTC |   9: end                                                       | started    | 
00bd8bcd-fcfc-4054-9acf-2e3e68799c73 | 2025-08-18 18:48:54.604332 UTC |   9: end                                                       | succeeded  | 
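
To see how the pieces line up, the timestamps quoted so far can be put on one timeline. This is only a sketch: the event labels are descriptive names chosen here, the downstairs log lines carry no date so 2025-08-18 is assumed from the context above, and the two extent events are just the first and last lines of the excerpt, not necessarily the bounds of the whole repair.

```python
from datetime import datetime, timedelta, timezone

# (label, UTC timestamp) pairs copied from the saga parameters, the
# saga tables, and the downstairs log excerpt in this issue.
EVENTS = [
    ("region-replacement-finish saga created", "2025-08-18T18:50:04.994311"),
    ("replacement request created", "2025-08-18T18:46:01.776498"),
    ("extent 127 repair completed", "2025-08-18T18:50:04.462"),
    ("region-replacement-drive saga created", "2025-08-18T18:48:54.488857"),
    ("extent 124 copy dir created", "2025-08-18T18:50:02.080"),
    ("region-replacement-finish saga ended", "2025-08-18T18:50:05.552594"),
]


def parse_utc(stamp):
    """Parse an ISO timestamp (no zone suffix) and mark it as UTC."""
    return datetime.fromisoformat(stamp).replace(tzinfo=timezone.utc)


def timeline(events):
    """Return (label, offset from the earliest event) in time order."""
    ordered = sorted((parse_utc(stamp), label) for label, stamp in events)
    start = ordered[0][0]
    return [(label, when - start) for when, label in ordered]


for label, offset in timeline(EVENTS):
    print(f"+{offset.total_seconds():8.3f}s  {label}")
```

Read this way, the drive saga shown above ran before the extents in the excerpt were repaired, which is consistent with its `LastStepStillRunning` and `Noop` results, and the finish saga was created roughly half a second after extent 127 completed.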

Checking for outstanding replacements, we don't see any that appear to be associated with this disk:

# omdb db replacements-to-do
ID                                   DATASET_ID                           RESOURCE          EXISTING_REQUEST_TIME    EXISTING_REQUEST                                      
b4992f64-753b-4197-bb9a-f84758defbce 1f2d2f86-b69b-4130-bb9b-e62ba0cb6802 read-only region                                                                                 
1027673f-5094-4e1c-a496-49c42ee76ae5 4e1d2af1-8ef4-4762-aa80-b08da08b45bb read/write region 2025-06-11T20:30:12.934Z 104927a2-0f4c-4d75-a2cb-cf9d6076ceb6 (state Complete) 
2ba2b5b0-3c7e-4dcd-8ea6-29ee1dc87c01 2a796a69-b061-44c7-b2df-35bc611f10f5 read/write region 2025-06-11T20:30:13.465Z 95647a82-2c78-4fae-925c-7585eae842a7 (state Complete) 
aef4293f-252d-43cd-88bf-ba38279e1b2c eb779538-2b1b-4d1d-8c7e-b15f04db6e53 read/write region 2025-06-13T17:55:29.185Z 56311b56-e0de-4210-82cf-c7f8803ed111 (state Complete) 
16ffbad2-9999-4747-8745-d9b185a1c22a e8f55a5d-65f9-436c-bc25-1d1a7070e876 read/write region 2025-06-13T21:31:21.964Z 1034e8e5-c99a-4fa4-8b08-207b10f947fa (state Complete) 
4bb035e3-54ea-4392-8c0b-38d3a846c6b8 a109a902-6a27-41b6-a881-c353e28e5389 read/write region 2025-08-18T18:25:18.187Z c74afd70-7613-44c3-a878-db02115e8ff3 (state Complete) 

DATASET_ID                           REGION_ID                            SNAPSHOT_ID                          EXISTING_REQUEST_TIME    EXISTING_REQUEST                                        
2a796a69-b061-44c7-b2df-35bc611f10f5 2ba2b5b0-3c7e-4dcd-8ea6-29ee1dc87c01 e3407dfc-fd19-4948-8664-c620d881e880 2025-06-11T20:30:13.242Z 84f91e75-a83b-4a71-b22e-490b67fed7fb (state Allocating) 

Nor do we see any incomplete region replacements:

# omdb db region-replacement list --fetch-limit=5000 | grep -v Complete
Region replacement requests                                                     
ID                                   REQUEST_TIME             REPLACEMENT_STATE 

Expanded saga details

region-replacement-drive saga DAG:

{
  "end_node": 9,
  "graph": {
    "edge_property": "directed",
    "edges": [
      [
        0,
        1,
        null
      ],
      [
        1,
        2,
        null
      ],
      [
        2,
        3,
        null
      ],
      [
        3,
        4,
        null
      ],
      [
        4,
        5,
        null
      ],
      [
        5,
        6,
        null
      ],
      [
        6,
        7,
        null
      ],
      [
        8,
        0,
        null
      ],
      [
        7,
        9,
        null
      ]
    ],
    "node_holes": [],
    "nodes": [
      {
        "Action": {
          "action_name": "common.uuid_generate",
          "label": "GenerateSagaId",
          "name": "saga_id"
        }
      },
      {
        "Action": {
          "action_name": "common.uuid_generate",
          "label": "GenerateJobId",
          "name": "job_id"
        }
      },
      {
        "Action": {
          "action_name": "region_replacement_drive.set_saga_id",
          "label": "SetSagaId",
          "name": "unused_1"
        }
      },
      {
        "Action": {
          "action_name": "region_replacement_drive.drive_region_replacement_check",
          "label": "DriveRegionReplacementCheck",
          "name": "check"
        }
      },
      {
        "Action": {
          "action_name": "region_replacement_drive.drive_region_replacement_prepare",
          "label": "DriveRegionReplacementPrepare",
          "name": "prepare"
        }
      },
      {
        "Action": {
          "action_name": "region_replacement_drive.drive_region_replacement_execute",
          "label": "DriveRegionReplacementExecute",
          "name": "execute"
        }
      },
      {
        "Action": {
          "action_name": "region_replacement_drive.drive_region_replacement_commit",
          "label": "DriveRegionReplacementCommit",
          "name": "commit"
        }
      },
      {
        "Action": {
          "action_name": "region_replacement_drive.finish_saga",
          "label": "FinishSaga",
          "name": "unused_2"
        }
      },
      {
        "Start": {
          "params": {
            "request": {
              "id": "4342f5e1-0690-4cac-bb30-f7b98a857856",
              "new_region_id": "4ed7e67a-7fa5-46eb-b312-67c647489808",
              "old_region_id": "54462a44-99ea-44e2-8743-ee4a369405e6",
              "old_region_volume_id": "be11ed6d-b780-492f-afda-2bd00a1e219b",
              "operating_saga_id": null,
              "replacement_state": "Running",
              "request_time": "2025-08-18T18:46:01.776498Z",
              "volume_id": "b5e92f40-390e-4d00-a9a4-b25709152250"
            },
            "serialized_authn": {
              "kind": {
                "Authenticated": [
                  {
                    "actor": {
                      "UserBuiltin": {
                        "user_builtin_id": "001de000-05e4-4000-8000-000000000002"
                      }
                    }
                  },
                  null
                ]
              }
            }
          }
        }
      },
      "End"
    ]
  },
  "saga_name": "region-replacement-drive",
  "start_node": 8
}

region-replacement-finish saga DAG:

{
  "end_node": 15,
  "graph": {
    "edge_property": "directed",
    "edges": [
      [
        0,
        1,
        null
      ],
      [
        1,
        2,
        null
      ],
      [
        2,
        3,
        null
      ],
      [
        4,
        5,
        null
      ],
      [
        4,
        6,
        null
      ],
      [
        6,
        7,
        null
      ],
      [
        5,
        7,
        null
      ],
      [
        7,
        8,
        null
      ],
      [
        8,
        9,
        null
      ],
      [
        9,
        10,
        null
      ],
      [
        10,
        11,
        null
      ],
      [
        3,
        4,
        null
      ],
      [
        11,
        12,
        null
      ],
      [
        12,
        13,
        null
      ],
      [
        14,
        0,
        null
      ],
      [
        13,
        15,
        null
      ]
    ],
    "node_holes": [],
    "nodes": [
      {
        "Action": {
          "action_name": "common.uuid_generate",
          "label": "GenerateSagaId",
          "name": "saga_id"
        }
      },
      {
        "Action": {
          "action_name": "region_replacement_finish.set_saga_id",
          "label": "SetSagaId",
          "name": "unused_1"
        }
      },
      {
        "Constant": {
          "name": "params_for_volume_delete_subsaga",
          "value": {
            "serialized_authn": {
              "kind": {
                "Authenticated": [
                  {
                    "actor": {
                      "UserBuiltin": {
                        "user_builtin_id": "001de000-05e4-4000-8000-000000000002"
                      }
                    }
                  },
                  null
                ]
              }
            },
            "volume_id": "be11ed6d-b780-492f-afda-2bd00a1e219b"
          }
        }
      },
      {
        "SubsagaStart": {
          "params_node_name": "params_for_volume_delete_subsaga",
          "saga_name": "volume-delete"
        }
      },
      {
        "Action": {
          "action_name": "volume_delete.decrease_crucible_resource_count",
          "label": "DecreaseCrucibleResourceCount",
          "name": "crucible_resources_to_delete"
        }
      },
      {
        "Action": {
          "action_name": "volume_delete.delete_crucible_regions",
          "label": "DeleteCrucibleRegions",
          "name": "no_result_1"
        }
      },
      {
        "Action": {
          "action_name": "volume_delete.delete_crucible_running_snapshots",
          "label": "DeleteCrucibleRunningSnapshots",
          "name": "no_result_2"
        }
      },
      {
        "Action": {
          "action_name": "volume_delete.delete_crucible_snapshots",
          "label": "DeleteCrucibleSnapshots",
          "name": "no_result_3"
        }
      },
      {
        "Action": {
          "action_name": "volume_delete.delete_crucible_snapshot_records",
          "label": "DeleteCrucibleSnapshotRecords",
          "name": "no_result_4"
        }
      },
      {
        "Action": {
          "action_name": "volume_delete.find_freed_crucible_regions",
          "label": "FindFreedCrucibleRegions",
          "name": "freed_crucible_regions"
        }
      },
      {
        "Action": {
          "action_name": "volume_delete.delete_freed_crucible_regions",
          "label": "DeleteFreedCrucibleRegions",
          "name": "no_result_5"
        }
      },
      {
        "Action": {
          "action_name": "volume_delete.hard_delete_volume_record",
          "label": "HardDeleteVolumeRecord",
          "name": "volume_hard_deleted"
        }
      },
      {
        "SubsagaEnd": {
          "name": "volume_delete_subsaga_no_result"
        }
      },
      {
        "Action": {
          "action_name": "region_replacement_finish.update_request_record",
          "label": "UpdateRequestRecord",
          "name": "unused_2"
        }
      },
      {
        "Start": {
          "params": {
            "region_volume_id": "be11ed6d-b780-492f-afda-2bd00a1e219b",
            "request": {
              "id": "4342f5e1-0690-4cac-bb30-f7b98a857856",
              "new_region_id": "4ed7e67a-7fa5-46eb-b312-67c647489808",
              "old_region_id": "54462a44-99ea-44e2-8743-ee4a369405e6",
              "old_region_volume_id": "be11ed6d-b780-492f-afda-2bd00a1e219b",
              "operating_saga_id": null,
              "replacement_state": "ReplacementDone",
              "request_time": "2025-08-18T18:46:01.776498Z",
              "volume_id": "b5e92f40-390e-4d00-a9a4-b25709152250"
            },
            "serialized_authn": {
              "kind": {
                "Authenticated": [
                  {
                    "actor": {
                      "UserBuiltin": {
                        "user_builtin_id": "001de000-05e4-4000-8000-000000000002"
                      }
                    }
                  },
                  null
                ]
              }
            }
          }
        }
      },
      "End"
    ]
  },
  "saga_name": "region-replacement-finish",
  "start_node": 14
}
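
The edge list in the region-replacement-finish DAG is not in execution order, so it can help to group the nodes by dependency depth. The sketch below does that with a plain level-by-level topological sort; the `levels` helper is hypothetical, and the edges are copied from the JSON above with the `null` edge weights dropped. Nodes that land in the same level have no dependency on each other, which would explain why nodes 5 and 6 interleave in the event table for this saga.

```python
from collections import defaultdict

# Edges of the region-replacement-finish DAG, as (from, to) pairs.
EDGES = [
    (0, 1), (1, 2), (2, 3), (4, 5), (4, 6), (6, 7), (5, 7), (7, 8),
    (8, 9), (9, 10), (10, 11), (3, 4), (11, 12), (12, 13), (14, 0), (13, 15),
]


def levels(edges):
    """Group nodes into levels; each level depends only on earlier ones."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))

    frontier = sorted(n for n in nodes if indegree[n] == 0)
    result = []
    while frontier:
        result.append(frontier)
        next_frontier = []
        for node in frontier:
            for nxt in successors[node]:
                indegree[nxt] -= 1
                # A node is ready once all of its predecessors are done.
                if indegree[nxt] == 0:
                    next_frontier.append(nxt)
        frontier = sorted(next_frontier)
    return result


for depth, group in enumerate(levels(EDGES)):
    print(depth, group)
```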
