PoC: Move RAUC statusfile to persistent storage and check Status in release-validation post update #293

Draft
yfyf wants to merge 4 commits into dividat:main from yfyf:check-status-good

yfyf (Collaborator) commented Dec 12, 2025

This started out with b9143cc: after the select-display bug, I wanted to extend the tests to check that the system reaches a Good state post-update. When I tried that, I realized that this does NOT happen in release-validation tests, since /boot/status.ini gets corrupted and statusfile-recovery fails. This is due to FAT shenanigans caused by the unclean system_reset used in the test. See attached screenshots.

So this is actually 2 PRs, but I wanted to illustrate that the moving of RAUC to persistent storage kinda works already in the tests.

The actual implementation can be simplified and needs more tests/clean-up, but opening this draft so we can discuss it.

/boot/status.ini corruption in release-validation tests without 2414fd3: image(1), image(2)

yfyf added 4 commits December 12, 2025 17:18
…stems

QEMU's system_reset does an "unclean" reboot (power plug), which causes
corruption of the FAT filesystem in the test. There is no way to fix
this for existing systems and their disk images, so instead we try to
work around the issue by giving time for an unmount/fsync to happen.
Currently this check fails because /boot/status.ini gets corrupted after
the upgrade due to an unclean system reset in the tests (and maybe due
to running in a VM).
yfyf added the "reviewable" label (Ready for initial or iterative review) Dec 12, 2025
knuton (Member) commented Jan 23, 2026

@yfyf After #305, should we close this PR for now?

knuton added the "details needed" label (Further information requested to better evaluate changes) and removed the "reviewable" label (Ready for initial or iterative review) Jan 23, 2026
yfyf (Collaborator, Author) commented Jan 26, 2026

> @yfyf After #305, should we close this PR for now?

Let's keep this around for when we consider the skeleton matrix, etc.? If anything, it's a useful illustrative example of the kind of problematic interactions that might happen between the running system and the skeleton.

I think it's also worth considering how to deal with this issue. The approach used here could serve as a safer alternative to keeping the statusfile only in /boot (FAT), and it provides extra disaster recovery. As discussed, it could also be a different approach (e.g. recreating the status file from scratch), but some approach is needed.

knuton removed the "details needed" label (Further information requested to better evaluate changes) Jan 26, 2026
yfyf (Collaborator, Author) commented Apr 17, 2026

Looking at this again with #329 about to land, I think this approach could work, especially since we pin down RAUC and its deps. The benefit is that it safeguards us against FAT corruption during installs, which is bound to happen, particularly as we plan to have more standalone / at-home installations, where PlayOS is more likely to be powered on only during use.

The key idea is:

  • Storing RAUC status file in the persistent data volume (/var/lib/rauc), which uses a proper journaling FS (ext4)
  • Using /boot/status.ini only as a last-resort recovery storage (e.g. if persistent data is wiped or corrupted)
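
To make the lookup order concrete, here is a rough sketch in Python (not the code in this PR). The file name `/var/lib/rauc/status.ini`, the helper names, and the "parses as INI with at least one section" validity check are all assumptions for illustration; the real recovery logic may need a stricter check.

```python
import configparser
from pathlib import Path
from typing import Optional

# Assumed locations: the directory comes from the discussion above,
# the file name under /var/lib/rauc is hypothetical.
PRIMARY = Path("/var/lib/rauc/status.ini")
FALLBACK = Path("/boot/status.ini")


def is_usable(path: Path) -> bool:
    """Return True if the file exists, parses as INI and has a section.

    This is only a plausibility check, not a validation of RAUC's schema.
    """
    parser = configparser.ConfigParser(interpolation=None)
    try:
        with path.open(encoding="utf-8") as f:
            parser.read_file(f)
    except (OSError, UnicodeDecodeError, configparser.Error):
        # Missing, unreadable, binary garbage, or malformed INI.
        return False
    return len(parser.sections()) > 0


def select_status_file(
    primary: Path = PRIMARY, fallback: Path = FALLBACK
) -> Optional[Path]:
    """Prefer the ext4 copy; use the FAT copy only as a last resort."""
    if is_usable(primary):
        return primary
    if is_usable(fallback):
        return fallback
    return None  # Nothing usable: caller has to recreate the status file.
```

Returning `None` leaves room for the "recreate status file from scratch" option mentioned earlier in the thread.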

The tricky part is ensuring backwards compatibility with older versions. This PR handles that directly by "recovering" from /boot/status.ini if it has been updated by the other slot. This complicates the recovery logic. An alternative approach could be to do this change in stages:

  • Release a PlayOS version that only mirrors the RAUC status file to /var/lib/rauc/, but still uses /boot/ as primary source
  • Once all installations have progressed to write to /var/lib/rauc, we can switch to using it as the primary source and only recover if it is missing. This would simplify the recovery logic by a lot.
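
The mirroring step of the first stage could look roughly like the Python sketch below. The function name and the temp-file naming are made up, and how and when the mirror is triggered is left open. The point is only that the copy on the persistent volume is written atomically (write to a temp file, fsync, rename), so an unclean power-off leaves either the old or the new mirror, never a half-written one.

```python
import os
from pathlib import Path


def mirror_status_file(source: Path, target: Path) -> None:
    """Copy `source` to `target` so that `target` is never seen half-written.

    Hypothetical helper: `source` would be the primary status file in /boot,
    `target` the mirror on the persistent data volume.
    """
    data = source.read_bytes()
    target.parent.mkdir(parents=True, exist_ok=True)

    # Write to a sibling temp file first, and flush it to disk.
    tmp = target.with_name(target.name + ".tmp")
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # Atomic on POSIX when both paths are on the same filesystem.
    os.replace(tmp, target)

    # fsync the directory so the rename itself survives a power cut.
    dir_fd = os.open(target.parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

In the second stage the roles would flip: the ext4 copy becomes the primary source and /boot/ is read only when the primary is missing.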
