Fix first run of elemental failing in Macbook podman#406
Conversation
atanasdinov
left a comment
Any idea if this is going to work without --privileged parameter?
pkg/repart/disk_repart.go
Outdated
```go
// the optional given flags. On success it parses systemd-repart output to get the generated partition UUIDs and update the
// given partitions list with them.
func runSystemdRepart(s *sys.System, target string, parts []Partition, flags ...string) error {
	if os.Getenv("container") != "" && runtime.GOARCH == "arm64" {
```
Let's extract this snippet in a separate function, e.g. setupLoopDevices, and put a comment above it about what it's actually doing so it's clear when someone sees this 6 months from now.
davidcassany
left a comment
If I understand it correctly, I think we should not hardcode the number of loop devices, as it should be equal to or higher than the number of partitions.
```go
	if os.Getenv("container") != "" && runtime.GOARCH == "arm64" {
		for i := 0; i < 4; i++ {
			devPath := fmt.Sprintf("/dev/loop%d", i)
			_ = unix.Mknod(devPath, unix.S_IFBLK|0660, 7*256+i)
```
Out of curiosity, what happens if the device already exists? Does it error out or overwrite it? It is probably less intrusive if we only create the missing ones.
I believe it silently fails without any issue
```go
// This is not an issue in amd64 containers because enough loop device nodes are automatically available.
func setupLoopDeviceNodes() {
	if os.Getenv("container") != "" && runtime.GOARCH == "arm64" {
		for i := 0; i < 4; i++ {
```
Why 4? To my understanding this should be as big as the length of the partitions slice.
4 is an arbitrary number that just allows it to work. Based on the default settings (I'm not sure how it behaves with further customization), I believe it only needs 1 device node.
```go
// the optional given flags. On success it parses systemd-repart output to get the generated partition UUIDs and update the
// given partitions list with them.
func runSystemdRepart(s *sys.System, target string, parts []Partition, flags ...string) error {
	setupLoopDeviceNodes()
```
I'd say we can simply pass the partitions number as an argument here.
I don't think that's entirely necessary as I don't believe they are tied or linked to each other. But I will hijack this comment to outline a comprehensive view of the flow and the issue:
On a Mac, if we do `podman machine ssh` and then run `ls -la /dev/loop*`, this is the output we see:

```
root@localhost:~# ls -la /dev/loop*
crw-rw----. 1 root disk 10, 237 Mar 31 23:28 /dev/loop-control
```

Now if we enter an elemental3 container that we built, like so:

```
podman run -it --privileged \
  --entrypoint /bin/sh \
  -v $PWD/examples/elemental/customize/linux-only/:/config \
  -v /run/podman/podman.sock:/var/run/docker.sock \
  local/elemental-image:v3.0.0-alpha.20251212-g93d3598
```

so that we have shell access, and then we run `ls -la /dev/loop*`, this is what we see:

```
sh-5.3# ls -la /dev/loop*
crw-rw---- 1 root 6 10, 237 Apr 8 15:01 /dev/loop-control
```

So by default, no loop device nodes are created inside the podman virtual machine on Macs.
Now, if within that same container we run `elemental3 --debug customize --type raw --local`, we will fail with the known issue:
```
DEBU[0057] Running cmd: 'PATH=/sbin:/usr/sbin:/usr/bin:/bin systemd-repart --json=pretty --definitions=/tmp/elemental-repart.d370830029 --dry-run=no --empty=create --size=8192M /config/image-2026-04-08T15-03-28.raw'
DEBU[0062] "systemd-repart" command reported an error: exit status 1
DEBU[0062] "systemd-repart" command output:
DEBU[0062] "systemd-repart" stderr: No machine ID set, using randomized partition UUIDs.
Sized '/config/image-2026-04-08T15-03-28.raw' to 8G.
Applying changes to /config/image-2026-04-08T15-03-28.raw.
Failed to make loopback device of future partition 0: Device or resource busy
```

After this failure, let's check the podman vm and container states:
```
sh-5.3# ls -la /dev/loop*
crw-rw---- 1 root 6 10, 237 Apr 8 15:01 /dev/loop-control
```

```
root@localhost:~# ls -la /dev/loop*
crw-rw----. 1 root disk 10, 237 Mar 31 23:28 /dev/loop-control
brw-rw----. 1 root disk 7, 0 Apr 8 11:04 /dev/loop0
```

What's interesting here is that /dev/loop0 is present within the podman vm, but not within the container. However, if we start a new container using the same command as before:
```
sh-5.3# ls -la /dev/loop*
crw-rw---- 1 root 6 10, 237 Apr 8 15:06 /dev/loop-control
brw-rw---- 1 root 6 7, 0 Apr 8 15:06 /dev/loop0
```

we see that /dev/loop0 is present in the container, elemental will succeed, and /dev/loop0 will remain in the container:
```
DEBU[0075] systemd-repart output to parse:
[
  {
    ...
  },
  {
    ...
  }
]
INFO[0097] Customize complete
DEBU[0097] Cleaning up working directory
```

```
sh-5.3# ls -la /dev/loop*
crw-rw---- 1 root 6 10, 237 Apr 8 15:06 /dev/loop-control
brw-rw---- 1 root 6 7, 0 Apr 8 15:09 /dev/loop0
```

But it is no longer present in the podman vm:

```
root@localhost:~# ls -la /dev/loop*
crw-rw----. 1 root disk 10, 237 Mar 31 23:28 /dev/loop-control
```

However, if you rerun elemental within that same container, it will continue to succeed indefinitely: even though the loop device node is cleaned up from the podman vm, it remains in the current container.
So I believe what's happening is:
- There are no available loop device nodes in the podman vm by default (I think this is intentional default behavior, but I could be wrong)
- We run elemental in a container, and that container inherits the current state of the podman vm
- When elemental runs systemd-repart, the loop device node is created in the podman vm; however, the podman container does not automatically inherit this, so it fails
- On the second run, the new container inherits the existing loop device node and succeeds; as a result, it is cleaned up from the podman vm, so the next new container will fail, as there is no loop device node to inherit
So the proposed solution bypasses both the need for the loop device node to already exist in the podman vm when the container is first run, and the need for it to be repopulated for each new container.
This is why I don't think the loop device nodes are tied to the partitions or anything specific. I chose 4 as an arbitrary number, but it could be more or less; I think even 1 should work. The only condition I haven't checked is whether there are instances where systemd-repart might need more than 1 loop device node.
I see, I had the impression a loop device was used for each partition. I guess we can leave it as is then.
When you run `elemental3` through `podman` for the first time on a Macbook, the image build fails with the `Failed to make loopback device ... Device or resource busy` error from `systemd-repart` shown in the review discussion. If you immediately run it a second time, the image build will succeed. If you run it a 3rd time, it will fail; if you run it a 4th time, it will succeed...
The issue is that `systemd-repart` needs multiple loopback devices during the partitioning process. During the first run, it does not have the 2 device nodes that it needs, but it seems to leave behind some artifact that allows `systemd-repart` to succeed on the second run.
My proposed solution is that, within the `runSystemdRepart` function, we check if we are inside of an arm64 container. If we are, we manually create some device nodes for `systemd-repart` to use. This works in my tests: the alternating successes and failures no longer happen, and the build succeeds each time.