Background
The PR that fixed the VM-keepalive regression (commit b329667) converted podman-machine-start.sh from a one-shot script into a long-running supervisor that health-checks both transmission-vm and the transmission-vpn container on a 300s cadence. The fix was shellchecked and reviewed, but the supervision state machine has no automated regression coverage.
What to build
A BATS test at tests/podman-machine-start.bats that:
- Extracts the wrapper heredoc from app-setup/podman-transmission-setup.sh into a real file in $BATS_TMPDIR, rendering deploy-time placeholders (${HOMEBREW_PREFIX}, ${NFS_MOUNT_POINT}, ${HOST_PORT}, etc.) with test values. The source_watchdog_functions() helper in tests/plex-watchdog.bats is a good pattern to mirror.
- Shims podman in PATH with a scripted mock that responds to podman machine inspect, podman info, podman container exists, podman inspect, podman rm, and podman run with fixtured values, and logs every invocation to $BATS_TMPDIR/podman.calls so the test can assert call order.
- Runs the wrapper with SUPERVISE_INTERVAL=1 under timeout 5 (so the loop runs a couple of iterations and is then killed), then asserts:
  - machine start was called exactly once when the initial state was stopped
  - machine start was not called on subsequent iterations when state is running
  - podman run -d ... transmission-vpn was called exactly once when the container was missing
  - podman run was not called again when the container existed + was running
  - The script did not exit on its own (the exit was caused by timeout)
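The podman shim described above could look roughly like the following. This is a minimal sketch, not the repository's actual mock: the fixture files machine.state and container.exists, the helper names install_podman_mock and count_calls, and the assumption that the wrapper reads plain-text state (for example through a --format template) are all hypothetical. Only podman.calls comes from the description above.

```bash
#!/usr/bin/env bash
# Sketch: put a fake `podman` first on PATH that logs every call and answers
# from fixture files. Fixture file names and helper names are hypothetical.

MOCK_DIR="${BATS_TMPDIR:-$(mktemp -d)}"
export MOCK_DIR   # the mock script reads this to find its log and fixtures

install_podman_mock() {
  mkdir -p "$MOCK_DIR/bin"
  : > "$MOCK_DIR/podman.calls"
  echo stopped > "$MOCK_DIR/machine.state"   # fixture: VM starts out stopped
  rm -f "$MOCK_DIR/container.exists"         # fixture: container starts out missing
  cat > "$MOCK_DIR/bin/podman" <<'MOCK'
#!/usr/bin/env bash
# Fake podman: record the full command line, then answer from fixture files.
echo "$*" >> "$MOCK_DIR/podman.calls"
case "$1 ${2:-}" in
  "machine inspect")  cat "$MOCK_DIR/machine.state" ;;
  "machine start")    echo running > "$MOCK_DIR/machine.state" ;;
  "container exists") [ -e "$MOCK_DIR/container.exists" ] ;;
  "inspect "*)        echo running ;;
  "run "*)            touch "$MOCK_DIR/container.exists" ;;
  *)                  : ;;   # info, rm, and anything else succeed silently
esac
MOCK
  chmod +x "$MOCK_DIR/bin/podman"
  PATH="$MOCK_DIR/bin:$PATH"
}

# Print how many logged invocations start with the given prefix.
count_calls() {
  grep -c -- "^$1" "$MOCK_DIR/podman.calls" || true
}
```

In a BATS file, install_podman_mock would go in setup(), and each assertion becomes a line such as [ "$(count_calls 'machine start')" -eq 1 ]. For the last assertion, GNU coreutils timeout exits with status 124 when it had to kill the command, so checking for 124 distinguishes "killed by timeout" from "exited on its own".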
Why it's worth it
The regression that prompted the original fix (vfkit dying 5 seconds after LaunchAgent exit) is a macOS launchd behavior we can't unit-test. But the mitigation — the supervisor loop and its state machine — has testable properties. A test that fails if someone accidentally converts this back to a one-shot, or breaks the idempotency of ensure_container, would catch the exact class of regression that caused the 2026-04-18 outage.
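To make the testable properties concrete, here is a hypothetical stand-in for the generated wrapper. It is not the code from commit b329667: apart from ensure_container and SUPERVISE_INTERVAL, every name here (ensure_machine, supervise_once, supervise, IMAGE) and the exact podman flags are assumptions chosen for illustration. The point is the shape under test: each check is idempotent, and the loop never returns.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the generated wrapper; names and flags are assumptions.
MACHINE="${MACHINE:-transmission-vm}"
CONTAINER="${CONTAINER:-transmission-vpn}"
IMAGE="${IMAGE:-example.invalid/transmission:latest}"   # placeholder image name
SUPERVISE_INTERVAL="${SUPERVISE_INTERVAL:-300}"

# Start the VM only when it is not already running.
ensure_machine() {
  local state
  state="$(podman machine inspect --format '{{.State}}' "$MACHINE" 2>/dev/null)"
  [ "$state" = "running" ] || podman machine start "$MACHINE"
}

# Create the container only when it is missing; recreate it when it exists
# but is not running.
ensure_container() {
  if podman container exists "$CONTAINER"; then
    local cstate
    cstate="$(podman inspect --format '{{.State.Status}}' "$CONTAINER")"
    [ "$cstate" = "running" ] && return 0
    podman rm -f "$CONTAINER"
  fi
  podman run -d --name "$CONTAINER" "$IMAGE"
}

# One pass of the health check.
supervise_once() {
  ensure_machine && ensure_container
}

# The long-running part: a failed pass is logged and retried, never fatal.
supervise() {
  while true; do
    supervise_once || echo "health check failed; retrying" >&2
    sleep "$SUPERVISE_INTERVAL"
  done
}
```

A one-shot regression would show up as the script exiting before timeout fires; a broken ensure_container would show up as a second podman run in the call log.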
Why not now
The test is ~80 lines of BATS plus a podman mock. Unlike the config fix in (2), it's not a one-liner and doesn't block the current PR.
Pointers
- Existing BATS pattern: tests/plex-watchdog.bats
- Script heredoc: app-setup/podman-transmission-setup.sh:485 (<<WRAPPER)
- Supervisor loop: generated script's final while true; do ... done block
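One possible way to turn the heredoc pointer into a runnable file is sketched below. It assumes the heredoc is unquoted (<<WRAPPER, as in the pointer) and that its terminator is a line reading exactly WRAPPER; the helper names and the test values are hypothetical. Instead of substituting placeholders with sed, it feeds the body back through an unquoted heredoc in a child bash, so ${...} placeholders and \$ escapes are handled the same way they would be at deploy time. Any command substitutions left unescaped in the body would also run, just as they do during setup.

```bash
#!/usr/bin/env bash
# Sketch: extract the <<WRAPPER heredoc body from the setup script and render it.
# Helper names and test values are hypothetical.

# Print the lines strictly between the `<<WRAPPER` opener and the terminator.
extract_wrapper_body() {
  awk '
    inside && $0 == "WRAPPER" { exit }
    inside                    { print }
    /<<WRAPPER/               { inside = 1 }
  ' "$1"
}

# Render the body by re-running it through an unquoted heredoc in a child bash.
# Placeholders not set here expand to the empty string, so a real test would
# list every variable the heredoc references.
render_wrapper() {
  local setup_script="$1" out="$2"
  {
    echo "cat <<WRAPPER"
    extract_wrapper_body "$setup_script"
    echo "WRAPPER"
  } | HOMEBREW_PREFIX=/opt/test-brew \
      NFS_MOUNT_POINT=/tmp/test-nfs \
      HOST_PORT=19091 \
      bash > "$out"
  chmod +x "$out"
}
```

A test would call something like render_wrapper app-setup/podman-transmission-setup.sh "$BATS_TMPDIR/podman-machine-start.sh" and then run the rendered file under timeout.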