ZFS template ENOSPC retry: tighten end-to-end verification #6

@sodre

Follow-up to #5 (Plan B). The ENOSPC retry path in zfs::ensure_template and zfs::docker_install_from_layers is structurally correct and partially verified, but never observed end-to-end in a single test run. This issue tracks closing that gap.

What's verified today

PR #5 documents the following tests passing on a loopback ZFS pool (Linux 6.12.75, aarch64, zfs-2.4.1):

  • unsquashfs exits non-zero on "Disk quota exceeded" (strictly EDQUOT, grouped with ENOSPC throughout this issue; verified with quota=3M, smaller than a single template's extracted size).
  • The retry path catches the failure: [WARN] Extraction failed; evicting all warm templates and retrying is printed.
  • The sweep itself works under WARM_SECONDS=0 (Plan B Task 6 tests).
  • The retry attempt fires (a second unsquashfs invocation runs).
  • The final-error path fires when the retry also fails: [ERROR] Extraction failed even after evicting warm templates is printed, the .tmp dataset is destroyed, and no orphan container is left behind.

What's not yet verified

Success after retry, in one continuous run: first attempt hits ENOSPC → sweep evicts a warm template that frees enough space → retry attempt succeeds → final state has the new template installed.

The pieces are individually correct (the retry attempt is the identical command to the first), but observing the transition in a single run requires three quota constraints to line up:

  1. The new template's extracted size must be ≤ available-after-eviction.
  2. Two templates' combined extracted size must exceed the quota (so the first attempt fails).
  3. ZFS refuses quota=N if N < current usage, so the quota has to be set after the warm template is installed, loose enough that the warm template still fits AND tight enough that adding a second template pushes usage over.

I tried several payload sizes and quotas on a 3.75G test pool and either both fit (no ENOSPC) or neither fit (no warm eviction recovery).
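Taken together, the three constraints reduce to a window for the quota Q: max(warm, new) ≤ Q < warm + new, where warm and new are the extracted sizes of the two templates. The sketch below computes that window. It assumes the sizes reflect real on-disk usage and ignores ZFS metadata overhead and compression, so treat the result as a starting point rather than an exact answer. The function name quota_window is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the extracted sizes (in bytes) of the warm
# template and the new template, print the lowest and highest quota values
# for which the first attempt fails and the retry after eviction can succeed.
# Assumption: sizes are real on-disk usage; metadata overhead is ignored.
quota_window() {
    local warm=$1 new=$2
    local lo=$(( warm > new ? warm : new ))   # quota must hold either template alone
    local hi=$(( warm + new - 1 ))            # quota must not hold both at once
    if (( lo > hi )); then
        echo "no feasible quota" >&2
        return 1
    fi
    echo "$lo $hi"
}
```

For example, quota_window 3145728 2097152 prints 3145728 5242879: any quota from 3M up to one byte short of 5M should fail the first attempt and leave room for the retry, before overhead is accounted for.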

Also untested: Docker variant ENOSPC behavior

The .sqsh path uses unsquashfs, which surfaces ENOSPC immediately as a non-zero exit. The Docker path uses tar | tar inside enroot-nsenter and exhibited a different failure mode in one test: the receiving tar hung instead of returning a clean exit code, because ZFS quota enforcement reaches the writing process late, being tied to transaction-group commit timing. The retry's if ! guard never fired and the merge command had to be SIGKILL'd.

If this proves flaky in production, options include:

  • Wrap the receiving tar with a poller that watches the dataset's available property and kills the receiver once it hits zero, so the sending tar exits on SIGPIPE and the pipeline returns non-zero.
  • Pre-flight: run zfs::sweep_templates more aggressively before the merge if under_pressure is borderline (say, >= threshold - 10).
  • Add a hard timeout around the merge command and treat timeout-with-no-progress as ENOSPC.
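The hard-timeout option could be sketched as follows, using coreutils timeout. This is a minimal illustration, not the project's code: the function name run_with_deadline and the choice of 28 as the return value are assumptions, and the sketch covers only the deadline, not the "no progress" check. Because timeout can only run external commands, a merge step implemented as a shell function would need to be wrapped (for instance in bash -c) first.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command under coreutils `timeout` and map a
# timeout to a distinct non-zero status so the caller's `if !` guard fires.
run_with_deadline() {
    local secs=$1; shift
    local rc=0
    # -k 5: follow up with SIGKILL if the command survives SIGTERM for 5s.
    timeout -k 5 "$secs" "$@" || rc=$?
    # timeout exits 124 when the deadline expired, 137 when SIGKILL was needed.
    if (( rc == 124 || rc == 137 )); then
        echo "command timed out after ${secs}s; treating as ENOSPC" >&2
        return 28   # numeric value of ENOSPC on Linux, used here only as a marker
    fi
    return "$rc"
}
```

A caller would then write something like if ! run_with_deadline 600 some-merge-command; then ...; fi, where the 600-second deadline and the command name are placeholders.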

Suggested verification approaches

  1. Dedicated tiny pool. Create a pool backed by a 64MB loopback file. At that scale quotas can be tuned in finer steps relative to pool size, and smaller payloads exercise them with less ambiguity.
  2. Synthetic failure injection. Replace unsquashfs (and the merge command) with a wrapper script that exits non-zero on first call and exits zero on second call. Verifies the success-after-retry transition without depending on real ENOSPC behavior. Useful as a unit-style check.
  3. zfs reservation instead of quota. A reservation on a sibling dataset can squeeze the templates dataset's available bytes deterministically, sidestepping the "can't shrink quota below current usage" rule.
  4. Concurrent-extraction race against the quota. N workers each extract a unique template against a quota that fits half of them. Forces sweep + retry under contention.
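Approach 2 can be illustrated with a small stand-in that fails on its first call and succeeds afterwards. This is a sketch under assumptions: the function name flaky_once and the FLAKY_STATE variable are hypothetical, and it presumes the code under test can be pointed at a replacement command (for example through a directory placed earlier in PATH).

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for unsquashfs (or the merge command): fails on the
# first call and succeeds on every later call. The call count is kept in the
# file named by FLAKY_STATE so it survives across separate invocations.
flaky_once() {
    local state=${FLAKY_STATE:?set FLAKY_STATE to a writable path}
    local calls=0
    [[ -f $state ]] && calls=$(<"$state")
    echo $(( calls + 1 )) > "$state"
    if (( calls == 0 )); then
        echo "simulated failure: No space left on device" >&2
        return 1
    fi
    return 0
}
```

Removing the state file between test runs resets the stand-in, so each run sees exactly one failure followed by success. This checks only the control flow of the retry, not real quota behavior.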

Acceptance criteria

  • A reproducible test recipe (in doc/zfs.md admin notes or a script in pkg/ if/when test infra lands) that triggers each of these in a single run:
    • First attempt ENOSPC → sweep → retry → success → new template installed.
    • First attempt ENOSPC → sweep evicts nothing → retry → second ENOSPC → final error → .tmp cleaned.
    • Docker tar | tar variant: the merge returns a clean non-zero exit code on ENOSPC (no hang).
  • A short note in doc/zfs.md documenting the test recipe so admins can verify their own pool sizing produces the expected behavior.

Out of scope

  • Implementing a more sophisticated retry policy (e.g. multiple sweep aggressiveness levels, exponential backoff). The current single-retry behavior matches the plan; this issue is about verification, not redesign.
