Plan G: per-layer ZFS clone chain for enroot load docker:// (opt-in) #17
sodre wants to merge 11 commits into zenroot/main
Signed-off-by: Patrick Sodré <patrick@zero-ae.com>
Smoke testing on a 3-layer image (node:20-alpine) caught two bugs in the
chain installer:
1. Inverted iteration. docker::_download reverses the manifest's layer
order via jq's `reverse`, so digests[0] is the TOP layer and
digests[N-1] is the BASE. The original `for i in 0..N-1` loop treated
digests[0] as the base, building the chain upside-down and producing
a leaf that contained only the top-layer's diffs (e.g. 5.4M for what
should have been a 70M merged node:20-alpine rootfs). Iterating from
N-1 down to 0 puts BASE first in the zfs hierarchy and the TOP at
the leaf.
2. Missing synthetic config layer. docker::_prepare_layers populates a
directory 0/ via docker::configure with the per-image
/etc/{rc,fstab,environment} derived from the image config blob; Plan
F's overlay mount stacks 0:1:2:...:N so 0/ ends up on top. The chain
installer ignored 0/ entirely, so containers created via chain mode
were missing /etc/rc and the merged fstab entries. Now applied as a
final tar-pipe step on top of the leaf clone during template
finalization, before snapshotting @pristine.
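
A minimal sketch of the corrected iteration order from item 1, assuming a bash array that holds digests top-first (the order the message attributes to docker::_download). The helper name `chain_order` is hypothetical and is not part of the patch:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: given digests in top-first order, print them
# base-first, i.e. walk the array from index N-1 down to 0.
chain_order() {
    local -a digests=("$@")
    local i
    for (( i = ${#digests[@]} - 1; i >= 0; i-- )); do
        printf '%s\n' "${digests[i]}"
    done
}
```

Calling `chain_order top mid base` prints `base`, `mid`, `top` on separate lines, which is the order in which a chain builder would need to create datasets so that the base layer sits at the root of the hierarchy.
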
Also tighten the apply payload:
- getfattr returns non-zero when no files match the requested xattr;
with set -euo pipefail in the payload that aborted the whole apply on
alpine (no opaque dirs). Capture to a temp file with `|| true`.
- Drop tar's --acls. Default ZFS datasets have acltype=off, which makes
POSIX ACL set/get fail with "Operation not supported" warnings even
when the source has no ACLs. Docker images effectively never depend
on ACLs, and xattrs (overlayfs opaque markers, capability bits,
SELinux labels) are still preserved.
Signed-off-by: Patrick Sodré <patrick@zero-ae.com>
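
The getfattr fix above can be sketched as follows. This is an illustration under assumptions, not the payload code itself: `capture_optional` is a hypothetical helper, and `layer_dir` in the usage comment is a placeholder.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a command whose non-zero exit only means
# "nothing matched", capture its stdout in a temp file, and print the
# temp file's path. The `|| true` keeps `set -e` from aborting the caller.
capture_optional() {
    local out
    out=$(mktemp)
    "$@" > "$out" 2>/dev/null || true
    printf '%s\n' "$out"
}

# Usage sketch (needs getfattr from the attr package, and root to read
# trusted.* attributes; layer_dir is a placeholder):
#   list=$(capture_optional getfattr -R -h -n trusted.overlay.opaque "$layer_dir")
#   while IFS= read -r line; do : "handle $line"; done < "$list"
#   rm -f "$list"
```

Capturing to a file rather than piping keeps the optional command out of any pipeline, so `pipefail` has nothing to trip over when no file carries the attribute.
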
Cross-image dedup verified on disk (acceptance criterion #1 from issue #4): Pulled Both chains branch from
Two more acceptance criteria covered.

Concurrent pull of the same image: race-safe

    ENROOT_ZFS_LAYER_CHAIN=y enroot import -o /tmp/n1.sqsh docker://node:22-alpine3.21 & p1=$!
    ENROOT_ZFS_LAYER_CHAIN=y enroot import -o /tmp/n2.sqsh docker://node:22-alpine3.21 & p2=$!
    wait $p1 $p2

Result: both processes returned 0; only one set of

Lower-layer reuse on a second pull that shares a base

When

This generalizes to the issue's "top-layer-only re-pull" case: when a docker tag is republished with only the top digest changed, every cached lower-layer
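
For the concurrent-pull check, note that bash's `wait p1 p2` returns only the exit status of the last PID listed. A small sketch of waiting on each PID separately, so that a failure in either process is visible; `wait_all` is a hypothetical helper, not something enroot provides:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: wait on each PID in turn and return non-zero
# if any of them exited non-zero.
wait_all() {
    local pid rc=0
    for pid in "$@"; do
        wait "$pid" || rc=1
    done
    return "$rc"
}

# Usage sketch with the two imports from the test above:
#   ... & p1=$!
#   ... & p2=$!
#   wait_all "$p1" "$p2" && echo "both pulls succeeded"
```
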
Closes #4.
Adds an opt-in per-layer `zfs clone` chain mode for the Docker template cache. With `ENROOT_ZFS_LAYER_CHAIN=y`, two images sharing a registry layer digest physically share the bytes on disk; re-pulling an image after a top-layer-only change reuses the cached lower-layer datasets.

Layout
Each layer dataset is `zfs clone`d from the previous layer's `@done`, with overlayfs whiteouts (mknod 0:0) and opaque-dir markers (`trusted.overlay.opaque=y`) replayed in shell on top of the cloned target. Overlayfs only does that merge at mount time, but a chain stored at rest needs it baked in. The chain leaf is then cloned into `.templates/<config_sha>`, the per-image synthetic `0/` config layer (rc/fstab/environment from `docker::configure`) is applied on top, and the result is snapshotted as `@pristine` so the existing `zfs::clone_container`, pointer-format, eviction-recovery, and `zfs://` paths all work unchanged.

Why no `zfs promote`

The issue mentions promote as one option for flattening the chain. We don't promote: promoting inverts the chain (layers become clones of the template), which works for one image but produces a complex image-private topology that defeats the cross-image sharing goal. Plan G keeps layers as immutable origins; ZFS refuses to destroy a layer while any descendant clone exists, so layer GC is automatic once all referencing templates are evicted.
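
To make the clone chain from the Layout section concrete, here is a dry-run sketch that only prints the `zfs` commands such a chain would involve. The function names, the dataset root, and the one-dataset-per-digest naming are assumptions for illustration; the real installer's naming and error handling may differ.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Dry run: print the commands for a base-first list of layer digests.
# The first layer is created fresh; each later layer is cloned from the
# previous layer's @done snapshot. The layer payload (including whiteout
# and opaque-dir replay) would be applied before each @done snapshot.
plan_layer_chain() {
    local root=$1; shift
    local prev="" digest ds
    for digest in "$@"; do
        ds="${root}/${digest}"
        if [ -z "$prev" ]; then
            echo "zfs create ${ds}"
        else
            echo "zfs clone ${prev}@done ${ds}"
        fi
        echo "zfs snapshot ${ds}@done"
        prev=$ds
    done
}

# Dry run: clone the chain leaf into a template dataset. The synthetic
# 0/ config layer would be applied between the clone and the snapshot.
plan_template() {
    local leaf=$1 template=$2
    echo "zfs clone ${leaf}@done ${template}"
    echo "zfs snapshot ${template}@pristine"
}
```

Because every dataset above the base is a clone, no `zfs promote` appears anywhere in the plan: origins stay where they are, which is what lets a second image clone from the same lower layers.
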
What's added
- `src/storage_zfs.sh`: `zfs::layer_chain_active`, `zfs::_apply_layer_payload`, `zfs::_build_layer`, `zfs::_install_layer_chain`; chain-mode dispatch in `docker_install_from_layers` and `_pull_and_install_template`.
- `src/docker.sh`: `_prepare_layers` side-emits the ordered layer-digest list to `./.layers` in its temp cwd; `docker::load`'s ZFS branch reads it back when chain mode is active.
- `pkg/deb/control`: `attr` (provides `getfattr`, required by chain-mode opaque-dir handling).
- `doc/zfs.md`, `CLAUDE.md`
- `doc/plans/2026-05-01-zfs-g-layer-chain.md`

Coexistence with Plan F
- `ENROOT_ZFS_LAYER_CHAIN=` (unset/empty/anything but `y`): Plan F's single-merge `_install_template_from_layers` runs unchanged.
- `ENROOT_ZFS_LAYER_CHAIN=y`: chain mode. Same dispatch hits both `docker::load` (direct create) and `_pull_and_install_template` (used by pointer-format import + eviction recovery).
- …`@pristine` already exists, reuse it" runs before the chain dispatch, so templates produced under either mode are reused under the other without rebuild.

Smoke results (spark-ctrl, Pi 5 / Debian 13 / OpenZFS 2.4.1, 3.75G test pool)
- … `os-release` + `/etc/{rc,fstab,environment}`
- … `/usr/local/bin/node` (102M binary) present in container rootfs
- … `zfs list` (`3070388042c6` 1.03M USED / 19.3M REFER; pure dedup)
- … `.wh.*` AUFS files leak through, no char-device whiteouts in final rootfs
- … `.layers/` namespace created, `_install_template_from_layers` runs as before

Smoke testing also flagged two bugs that were fixed in 3f7e3af:
- Inverted iteration (`docker::_download` reverses the manifest, so `digests[0]` is the TOP, not the BASE).
- Missing synthetic `0/` config layer apply on the leaf: Plan F's overlay mount stacks `0:1:…:N` with `0/` on top; the chain installer needed an explicit final tar-pipe of `0/` onto the template.

Plus one packaging fix:
- `attr` is now Recommended (was Suggested), since `getfattr` is required for chain-mode opaque-dir handling and Suggests is not auto-installed.

🤖 Generated with Claude Code
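
The opt-in rule from "Coexistence with Plan F" (only the exact value `y` enables chain mode) can be sketched as a small predicate. Both function names here are hypothetical stand-ins and are not the patch's `zfs::layer_chain_active`:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the mode check: true only when the variable
# is set to exactly "y"; unset, empty, or any other value means Plan F.
chain_mode_enabled() {
    [ "${ENROOT_ZFS_LAYER_CHAIN-}" = "y" ]
}

# Report which install path the dispatch would take.
install_mode() {
    if chain_mode_enabled; then
        echo chain
    else
        echo single-merge
    fi
}
```

The `${VAR-}` expansion keeps the check safe under `set -u` when the variable is unset, which matters for a flag that most users never export.
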