Skip to content

build: declarative build.patches via git worktree isolation (FR-026)#179

Open
barbatos2011 wants to merge 6 commits into
tronprotocol:developfrom
barbatos2011:feat/build-patches
Open

build: declarative build.patches via git worktree isolation (FR-026)#179
barbatos2011 wants to merge 6 commits into
tronprotocol:developfrom
barbatos2011:feat/build-patches

Conversation

@barbatos2011
Copy link
Copy Markdown

Summary

Adds build.patches: []string to intent — a list of unified-diff files trond applies to a fresh git worktree of the resolved revision before running gradle. Closes FR-026/027/028 added in this PR's spec.md update.

Motivating use case: tools/replay/ (PR #172) needs 4 java-tron source patches to skip mainnet-time-baked checks. Today the operator runs git apply manually before trond build. This PR makes the patches a declarative property of the intent so the build pipeline handles them.

Beyond ergonomics:

  • Cross-machine cache reuse. Today's working-tree-derived PatchHash depends on the operator's local .DS_Store / IDE temp files. Two operators with the same intent get different cache keys. With declarative patches, patch_hash = sha256(ordered patch contents) — pure function of inputs, identical on every machine.
  • Reproducibility. Intent becomes the single source of truth for what artifact a deploy needs.
  • Agent-friendly. An MCP agent can drive a patched-build deploy without shelling out to git apply.
  • User source preserved. The worktree is git worktree add --detach under $TROND_STATE_DIR/builds/worktrees/<cache-key>/. The user's primary checkout's git status stays clean — IDE windows don't churn, in-flight WIP is safe.

Lifecycle

resolveBuild (cache miss) →
  setupWorktree (mkdir, git worktree add, git apply --check, git apply)
  → runner.RunBuild (gradle in worktree path)
  → defer cleanup (git worktree remove --force)

Cache hits skip setup/cleanup entirely. Failed patches surface as PATCH_FAILED with the offending patch path and git apply's stderr.

Behavior when build.patches is absent/empty: unchanged. The build runs in the user's source dir directly with today's dirty-tree PatchHash semantics — preserves the iterative dev-loop where you edit source, run trond apply, get the artifact with your edits.

Schema / contract changes

  • intent.BuildSpec gains optional patches []string (FR-026)
  • Manifest.patches (additive optional field, omitempty)
  • schemas/output/build-{inspect,list,prune}.schema.json declare the new field
  • SchemaVersion bump 1.4.0 → 1.5.0 (MINOR — additive only)

Patches shipped

tools/replay/patches/{01..04}-*.patch — the 4 java-tron source patches replay needs:

  • 01-skip-tx-expiration.patchTransactionCapsule.checkExpiration → no-op
  • 02-skip-tapos-validation.patchManager.processTransaction skip validateTapos + validateCommon
  • 03-skip-network-expiration.patchTransactionsMsgHandler skip network-layer expiration + its catch arm
  • 04-fast-proposals.patchProposalCapsule.hasExpired → always true (optional, for faster proposal iteration)

All 4 verified to apply cleanly to current java-tron HEAD (d7a90ca21) via the worktree mechanism.

Test plan

  • make test — clean
  • make lint — clean
  • go test ./internal/build/ -run 'TestComputePatchHash|TestValidatePatchFile|TestSetupWorktree|TestResolveBuild_Patches' — 16 sub-cases passing:
    • TestComputePatchHash_Deterministic (4 sub) — cross-machine hash determinism; order matters; identical contents hash same regardless of filename; empty list
    • TestValidatePatchFile (7 sub) — header sniffing accepts git/posix/svn formats, rejects yaml/binary/garbage/missing
    • TestSetupWorktree_AppliesPatchesAndCleansUp — full lifecycle with patch application + cleanup
    • TestSetupWorktree_BadPatchSurfacesStructuredErrorPATCH_FAILED envelope with patch path + git stderr
    • TestResolveBuild_PatchesOverridesPatchHash — unrelated dirty-tree noise doesn't shift the cache key when Patches is set (the cross-machine cache reuse invariant)
  • Real trond plan against an intent with 2 patches + java-tron source — plan resolves cleanly
  • All 4 replay patches verified via git apply --check against fresh java-tron worktree

Adds FR-026/027/028: intent's build: block accepts a `patches: []string`
list of unified-diff files. When non-empty, trond builds the source in
an isolated `git worktree` at the resolved revision with patches
applied, NOT in the user's primary source tree.

Use case (concrete): `replay` (tools/replay/) requires 4 java-tron
source patches to skip mainnet-time-baked checks (expiration, TaPos,
network-layer check). Previously these were manual `git apply` steps
the user had to run before trond build. Now they're declared once in
the intent and trond handles isolation + cache.

Why this is more than ergonomics:

  - Cross-machine cache reuse. Today's working-tree-derived PatchHash
    is sha256(`git diff` + `git status -uall`), which depends on the
    operator's local .DS_Store / IDE temp files / WIP edits. Two
    operators with the same intent get DIFFERENT cache keys. With
    declarative patches, PatchHash = sha256(canonical concat of patch
    contents in order) — a pure function of inputs. Same intent +
    same patches → same cache_key on every machine.
  - Reproducibility. Intent becomes the single source of truth for
    what artifact a deploy needs.
  - Agent-friendliness. An MCP agent can drive a patched-build deploy
    without ever shelling out to git apply (which isn't on trond's
    allowlist anyway).
  - User source preserved. The worktree is `git worktree add --detach`
    at $TROND_STATE_DIR/builds/worktrees/<cache-key>/; the user's
    primary checkout's `git status` stays untouched. IDE windows don't
    churn, in-flight WIP is safe.

Lifecycle:

  resolveBuild (cache miss) →
    setupWorktree (mkdir, git worktree add, git apply --check, git apply)
    → runner.RunBuild (gradle in worktree path)
    → defer cleanup (git worktree remove --force)

Cache hits skip setup/cleanup entirely — the JAR is already in out/.
Failed patches surface as PATCH_FAILED with the offending patch path
and `git apply`'s stderr.

Behavior when build.patches is absent/empty: unchanged. The build
runs in the user's source dir directly with today's dirty-tree
PatchHash semantics — preserves the iterative dev-loop where you
edit source, run trond apply, get the artifact with your edits.

Schema changes:

  - intent.BuildSpec gains optional `patches []string` (FR-026).
  - build.Request gains `Patches []string`.
  - Manifest gains `patches []string` for inspect/list traceability.
  - schemas/output/build-{inspect,list,prune}.schema.json declare
    the patches field (optional / omitempty).
  - SchemaVersion 1.4.0 → 1.5.0 (MINOR — additive).

Patches shipped: tools/replay/patches/{01..04}-*.patch covering
TransactionCapsule.checkExpiration, Manager validateTapos/Common,
TransactionsMsgHandler network-layer check, ProposalCapsule.hasExpired.
All 4 verified to apply cleanly against current java-tron HEAD
(d7a90ca21) via the worktree mechanism.

Tests (16 sub-cases, all passing):

  - TestComputePatchHash_Deterministic (4 sub) — cross-machine hash
    determinism; order matters; identical contents hash same
    regardless of filename
  - TestValidatePatchFile (7 sub) — header sniffing accepts git/posix/
    svn formats, rejects yaml/binary/garbage
  - TestSetupWorktree_AppliesPatchesAndCleansUp — full worktree
    lifecycle with patch application + cleanup
  - TestSetupWorktree_BadPatchSurfacesStructuredError — PATCH_FAILED
    envelope with patch path + git stderr
  - TestResolveBuild_PatchesOverridesPatchHash — confirms unrelated
    dirty-tree noise doesn't shift the cache key when Patches is set

Documentation:
  - specs/002-trond-build-pipeline/spec.md FR-026/027/028 + key entities
  - specs/002-trond-build-pipeline/quickstart.md: full "Patching source
    before build" section with declarative + manual paths
  - AGENTS.md Workflow 5 gains a Building-with-patches subsection
  - tools/replay/README.md recommends the declarative path; manual
    git apply preserved as alternative
HIGH tronprotocol#1: validatePatchFile rejected git format-patch files
  Long commit messages push `diff --git` past the prior 4 KB sniff
  window. Now sniffs 64 KB AND recognizes the `From <40-hex-sha>`
  mbox prologue that `git format-patch` writes at byte 0. Test
  case added: git format-patch with a 6 KB body before the diff.

HIGH tronprotocol#2: Manifest.Patches stored absolute filesystem paths
  Made the field stale when patches were moved/renamed/pulled from
  a shared TROND_STATE_DIR — undermined the cross-machine cache
  reuse story for `trond build inspect`. Replaced with
  []PatchRecord = {Name (basename), SHA256 (content hash)}. Schema
  updated. The cache-key-deriving PatchHash now flows through the
  same records (computePatchHashFromRecords), so the per-patch
  sha256 is the single fingerprint feeding both cache key + inspect
  traceability. One pass over each patch file; no double-reads.

MED tronprotocol#3: No regression test for "second call hits cache"
  Added TestResolveBuild_PatchesProducesStableCacheKey: two
  separate copies of the same patch contents at unrelated
  filesystem paths → identical cache_key. Pins the headline
  cross-machine cache reuse benefit.

MED tronprotocol#5: Manifest field comment was inaccurate
  Original comment said "absolute paths declared in
  intent.build.patches" — but the apply layer resolved them to
  absolute before recording. Comment now describes the new
  PatchRecord shape correctly.

Refactor: dropped computePatchHash([]string), replaced with
buildPatchRecords + computePatchHashFromRecords pair. resolved
struct gains a patchRecords field so all three Manifest creation
sites (builder.go, image.go, image_wrap.go) populate from the
same canonical list.

Tests added (10 new sub-cases):
  - TestComputePatchHash_Deterministic adapted (4 sub) for the
    records-based API
  - TestBuildPatchRecords (1) — Name/SHA256 shape, basename only
  - TestValidatePatchFile gained 2: git format-patch with long
    body (must accept); `From <not-a-sha>` (must reject)
  - TestResolveBuild_PatchesProducesStableCacheKey (1)

Schemas updated: build-list / build-inspect / build-prune patches
field changed from string[] to {name, sha256}[]. Baseline
regenerated; SchemaVersion stays 1.5.0 (the PR's own bump — this
review pass amends before merge).
LOW tronprotocol#1: TestResolveBuild_PatchesProducesStableCacheKey was one-sided
  Asserted keyA == keyB (same patches different paths) but didn't
  prove patches actually participate in the key — would have passed
  trivially if a refactor silently dropped them. Added the
  `keyA != keyWithoutPatches` assertion. Combined with the sibling
  PatchHash test, the patches→cache-key invariant is now pinned
  from both directions.

LOW tronprotocol#2: No JSON round-trip test for Manifest.Patches []PatchRecord
  Added TestManifest_PatchRecordRoundTrip + TestManifest_NilPatchesOmitted
  in a new manifest_test.go. Pins:
  - The wire field names (`patches`, `name`, `sha256`) so a tag
    typo can't silently drift Manifest away from
    schemas/output/build-*.schema.json
  - Marshal → Unmarshal → DeepEqual round-trip for a 2-record list
  - omitempty behavior on nil Patches (no `"patches": null` or
    `"patches": []` noise emitted for the no-patches case)

LOW tronprotocol#3 (backwards-compat): no fix — pre-merge.
  Old cache entries from the in-development 52abc0f shape would
  fail to deserialize into the new []PatchRecord shape. Acceptable
  since PR tronprotocol#179 hasn't shipped; failed-load entries just trigger
  a rebuild via Lookup's standard cache-miss path.
Review pass 6 changed Manifest.Patches from []string (absolute paths)
to []PatchRecord{Name, SHA256}, and computePatchHash from "sha256 of
concatenated patch contents" to "sha256 of per-patch SHA256s folded
in declared order". The functional invariant (cross-machine
deterministic patch_hash) is preserved, but the docs still described
the old formula and didn't mention the new PatchRecord shape that
`trond build inspect` now surfaces.

Three docs updated:

spec.md
  - FR-026 patch_hash formula reworded to describe the per-patch-
    sha256 fold; clarified that the exact byte format is an impl
    detail and the operator-facing invariant is "same patch CONTENTS
    → same cache_key on every machine".
  - Key Entities gained a new PatchRecord entry describing what
    Manifest.Patches now carries.
  - Build entity's patches property updated to "list of
    {name, sha256} records" (was "list of paths").

quickstart.md
  - Step 2 of the patches walkthrough now describes the per-patch
    record flow and explains why the manifest stores portable
    fingerprints (basename + sha256) rather than absolute paths.
  - Inspect section gained a worked example showing the patches
    array in JSON output, and a tip on how a teammate can verify
    their local patches match a cached artifact (compare sha256s).

AGENTS.md
  - Workflow 5 "Building with patches" section: same patch_hash
    reword; added a bullet describing the manifest's patches
    array shape so agents driving `build_inspect` know the field
    is `[{name, sha256}]` not absolute paths.

No code changes. `go test -tags e2e ./cmd -run TestE2E_AgentsMD`
passes — every command shown in AGENTS.md still resolves.
…tocol#179

HIGH H1: no test for jar-wrap + patches recursion
  Impl was correct (jarReqForWrap copies req whole so Patches flows
  into the inner JAR build), but a future refactor could silently
  drop Patches alongside the cleared ImageTag/ImageStrategy and
  produce a wrapped image with un-patched code — the worst
  regression class because builds succeed and images run. Added
  TestJarReqForWrap/patches_flow_through_to_inner_JAR_build.

HIGH H2: orphan worktree dirs leaked forever
  setupWorktree creates <state>/builds/worktrees/<key>/ but
  SIGKILL between defer-registration and execution leaves an
  orphan that prune never reclaimed. 100-200 MB per leak.
  Fixed by:
  - EnsureCacheDirs pre-creates worktrees/ alongside the existing
    out/images/manifest/locks dirs
  - Prune gained a final gcOrphanWorktrees pass: walks
    worktrees/, drops dirs whose cache key has no live manifest
    AND no active flock. The flock check prevents ripping a
    worktree out from under a concurrent in-flight build.
  - Removed worktrees show up in result.Removed with
    artifact_kind="worktree" + orphaned=true so JSON consumers
    can distinguish from manifest removals.
  - Two new tests: TestPrune_GCsOrphanWorktrees,
    TestPrune_SkipsLockedOrphanWorktree.

MED M1: validatePatchFile 64 KB sniff too small
  Release-note commit bodies + signed-off-by chains in
  git format-patch can push `diff --git` past 64 KB. Bumped to
  256 KB. Error message + comment updated accordingly.

MED M4: quickstart told users to `git checkout -- .` after `git diff`
  Destructive — wipes local work. Replaced with a callout warning
  + correct guidance: the diff file is enough, leave the working
  tree alone OR use `git stash` (recoverable) instead of
  `git checkout` (destructive).

MED M5: no test for patches against a pinned non-HEAD revision
  Added TestSetupWorktree_AgainstNonHeadRevision: repo with two
  commits, patch matches FIRST commit content, HEAD drifted to
  different content. Worktree pinned to first revision; apply
  must succeed. Pins FR-026's "checks out the resolved revision"
  semantics.

MED M6: no test for binary-file patches
  Added TestSetupWorktree_BinaryPatch: generates a real
  `git diff --binary` patch via the test repo, validates it
  through validatePatchFile (must accept the `diff --git`
  header), then setupWorktree → assert worktree's binary content
  matches the post-patch bytes.

LOW L3: isGitFormatPatchHeader accepted only lowercase hex
  git format-patch always emits lowercase, but a downstream
  signer / cosign attest pipeline may normalize to uppercase.
  Defensive: now accepts [0-9a-fA-F].

Spec note: trust model of build.patches path access
  validatePatchFile reads any file the trond user can read for
  its 256 KB sniff + content hashing. Same trust level as
  build.source / build.gradle_args / build.env. FR-028 gained
  an explicit trust-model paragraph stating: intent files are
  trusted input; trond does not defend against a malicious
  intent author.

Test count change: +6 new sub-cases (H1 + H2 ×2 + M5 + M6,
TestValidatePatchFile gained no new cases — the 256 KB bump is
backward-compatible with the existing 6 KB fixture).
HIGH H-1 (BLOCKER): orphan worktree GC produced schema-invalid Entries
  Review pass 8's gcOrphanWorktrees synthesized Entry records with
  ArtifactKind="worktree" + zero CreatedAt and appended them to
  result.Removed. build-prune.schema.json requires created_at AND
  constrains artifact_kind to enum [jar, image] — so any actual
  worktree reclaim emitted JSON that failed validation against the
  PR's own schema. Caught by holistic re-review.

  Fix: introduce a distinct surface for orphan worktrees rather
  than overloading Entry.
    - New type OrphanWorktree { CacheKey, SizeBytes } in cache_ops.go
    - New PruneResult.OrphanWorktrees []OrphanWorktree (omitempty)
    - Schema declares the new field + a sibling $defs/orphan_worktree
    - Test asserts on result.OrphanWorktrees (not result.Removed)
  Entry's invariant ("always a Manifest-tied build cache entry") is
  preserved; agents that only consume Removed see only kind=jar|image.

MED M-1: dry-run skipped orphan worktree discovery
  Operators using --dry-run to size their cleanup got a number
  smaller than the real --confirm run because gcOrphanWorktrees
  ran ONLY in the non-dry-run branch. Now Prune calls
  gcOrphanWorktrees(..., dryRun) in both branches; the GC function
  inspects + records on dry-run (populates OrphanWorktrees +
  credits FreedBytes) and inspects + removes on confirm. New test
  TestPrune_DryRunSurfacesOrphanWorktrees pins the projection,
  including verifying the worktree dir is STILL on disk after a
  dry-run.

MED M-3: stale validatePatchFile docstring
  Doc-comment still said "64 KB sniff window" after the 256 KB
  bump landed in pass 8. The inline comment + error message were
  updated then; the docstring missed. Now consistent.

LOW L-2: bookkeeping leak hint
  gcOrphanWorktrees does os.RemoveAll on the worktree dir (we
  don't know the source repo path to run proper `git worktree
  remove`). This leaves stale .git/worktrees/<key>/ ledger
  entries on the user's source repo. Now, after the GC pass
  reclaims >=1 orphan, we emit a stderr "info:" line telling
  the operator to `git worktree prune` in their checkout. Cheap
  + actionable + non-blocking.

Skipped findings:
  - M-2 (dirSizeBytes walks full tree): acceptable. Walking
    50k files of a populated java-tron worktree is <1s on a
    modern SSD; NFS-pessimistic estimates can be re-litigated if
    a real operator complains.
  - L-1 (naming overload, four "patches" types): subjective.
  - L-3 (example path assumes sibling repo): correct as-is for
    the documented use case (replay tool's patches live in
    tron-deployment alongside their consumer).

Schema baseline regenerated; SchemaVersion stays 1.5.0 (PR's own
bump, additive optional field within the unreleased version).
3 new test scenarios pass; `make test` + `make lint` clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant