feat: seed mirror with commit-batched pushes for large upstreams #45

Merged
cabljac merged 11 commits into main from feat/chunked-seed-push
May 18, 2026



@cabljac cabljac commented May 15, 2026

Closes #44

Problem

venfork setup seeds a new private mirror with a single git push <default>:<default> of full upstream history. For large upstreams, GitHub responds with HTTP 408: it enforces an HTTP request-duration limit, and the monolithic pack POST exceeds it:

error: RPC failed; HTTP 408 curl 22 The requested URL returned error: 408
send-pack: unexpected disconnect while reading sideband packet
fatal: the remote end hung up unexpectedly

Reproduced while seeding genkit-ai/genkit (~203 MB pack) into a private mirror over gh-authenticated HTTPS (0.9.0, with the SSH-hang fix from #43). This is distinct from the auth hang and is not a rate limit; it is a request-duration timeout.

The Source Imports REST API (which could offload this server-side) was deprecated and has returned an error since 2024-04-12, so venfork must transfer the pack itself.

Change

New seedMirrorInChunks(tempDir, httpsUrl, branch):

  • git rev-list --reverse <branch> → push in batches of VENFORK_SEED_CHUNK commits (default 1000), --force --no-thin, each to refs/heads/<branch>; then push the real tip.
  • Each POST stays under GitHub's timeout. Full history / identical SHAs to upstream preserved — required for sync/stage/--pr.
  • Per-push retry (4 attempts, 8s backoff) to ride out transient 408s.
  • Small repos (commits <= chunk) take a single push — no regression.
  • Pushes carry http.postBuffer=512MB + http.version=HTTP/1.1 + low-speed abort, alongside the existing gh credential helper and netExec timeout / non-interactive env.

Replaces the single seed push in the setup flow.
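
The batching described in the list above can be sketched roughly as follows. This is an illustration only, not the code in src/commands.ts: `planSeedRefspecs` is a hypothetical helper, `commits` is assumed to be the oldest-first output of `git rev-list --reverse <branch>`, and the chunk size is assumed to have been validated already.

```typescript
// Hypothetical helper: list the refspecs a chunked seed would push, in order.
// `commits` is assumed to be oldest-first; `chunk` must be a positive integer.
function planSeedRefspecs(commits: string[], branch: string, chunk: number): string[] {
  if (!Number.isInteger(chunk) || chunk < 1) {
    throw new RangeError("chunk must be a positive integer");
  }
  const dest = `refs/heads/${branch}`;
  const refspecs: string[] = [];
  // Intermediate pushes: the last commit of each full batch, stopping before the tip.
  for (let i = chunk; i < commits.length; i += chunk) {
    refspecs.push(`${commits[i - 1]}:${dest}`);
  }
  // Final push: the real branch tip.
  refspecs.push(`${branch}:${dest}`);
  return refspecs;
}

// 5 commits with a chunk size of 2: commit 2, commit 4, then the real tip.
console.log(planSeedRefspecs(["c1", "c2", "c3", "c4", "c5"], "main", 2));
```

When the commit count does not exceed the chunk size, the loop body never runs and only the tip refspec is returned, which corresponds to the single-push path for small repos.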

Validated

This exact strategy was used manually to successfully seed invertase/genkit-typescript-sandbox from genkit (5063 commits, 5 chunks + tip) after the single push 408'd.

Tests

  • bunx tsc --noEmit clean, bun run check clean, bun test → 289 pass / 0 fail.
  • Updated the HTTPS-seed test for the new :refs/heads/main refspec.
  • New test: 5 commits + VENFORK_SEED_CHUNK=2 → exactly 3 force pushes (commit 2, commit 4, real tip).

cabljac added 2 commits May 15, 2026 16:25
A single `git push` of a large upstream's full history fails: GitHub
enforces an HTTP request-duration limit and the monolithic pack POST
exceeds it (`HTTP 408`, `the remote end hung up unexpectedly`).
Reproduced seeding genkit-ai/genkit (~203 MB) into a private mirror.

The Source Imports REST API (which could offload this server-side) was
deprecated and returns an error since 2024-04-12, so venfork must
transfer the pack itself.

Seed the default branch in commit batches instead: walk
`git rev-list --reverse <branch>` and push in batches of
VENFORK_SEED_CHUNK commits (default 1000, force, --no-thin) so each
POST stays under GitHub's timeout, then push the real tip. History and
SHAs stay identical to upstream (required for sync/stage/PRs). Each
push retries a few times to ride out transient 408s. Small repos still
take a single push (loop body skipped) — no regression.

Pushes carry http.postBuffer + HTTP/1.1 + low-speed abort alongside the
existing gh credential helper and netExec timeout/non-interactive env.

Closes #44
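
One way the transport settings named in this commit message could be attached to a single push is through per-invocation `-c` overrides. The sketch below is an assumption about shape, not the PR's code: `seedPushArgs` is a hypothetical helper, the low-speed values are placeholders (the PR does not state them), and the gh credential helper and timeout environment are left out.

```typescript
// Hypothetical helper: argv (after "git") for one seed push.
function seedPushArgs(httpsUrl: string, refspec: string): string[] {
  return [
    "-c", "http.postBuffer=536870912", // 512 MiB, expressed in bytes
    "-c", "http.version=HTTP/1.1",
    "-c", "http.lowSpeedLimit=1000", // placeholder value, bytes per second
    "-c", "http.lowSpeedTime=60", // placeholder value, seconds
    "push", "--force", "--no-thin", httpsUrl, refspec,
  ];
}

// The URL below is a made-up example.
console.log(
  ["git", ...seedPushArgs("https://github.com/example/mirror.git", "main:refs/heads/main")].join(" "),
);
```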

Review follow-ups on the chunked seed push:

- VENFORK_SEED_CHUNK was used unvalidated: `0`/negative caused an
  infinite loop (pushing `commits[-1]` forever) and non-numeric values
  silently disabled chunking. Clamp to a positive integer, falling back
  to 1000 otherwise.
- pushRef retried every error including runNetOp's hard-timeout
  GitError, so a genuine timeout could block for up to 4x
  GIT_NET_TIMEOUT. Only retry transient push failures (e.g. HTTP 408);
  rethrow GitError immediately.

Adds a test asserting an invalid chunk value cannot infinite-loop.
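
A guard along the lines described in this commit might look like the sketch below. `parseSeedChunk` is a hypothetical name, and flooring fractional values is an assumption; the commit message only says the value is clamped to a positive integer with a fallback of 1000.

```typescript
const DEFAULT_SEED_CHUNK = 1000;

// Hypothetical helper: turn the raw VENFORK_SEED_CHUNK value into a safe chunk size.
function parseSeedChunk(raw: string | undefined): number {
  const n = Number(raw); // undefined and non-numeric strings become NaN
  // Reject NaN, Infinity, zero and negatives; floor anything else (assumed behavior).
  return Number.isFinite(n) && n >= 1 ? Math.floor(n) : DEFAULT_SEED_CHUNK;
}

for (const raw of ["2", "0", "-5", "abc", undefined]) {
  console.log(JSON.stringify(raw), "->", parseSeedChunk(raw));
}
```

Because the result is always at least 1, a batching loop that advances by the chunk size is guaranteed to terminate.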

Copilot AI left a comment


Pull request overview

Adds chunked private mirror seeding to make venfork setup more reliable for large upstream repositories that can time out during a single full-history push.

Changes:

  • Introduces seedMirrorInChunks() to push default-branch history in configurable commit batches, then push the final branch tip.
  • Adds seed-push Git config for gh HTTPS credentials, HTTP behavior, low-speed aborts, and retry handling.
  • Updates setup tests to expect fully qualified destination refs and adds coverage for chunk sizing fallback.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/commands.ts | Replaces the single setup seed push with commit-batched force pushes and retry logic. |
| tests/commands.test.ts | Updates seed-push expectations and adds tests for chunked seeding and invalid chunk values. |


Comment thread src/commands.ts Outdated
Comment thread src/commands.ts Outdated
cabljac added 2 commits May 15, 2026 16:56
Address Copilot review on #45:

- Retry only recognized transient push failures (HTTP 408, 5xx/gateway,
  dropped connection, "remote end hung up", RPC failed, ...). Permanent
  errors (bad credentials, missing access, repo not found, protected
  branch) and any unrecognized failure now fail fast instead of waiting
  through three backoffs. New isTransientPushError() does the
  classification from the push stderr/stdout.
- Reword the VENFORK_SEED_CHUNK comment: an invalid/NaN value falls
  back to the default chunk size (still chunks >1000-commit repos); it
  does not disable chunking.
- Retry backoff is now VENFORK_SEED_RETRY_MS (default 8000) so the
  retry path is testable without real delays.

Tests: permanent error -> single push (no retry); transient error ->
exactly 4 attempts.
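
The fail-fast classification could be approximated as below. The pattern list is illustrative and almost certainly differs from the real one; `looksTransient` and `shouldRetry` are hypothetical stand-ins, since the actual signature of `isTransientPushError()` is not shown in this thread.

```typescript
// Illustrative patterns only; anything not matched is treated as permanent.
const TRANSIENT_PATTERNS: RegExp[] = [
  /HTTP 408/i,
  /returned error: 5\d\d/i,
  /remote end hung up/i,
  /RPC failed/i,
  /unexpected disconnect/i,
];

// Hypothetical classifier over the combined stderr/stdout of a failed push.
function looksTransient(output: string): boolean {
  return TRANSIENT_PATTERNS.some((pattern) => pattern.test(output));
}

// Retry only while attempts remain and the failure looks transient.
// `attempt` is 1-based, so maxAttempts = 4 allows at most four pushes.
function shouldRetry(output: string, attempt: number, maxAttempts = 4): boolean {
  return attempt < maxAttempts && looksTransient(output);
}

console.log(looksTransient("error: RPC failed; HTTP 408 curl 22 The requested URL returned error: 408"));
console.log(looksTransient("remote: Repository not found."));
```

Defaulting unrecognized output to "permanent" matches the stated intent that unknown failures fail fast rather than wait through the backoffs.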

Fresh-review follow-up: backoffMs lacked the NaN/negative guard that
chunk has. A non-numeric VENFORK_SEED_RETRY_MS made setTimeout receive
NaN (treated as 0), turning retries into a backoff-less storm. Clamp to
the 8000ms default for negative/NaN values, matching the chunk guard.

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread src/commands.ts Outdated
Comment thread src/commands.ts Outdated
Copilot AI and others added 2 commits May 18, 2026 13:43

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread src/commands.ts
Comment thread src/commands.ts
@cabljac
Owner Author

cabljac commented May 18, 2026

@copilot apply changes based on the comments in this thread

Copilot AI and others added 4 commits May 18, 2026 13:54
Agent-Logs-Url: https://github.com/cabljac/venfork/sessions/c40592d8-164c-4312-912a-4ffe81dc24e5

Co-authored-by: cabljac <32874567+cabljac@users.noreply.github.com>
Contributor

Copilot AI commented May 18, 2026

@copilot apply changes based on the comments in this thread

Applied in 34da799 (plus 56ada5c/55fe618): chunked seed tips now come from git rev-list --first-parent --reverse for monotonic ancestry across non-linear history, and tests now cover the streamed stderr/no-buffered-output retry path with a deferred stream-emission mock.
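
To illustrate why first-parent ordering gives monotonic ancestry, the sketch below walks a small in-memory commit graph instead of calling git. `firstParentChain` and the sample graph are hypothetical; the function mimics what `git rev-list --first-parent --reverse` returns by following only the first parent of each commit.

```typescript
// Parent lists keyed by commit id; the first entry is the first parent.
type ParentMap = Record<string, string[]>;

// Hypothetical stand-in for `git rev-list --first-parent --reverse <tip>`.
function firstParentChain(tip: string, parents: ParentMap): string[] {
  const chain: string[] = [];
  let current: string | undefined = tip;
  while (current !== undefined) {
    chain.push(current);
    // Follow only the first parent; a root commit has none.
    current = (parents[current] ?? [])[0];
  }
  return chain.reverse(); // oldest first
}

// M merges the side branch C-D into the mainline A-B.
const history: ParentMap = {
  A: [],
  B: ["A"],
  C: ["A"],
  D: ["C"],
  M: ["B", "D"],
};

console.log(firstParentChain("M", history));
```

For this graph the chain is A, B, M: every entry is a descendant of the one before it, so each batch tip only ever moves the branch forward. Side-branch commits such as C and D are still transferred, because they are reachable from M.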

Number('') === 0, so an empty env value passed the finite/>=0 guard
and silently disabled retry backoff. Now empty/whitespace-only is
treated as unset (default 8000ms); explicit '0' stays valid.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
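
Taken together, the backoff guards from the commits above could be expressed as in this sketch. `parseRetryBackoffMs` is a hypothetical name; the behavior follows the commit messages: unset, empty, or whitespace-only falls back to 8000 ms, as do NaN and negative values, while an explicit '0' is kept.

```typescript
const DEFAULT_RETRY_MS = 8000;

// Hypothetical helper: turn the raw VENFORK_SEED_RETRY_MS value into a delay.
function parseRetryBackoffMs(raw: string | undefined): number {
  // Number('') === 0, so empty and whitespace-only values must be caught first.
  if (raw === undefined || raw.trim() === "") return DEFAULT_RETRY_MS;
  const n = Number(raw);
  return Number.isFinite(n) && n >= 0 ? n : DEFAULT_RETRY_MS;
}

for (const raw of ["", "   ", "0", "250", "abc", "-1", undefined]) {
  console.log(JSON.stringify(raw), "->", parseRetryBackoffMs(raw));
}
```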
@cabljac cabljac merged commit 236d9be into main May 18, 2026
4 checks passed

Development

Successfully merging this pull request may close these issues.

Seeding large upstreams fails: single push 408s (chunk the seed push)

3 participants