feat: seed mirror with commit-batched pushes for large upstreams #45
A single `git push` of a large upstream's full history fails: GitHub enforces an HTTP request-duration limit and the monolithic pack POST exceeds it (`HTTP 408`, `the remote end hung up unexpectedly`). Reproduced seeding genkit-ai/genkit (~203 MB) into a private mirror.
The Source Imports REST API (which could offload this server-side) was deprecated and returns an error since 2024-04-12, so venfork must transfer the pack itself.
Seed the default branch in commit batches instead: walk `git rev-list --reverse <branch>` and push in batches of `VENFORK_SEED_CHUNK` commits (default 1000, `--force`, `--no-thin`) so each POST stays under GitHub's timeout, then push the real tip. History and SHAs stay identical to upstream (required for sync/stage/PRs). Each push retries a few times to ride out transient 408s. Small repos still take a single push (loop body skipped), so there is no regression.
Pushes carry http.postBuffer + HTTP/1.1 + low-speed abort alongside the existing gh credential helper and netExec timeout/non-interactive env.
Closes #44
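The batching described in this commit message can be sketched as a pure function that picks which commits to push before the real tip. This is a minimal illustration, not the code in `src/commands.ts`: the name `chunkTips` and the exact loop shape are assumptions inferred from the description (oldest-first `git rev-list --reverse` output, one push per full batch, loop skipped for small repos).

```typescript
// Hypothetical helper (not from the PR): given commits oldest-first, as
// printed by `git rev-list --reverse <branch>`, return the intermediate
// commits to push before the real branch tip.
function chunkTips(commits: string[], chunk: number): string[] {
  if (!Number.isInteger(chunk) || chunk <= 0) {
    // A zero or negative step would never advance the loop below.
    throw new RangeError("chunk must be a positive integer");
  }
  const tips: string[] = [];
  // Stop strictly before the end: the final push is always the real tip,
  // so repos with commits <= chunk produce no intermediate pushes.
  for (let end = chunk; end < commits.length; end += chunk) {
    const sha = commits[end - 1];
    if (sha !== undefined) tips.push(sha);
  }
  return tips;
}

const fiveCommits = ["c1", "c2", "c3", "c4", "c5"];
// Intermediate pushes for a chunk size of 2; the tip (c5) is pushed afterwards.
console.log(chunkTips(fiveCommits, 2));
```

Under these assumptions each intermediate commit is force-pushed to `refs/heads/<branch>` in order, followed by one push of the branch tip, so the mirror ends on the same SHA as upstream.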
Review follow-ups on the chunked seed push:
- `VENFORK_SEED_CHUNK` was used unvalidated: `0`/negative caused an infinite loop (pushing `commits[-1]` forever) and non-numeric values silently disabled chunking. Clamp to a positive integer, falling back to 1000 otherwise.
- `pushRef` retried every error including `runNetOp`'s hard-timeout `GitError`, so a genuine timeout could block for up to 4x `GIT_NET_TIMEOUT`. Only retry transient push failures (e.g. HTTP 408); rethrow `GitError` immediately.
Adds a test asserting an invalid chunk value cannot infinite-loop.
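The chunk-size guard from the first follow-up might look roughly like this. The function name `resolveSeedChunk` and the use of `Number.parseInt` are assumptions for illustration; only the fallback value of 1000 and the "positive integer" rule come from the commit message.

```typescript
const DEFAULT_SEED_CHUNK = 1000;

// Hypothetical parser for the VENFORK_SEED_CHUNK value (name is illustrative).
// Anything that is not a positive integer falls back to the default.
function resolveSeedChunk(raw: string | undefined): number {
  // parseInt yields NaN for undefined-as-empty and for non-numeric input.
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_SEED_CHUNK;
}

console.log(resolveSeedChunk("2"));
console.log(resolveSeedChunk("0"));
console.log(resolveSeedChunk("not-a-number"));
```

In this sketch the caller would pass `process.env.VENFORK_SEED_CHUNK`; since the result is always a positive integer, the batching loop always advances.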
Pull request overview
Adds chunked private mirror seeding to make venfork setup more reliable for large upstream repositories that can time out during a single full-history push.
Changes:
- Introduces `seedMirrorInChunks()` to push default-branch history in configurable commit batches, then push the final branch tip.
- Adds seed-push Git config for gh HTTPS credentials, HTTP behavior, low-speed aborts, and retry handling.
- Updates setup tests to expect fully qualified destination refs and adds coverage for chunk sizing fallback.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/commands.ts | Replaces the single setup seed push with commit-batched force pushes and retry logic. |
| tests/commands.test.ts | Updates seed-push expectations and adds tests for chunked seeding and invalid chunk values. |
Address Copilot review on #45:
- Retry only recognized transient push failures (HTTP 408, 5xx/gateway, dropped connection, "remote end hung up", RPC failed, ...). Permanent errors (bad credentials, missing access, repo not found, protected branch) and any unrecognized failure now fail fast instead of waiting through three backoffs. New `isTransientPushError()` does the classification from the push stderr/stdout.
- Reword the `VENFORK_SEED_CHUNK` comment: an invalid/NaN value falls back to the default chunk size (still chunks >1000-commit repos); it does not disable chunking.
- Retry backoff is now `VENFORK_SEED_RETRY_MS` (default 8000) so the retry path is testable without real delays.
Tests: permanent error -> single push (no retry); transient error -> exactly 4 attempts.
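A rough sketch of the classify-then-retry behaviour described above. `isTransientPushError` is the name from the commit message, but its body here, the pattern list, and the `pushWithRetry` wrapper with an injected `sleep` are all assumptions for illustration; the actual matching rules in the PR may differ.

```typescript
// Assumed patterns only; the list used in the PR is not shown on this page.
const TRANSIENT_PATTERNS: RegExp[] = [
  /HTTP 408/i,
  /HTTP 5\d\d/i,
  /remote end hung up/i,
  /RPC failed/i,
  /connection (was )?reset/i,
];

// Classify push output: only recognized transient failures return true.
function isTransientPushError(output: string): boolean {
  return TRANSIENT_PATTERNS.some((re) => re.test(output));
}

// Hypothetical wrapper: one initial attempt plus up to maxRetries retries.
// Unrecognized or permanent failures are rethrown on the first attempt.
async function pushWithRetry(
  push: () => Promise<void>,
  sleep: (ms: number) => Promise<void>,
  backoffMs = 8000,
  maxRetries = 3,
): Promise<number> {
  for (let attempt = 1; ; attempt++) {
    try {
      await push();
      return attempt; // number of attempts it took
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      if (attempt > maxRetries || !isTransientPushError(message)) throw err;
      await sleep(backoffMs);
    }
  }
}

console.log(isTransientPushError("error: RPC failed; HTTP 408 curl 22"));
console.log(isTransientPushError("remote: Repository not found."));
```

With the defaults a persistently transient failure is attempted four times in total, which lines up with the "exactly 4 attempts" test above; passing `sleep` in as a parameter is one way to keep such a test free of real delays.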
Fresh-review follow-up: backoffMs lacked the NaN/negative guard that chunk has. A non-numeric VENFORK_SEED_RETRY_MS made setTimeout receive NaN (treated as 0), turning retries into a backoff-less storm. Clamp to the 8000ms default for negative/NaN values, matching the chunk guard.
Agent-Logs-Url: https://github.com/cabljac/venfork/sessions/226b745f-115b-4480-b2a8-f94f1f6347b0 Co-authored-by: cabljac <32874567+cabljac@users.noreply.github.com>
@copilot apply changes based on the comments in this thread
Agent-Logs-Url: https://github.com/cabljac/venfork/sessions/c40592d8-164c-4312-912a-4ffe81dc24e5 Co-authored-by: cabljac <32874567+cabljac@users.noreply.github.com>
Applied in 34da799 (plus 56ada5c/55fe618): chunked seed tips now come from
Number('') === 0, so an empty env value passed the finite/>=0 guard
and silently disabled retry backoff. Now empty/whitespace-only is
treated as unset (default 8000ms); explicit '0' stays valid.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
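The backoff guard after this fix could be sketched as follows. The function name `resolveRetryBackoffMs` is hypothetical; the rules (empty or whitespace-only treated as unset, explicit `0` kept, NaN or negative falling back to 8000) are taken from the commit messages above.

```typescript
const DEFAULT_RETRY_MS = 8000;

// Hypothetical parser for the VENFORK_SEED_RETRY_MS value.
function resolveRetryBackoffMs(raw: string | undefined): number {
  // Number("") and Number("   ") are both 0, so blank input must be
  // handled before the numeric check or it would pass as a valid 0.
  if (raw === undefined || raw.trim() === "") return DEFAULT_RETRY_MS;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : DEFAULT_RETRY_MS;
}

console.log(resolveRetryBackoffMs(""));
console.log(resolveRetryBackoffMs("0"));
console.log(resolveRetryBackoffMs("250"));
```

Keeping an explicit `0` valid is what lets tests exercise the retry path without real delays.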
Closes #44
Problem
`venfork setup` seeds a new private mirror with a single `git push <default>:<default>` of full upstream history. For large upstreams GitHub returns a 408: it enforces an HTTP request-duration limit and the monolithic pack POST exceeds it (`HTTP 408`, `the remote end hung up unexpectedly`).
Reproduced seeding `genkit-ai/genkit` (~203 MB pack) into a private mirror over gh-authenticated HTTPS (0.9.0, with the SSH-hang fix from #43). This is distinct from the auth hang and is not a rate limit; it is a request-duration timeout.
The Source Imports REST API (which could offload this server-side) was deprecated and returns an error since 2024-04-12, so venfork must transfer the pack itself.
Change
New `seedMirrorInChunks(tempDir, httpsUrl, branch)`:
- `git rev-list --reverse <branch>` → push in batches of `VENFORK_SEED_CHUNK` commits (default 1000), `--force --no-thin`, each to `refs/heads/<branch>`; then push the real tip.
- History and SHAs stay identical to upstream (required for `sync`/`stage`/`--pr`).
- Small repos (`commits <= chunk`) take a single push, so there is no regression.
- Pushes carry `http.postBuffer=512MB` + `http.version=HTTP/1.1` + low-speed abort, alongside the existing gh credential helper and `netExec` timeout / non-interactive env.
Replaces the single seed push in the `setup` flow.
Validated
This exact strategy was used manually to seed `invertase/genkit-typescript-sandbox` from genkit (5063 commits, 5 chunks + tip) after the single push 408'd.
Tests
- `bunx tsc --noEmit` clean, `bun run check` clean, `bun test` → 289 pass / 0 fail.
- Setup seed-push tests now expect the fully qualified `:refs/heads/main` refspec.
- `VENFORK_SEED_CHUNK=2` → exactly 3 force pushes (commit 2, commit 4, real tip).
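For reference, the per-push Git settings listed under Change could be assembled into an argument list along these lines. This is only a sketch: `seedPushArgs` is a hypothetical name, `512m` is Git's size-suffix spelling assumed for the 512MB in the description, the low-speed thresholds are assumed values, and the gh credential helper configuration is omitted.

```typescript
// Hypothetical builder for one seed push's argv (to be run as `git <args>`).
function seedPushArgs(httpsUrl: string, sha: string, branch: string): string[] {
  return [
    // Per-invocation config: must come before the `push` subcommand.
    "-c", "http.postBuffer=512m",
    "-c", "http.version=HTTP/1.1",
    // Assumed thresholds: abort if slower than 1000 bytes/s for 60 s.
    "-c", "http.lowSpeedLimit=1000",
    "-c", "http.lowSpeedTime=60",
    "push", "--force", "--no-thin",
    httpsUrl,
    // Fully qualified destination ref, as the updated tests expect.
    `${sha}:refs/heads/${branch}`,
  ];
}

console.log(seedPushArgs("https://github.com/example/mirror.git", "abc123", "main").join(" "));
```

Passing the settings with `-c` keeps them scoped to the seed push instead of writing them into the repository's config.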