Skip to content

fix(create): clean up orphan vectoria kb on pipeline failure (0.5.2)#5

Merged
voidkey merged 3 commits into
mainfrom
feature/orphan-kb-cleanup
May 14, 2026
Merged

fix(create): clean up orphan vectoria kb on pipeline failure (0.5.2)#5
voidkey merged 3 commits into
mainfrom
feature/orphan-kb-cleanup

Conversation

@voidkey
Copy link
Copy Markdown
Collaborator

@voidkey voidkey commented May 14, 2026

Summary

Stacked on #4 (0.5.1). Fixes the orphan-kb leak that has been silently accumulating on every failed vk create invocation since this command shipped.

Problem: vk create --from <file> always creates a fresh kb in vectoria, uploads the doc, then calls tasks/init. If anything fails between kb creation and task.succeeded, the kb is left orphaned forever. Real-backend smoke during 0.5.0 alone left 6+ orphans; the user's tenant had accrued 424 kbs in total.

Fix: Three failure boundaries now clean up:

  1. Inside uploadFile / uploadURL — kb cleanup deferred if doc upload or status poll fails before the helper returns. Previously the helper returned ("", "", err) and the outer caller never knew a kb was orphaned.
  2. RunE-level defer — fires for script_invalid, generic InitTask errors, the --mode script no-doc reject. Cleared once InitTask returns OK (backend task now owns kb lifecycle).
  3. Inline before os.Exit(5) — the insufficient_credits path needs explicit cleanup because os.Exit skips defers. Comment documents this asymmetry.

Past InitTask success — task.failed, stream interrupted, --async detach, etc. — the CLI does not delete the kb. Backend may still be reading.

Diff

  • client/vectoria/knowledgebase.go + test: new DeleteKB(ctx, kbID) wrapping the existing backend endpoint (probed live, returns 204).
  • cmd/create.go: 3 cleanup boundaries described above + cleanupOrphanKB best-effort helper.
  • tests/integration/create_credits_test.go: extends the 0.5.1 test to drive the upload path and assert DELETE /v1/knowledgebases/{id} fires.
  • CHANGELOG.md + version bump 0.5.1 → 0.5.2.

Test plan

  • go test ./... green including extended integration test
  • go build ./... clean
  • Real-backend manual: DELETE /v1/knowledgebases/<orphan-id> verified to work (204 No Content) before writing the fix
  • CI three-platform (will appear after push)
  • Manual smoke: re-run vk create --mode script with invalid docx, verify kb count in GET /v1/knowledgebases doesn't increment

Stacking note

Branches off #4 (fix/credits-exit-code). If #4 merges first, this rebases trivially onto main. If both are open, review #4 first.

Out of scope

  • Long-lived shared kb design (option C from the brainstorm) — bigger redesign with vk kb list/delete CLI subcommands; tracked as follow-up.
  • Bulk cleanup of the 423 existing orphan kbs in the user's tenant — separate housekeeping; could ship a vk kb prune command in 0.6.0.
  • Auth refresh purge bug — separate concern; needs middleware diagnostics first.

voidkey added 3 commits May 14, 2026 19:06
InitTask's insufficient_credits handler returned a plain fmt.Errorf
which cobra mapped to its default exit 1, while the same condition
surfacing later in the stream path explicitly os.Exit(5). The
documented contract (AGENTS.md, README) is 5 = business failure;
agents branching on exit codes had to know about two paths.

Caught during real-backend smoke for 0.5.0 (depleted credits during
testing made the inconsistency observable).

Match the stream-side pattern verbatim: print the i18n message to
stderr, os.Exit(5). Add an integration test that mocks /v1/tasks/init
returning envelope code 100001 and asserts the binary exits 5.
Before this change, every failed `vk create` invocation left a
freshly-created vectoria knowledgebase orphaned. Real-backend smoke
during 0.5.0 alone accumulated 6+ such kbs from a handful of test
runs; the user's tenant had accrued 424 kbs in total.

Three failure boundaries now clean up the kb the CLI just created:

1. uploadFile / uploadURL — kb cleanup deferred if the subsequent
   doc upload or doc-status poll fails inside the helper. The kb
   leak window is closed before the function returns ("","",err).

2. RunE — kb cleanup deferred from the moment kbID is known until
   InitTask succeeds. Covers script_invalid (returns clerr.Validation
   → defer runs), generic InitTask errors (returns err → defer runs),
   and the `--mode script` no-doc reject path.

3. insufficient_credits handler — os.Exit(5) skips defers, so the
   same cleanup is invoked inline before os.Exit. The comment in
   create.go documents this asymmetry.

Past InitTask success, the backend task owns the kb's lifecycle and
the CLI must not delete it (even on task.failed, stream interrupted,
--async detach, etc.) because the backend may still be reading from
it. The `initTaskClaimed` flag fences this.

Adds client/vectoria.Client.DeleteKB wrapping the backend's existing
DELETE /v1/knowledgebases/{id} endpoint. Best-effort: errors are
swallowed so cleanup hygiene doesn't mask the real failure.

Extends the 0.5.1 integration test to drive the upload path and
assert DELETE /v1/knowledgebases/{id} fires after the 100001
rejection.
…ence

Apply findings from /simplify review on the orphan-kb cleanup:

- cmd/create.go: add 5s context timeout to every DeleteKB cleanup
  callsite. Previously context.Background() had no deadline; a hung
  backend during cleanup would silently stall the user, especially
  on the synchronous inline path before os.Exit(5).
- cmd/create.go: replace the standalone `initTaskClaimed bool` flag
  with `task != nil` (declaring `task` earlier so the defer can
  reference it). Same semantics, one less state variable to reason
  about. Trim the over-narrated comments around the defer.
- tests/integration/video_flow_test.go: introduce `buildProfile(t,
  endpoints map[string]string)` as the core helper; `buildVideoProfile`
  becomes a one-line wrapper. Removes 90%-duplicate YAML/file-writing
  logic that buildProfileFigVect (now deleted) had reinvented.
- tests/integration/create_credits_test.go: use buildProfile directly,
  drop buildProfileFigVect. Tighten DELETE-path assertion to full path
  equality (loud failure if upload/poll ever mis-routes), add inline
  comments on the exact-vs-prefix mux pattern.

No behavior change; tests still green.
@voidkey voidkey merged commit 1d36687 into main May 14, 2026
3 checks passed
@voidkey voidkey deleted the feature/orphan-kb-cleanup branch May 14, 2026 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant