Skip to content

feat: migrate credential storage to cli-common/credstore (Phase B pilot)#158

Merged
rianjs merged 7 commits into
mainfrom
feat/INT-437-credstore-migration
May 17, 2026
Merged

feat: migrate credential storage to cli-common/credstore (Phase B pilot)#158
rianjs merged 7 commits into
mainfrom
feat/INT-437-credstore-migration

Conversation

@rianjs
Copy link
Copy Markdown
Collaborator

@rianjs rianjs commented May 17, 2026

What

slck adopts the shared cli-common/credstore library (Phase A), replacing the hand-rolled internal/keychain (macOS security shell-out + Linux/Windows plaintext file). Authoritative scope: Secret-Handling Standard §2.4 + cross-cutting §1.3/§1.4/§1.7/§1.8/§1.11/§1.12.

slck is the first real credstore consumer — this PR's structure is the reusable per-CLI template B2 (gro) / B3 (atlassian) follow. Plan: ~/.claude/plans/INT-437.md (Codex-architect converged 0/0/0/0 after 1B/3M→0B/1M/1m→clean).

Highlights

  • internal/keychain → pure credstore adapter (OS keyring only; no shell-out, no plaintext). Ref authoritative: ParseRef(config credential_ref)Open(parsed-service) against parsed-profile; backend/passphrase env names derived from the parsed service. Nothing hard-coded.
  • New internal/configconfig.yml (credential_ref, workspace, keyring.backend).
  • One-time legacy migration (§1.8/§2.4): old slck and slack-chat-api Keychain services + the legacy plaintext file → new bundle, with the api_tokenbot_token rename. All conflicts detected before any write/delete; conflicts fail loud naming every location and never print a value (masked or not). Idempotent; --overwrite resolution path.
  • _migration JSON (object-merge / non-object-wrap, run-scoped consume-once) + one-line stderr notice.
  • New slck set-credential (stdin / --from-env only, §1.5.2).
  • §1.12 no-leak acceptance suite; every credential test hermetic (file backend in temp HOME) — also fixes the B0-deferred TestRunInit_WrongTokenType non-hermetic flake.

⚠️ BREAKING (all standard-mandated)

  • SLACK_API_TOKEN / SLACK_USER_TOKEN no longer read at runtime (§1.11) — only as setup ingress via init --bot-token-from-env.
  • Positional config set-token <token> hard-deprecated (exits nonzero → points to set-credential); init --bot-token <value> value-flags removed.
  • Linux/Windows plaintext fallback replaced by the OS keyring (fail-closed; opt-in encrypted-file backend).
  • Migration on first run moves existing tokens automatically (zero re-auth) per the user's "both layouts" choice.

Verification

go build / go vet / gofmt -l clean · go test -race ./... all green · pinned golangci-lint v2.0.2 = 0 issues. Docs (README/CLAUDE.md) updated to the new ingress model; CLAUDE.md dep-drift (false slack-go/zalando) corrected.

Closes #157
[INT-437]

Replace the hand-rolled internal/keychain (macOS `security` shell-out +
Linux/Windows plaintext file) with the shared cli-common/credstore library,
per the Open CLI Collective Secret-Handling Standard §2.4 + §1.3/§1.4/§1.7/
§1.8/§1.11/§1.12. slck is the first real credstore consumer; this PR's
structure is the reusable per-CLI template for gro/atlassian.

- internal/keychain is now a credstore adapter (OS keyring only; no
  shell-out, no plaintext file). Ref is authoritative: ParseRef(config
  credential_ref) -> Open(parsed-service) against parsed-profile.
- New internal/config: config.yml (credential_ref, workspace, keyring.backend).
- One-time legacy migration (§1.8/§2.4): old `slck` AND `slack-chat-api`
  Keychain services + the legacy plaintext credentials file -> the new
  bundle, with the api_token->bot_token rename. All conflicts detected
  before any write/delete; conflicts fail loud naming all locations and
  never print a value (§1.8/§1.12). Idempotent; --overwrite resolution.
- _migration JSON signal (object-merge / non-object-wrap, consume-once)
  + one-line stderr notice.
- New `slck set-credential` (stdin / --from-env ingress, §1.5.2).
- §1.12 no-leak acceptance suite; all credential tests hermetic
  (file backend in a temp HOME), fixing the B0-deferred
  TestRunInit_WrongTokenType non-hermetic flake.

BREAKING CHANGE: SLACK_API_TOKEN / SLACK_USER_TOKEN are no longer read at
runtime (only as setup ingress via `init --bot-token-from-env`). Positional
`config set-token <token>` and `init --bot-token <value>` value-flags are
removed (use `slck set-credential --stdin` / `slck init`). Linux/Windows
plaintext fallback is replaced by the OS keyring (fail-closed; opt-in
encrypted-file backend).

Closes #157
[INT-437]
@rianjs
Copy link
Copy Markdown
Collaborator Author

rianjs commented May 17, 2026

Blocker

  • go.mod:3 now requires go 1.25.0, but B0 intentionally aligned the module and all workflows to Go 1.24. The workflows still install Go 1.24, so CI/release will reject this module before tests run. Keep this at go 1.24.0 unless the infra bump is intentionally widened in a separate unit.

  • internal/keychain/migrate.go:83-98 violates the conflict matrix for --overwrite: if multiple legacy sources disagree and there is no existing target value, overwrite=true falls through and picks group[0].value. The plan explicitly said overwrite cannot resolve legacy-vs-legacy disagreement. This must fail before any write/delete.

  • internal/cmd/config/clear.go:34-41 cannot perform the §1.8 remediation it tells users to use. config clear calls keychain.Open(), which runs migration first; in a legacy/keyring conflict, Open() returns ErrMigrationConflict before Clear() can delete the keyring entry. clear needs a no-migration open path or equivalent so “run config clear then re-run” actually works.

  • internal/keychain/keychain.go:109-117 silently ignores invalid keyring.backend config values. A typo falls back to auto backend instead of failing closed through credstore’s config-backend validation. That breaks §1.4 fail-closed selection and can put credentials in a backend the user did not intend.

Major

  • internal/cmd/initcmd/init.go:215-220 still uses a normal bufio.Scanner prompt for interactive token entry, so terminal input is echoed. The plan and README say interactive setup is no-echo; this is a §1.12 leak surface. Use term.ReadPassword for token prompts, with a test seam for non-terminal tests.

  • internal/output/migration.go:27-35 plus internal/keychain/migrate.go:123-130 records _migration even for text-mode runs, but only JSON output consumes it. In a long-lived/test process, a text command that migrated can leak the stale migration block into a later unrelated JSON response. Either record only when output.IsJSON() or reset after emitting the stderr path.

  • Stale setup guidance remains in user-facing help/docs: internal/cmd/search/search.go:16-21, internal/client/errors.go:12-13, internal/cmd/config/test.go:80-82, internal/cmd/whoami/whoami.go:100-104, README.md:770-773, and .github/workflows/release.yml:125-130 still point users at config set-token, including positional token examples and old plaintext storage caveats. This undercuts the §1.11/§1.12 ingress removal.

  • internal/cmd/config/noleak_test.go does not match the planned §1.12 acceptance scope: it only exercises config commands and captures output.Writer, not stderr or direct fmt.Print* prompts. It would miss the interactive prompt leak above and does not cover init, set-credential, whoami, config test, or mocked API command classes.

Minor

  • internal/cmd/config/clear.go:53-63 implements --all as “remove config.yml” only. The plan and §1.7 call for config plus caches and empty dirs. If slck has no caches today, the code/comment should make that explicit; otherwise this is incomplete.

  • internal/keychain/migrate.go:275-282 treats every security delete-generic-password ExitError as “already absent”. That can silently leave legacy keychain items behind on denial/locked/other failures after migration. Classify not-found narrowly or surface cleanup failure.

- go.mod back to go 1.24.0 (pin x/term v0.27, x/sys v0.28 — latest needed
  go 1.25, breaking B0's 1.24 alignment and CI).
- migration: legacy-vs-legacy disagreement is ALWAYS a conflict, even with
  --overwrite (it can't pick a winner; §1.8).
- config clear opens WITHOUT migration (keychain.OpenNoMigrate) so the §1.8
  remediation it advertises actually works during a conflict.
- fail closed on an unrecognized config keyring.backend (no silent
  degradation to auto-select; §1.4).
- init interactive token prompt is now no-echo (term.ReadPassword) with a
  non-terminal test seam (§1.12).
- _migration block recorded ONLY on JSON runs (no stale block leaking into
  a later JSON response in a long-lived/test process).
- purge residual 'config set-token'/plaintext guidance from search, errors,
  config test, whoami, README, and the Homebrew release caveats.
- §1.12 no-leak suite rewritten as internal/noleak: every command class
  via its real cobra command, capturing stdout+stderr+writer.
- narrow security-delete not-found classification to exit 44 only.
- clear --all documents the (currently empty) cache scope explicitly.

[INT-437]
@rianjs
Copy link
Copy Markdown
Collaborator Author

rianjs commented May 17, 2026

Reviewed the live branch at 535c810; the pasted diff still has stale hunks, so I treated the checked-out code as authoritative.

Findings

  • Blocker: internal/keychain/migrate.go:185 only probes legacy macOS Keychain sources when the selected destination backend is BackendKeychain. That breaks the planned six-source migration matrix: a macOS user with old slck / slack-chat-api Keychain items who opts into keyring.backend: file will not have those legacy values discovered, conflict-checked, migrated, signaled, or deleted. The legacy discovery matrix should be independent of the destination backend, unless the plan/standard explicitly adds that opt-out carveout.

  • Major: internal/noleak/noleak_test.go:75 only checks the full canary secret, while the test comment and §1.12 acceptance target include prefixes/masked-prefix leaks. The suite also still does not exercise every planned command class, notably config test and representative mocked API command paths like channels/messages. This is much stronger than B1, but it still does not prove the advertised “every command class” / “or any prefix” claim.

  • Minor: internal/cmd/initcmd/init.go:188 returns an empty string when --bot-token-from-env NAME points at an unset/empty env var, and runInit then exits successfully with “No tokens provided.” For explicit env ingress, this should fail closed like set-credential does, otherwise automation can report setup success while storing nothing. Same concern applies to --user-token-from-env.

The earlier issues around Go 1.24 compatibility, legacy-vs-legacy conflict behavior, config clear conflict remediation, invalid backend handling, no-echo interactive prompt, JSON-only _migration recording, and narrow security exit handling look fixed in the live branch.

- B: legacy macOS Keychain discovery is now independent of the destination
  backend (§2.4). A macOS user who opts into keyring.backend:file still gets
  old slck/slack-chat-api Keychain items migrated. Test hermeticity moved to
  a documented test-only env seam (SLCK_TEST_DISABLE_LEGACY_KEYCHAIN_SCAN,
  set by testutil) instead of coupling discovery to the backend.
- M: no-leak suite now also asserts masked-prefix/suffix canaries and
  covers config test + representative API command classes (channels list
  text+json, messages history).
- m: init --bot-token-from-env / --user-token-from-env fail closed on an
  empty/unset env var instead of silently reporting setup success.

[INT-437]
@rianjs
Copy link
Copy Markdown
Collaborator Author

rianjs commented May 17, 2026

Findings

  • Major: internal/noleak/noleak_test.go:164-174 does not actually exercise the JSON API command path. The test builds channels.NewCmd() directly, but -o/--output is a root persistent flag, so channels list -o json returns a Cobra flag error before credential/API execution. Because the test ignores Execute() errors, this is a false-positive no-leak pass. Use the root command path, or set output.OutputFormat = output.FormatJSON directly, and fail on unexpected parser/usage errors.

  • Major: internal/cmd/root/root_test.go:55, :90, and :122 still call client.New() without testutil.Setup. Those tests can touch the real keyring / migration path and still contain stale SLACK_USER_TOKEN / config set-user-token assertions. That violates the §1.12 hermeticity requirement for credential-touching tests.

  • Major: stale old credential guidance remains in user/test-facing docs: snap/snapcraft.yaml:15 still says slck config set-token, and integration-tests.md:53, :61, :237, :240, :318, :480, :499, :531, :640 still teach removed positional token ingress and runtime SLACK_*TOKEN credential env vars. This keeps the §1.11/§1.12-incompatible model alive outside README/release caveats.

The three re-review code fixes themselves look present: discovery is destination-backend independent, env-var ingress now fails closed when empty/unset, and the no-leak suite now includes full/prefix/suffix canaries plus more command classes.

Verification: go test ./internal/keychain ./internal/noleak ./internal/output passes. I could not complete ./internal/cmd/initcmd locally because the sandbox denies httptest port binding.

- no-leak suite: the JSON API path no longer passes vacuously. -o is a
  ROOT persistent flag; toggling output.JSON drives the real IsJSON() path
  instead, and API-class subtests now require a real (non-usage) execution
  error so a parser failure can't masquerade as a clean no-leak pass.
  Added root.Command() so the suite/B2/B3 can drive the real top-level
  command with persistent flags.
- root_test.go is now hermetic (testutil.Setup → file backend in temp
  HOME; no real keyring / security shell-out) and asserts bot-vs-user
  resolution by the live error instead of a long-removed exact string.
- purge remaining old credential guidance: snap/snapcraft.yaml caveat and
  integration-tests.md (setup, user-token, env-precedence, token-type,
  restore, error-table) — now teach init / set-credential --stdin and
  state env vars are not read at runtime.

[INT-437]
@rianjs
Copy link
Copy Markdown
Collaborator Author

rianjs commented May 17, 2026

Findings

  • Minor: integration-tests.md:470 still describes config show as showing masked token values and storage location, and integration-tests.md:491 still expects old Bot Token / User Token wording. The new contract is presence/ref/backend only, with no masked token characters. Update those expected-output notes to match bot_token / user_token presence semantics.

The three re-review #3 majors look addressed: no-leak JSON now drives the real output path and asserts non-usage execution errors, root token-mode tests are hermetic, and the remaining old credential guidance is purged from the main user-facing paths.

Verification: go test ./internal/keychain ./internal/noleak ./internal/output ./internal/cmd/root passed.

@rianjs
Copy link
Copy Markdown
Collaborator Author

rianjs commented May 17, 2026

TDD / Test-Coverage Assessment (independent)

Reviewed the diff and tests independently, ran the suite (go test ./internal/... → 729 pass; -race -count=3 on noleak/keychain/output stable). Overall the security-critical surface is well-covered and the suite is genuinely hermetic. No blocking findings. A few real gaps, mostly Major/Minor.

Strengths

  • Hermeticity is solid: testutil.Setup forces the file backend in a temp HOME/XDG, disables the darwin security probe, and resets the global _migration state before/after every test. No t.Parallel() anywhere, so the package-global migPending cannot bleed across tests. Confirmed stable under -race -count=3.
  • The internal/noleak suite is not vacuous: TestNoLeak_APICommandClasses asserts require.Error at the API boundary and rejects unknown flag/unknown command, so commands provably execute; canaries include masked prefix/suffix slices (catches a "first8+last4" leak), and all three channels (stdout/stderr/output.Writer) are captured.
  • Conflict path is tested for the value-leak invariant: TestMigrateConflictFailsLoudWithoutLeaking asserts NoLeakAssertion over both values and that the legacy file survives a conflict. Idempotency, equal-value silent cleanup, --overwrite, and api_token→bot_token rename each have a test. TestRefAuthoritative proves nothing is hard-coded.

Major

  1. The len(distinct) > 1 legacy-vs-legacy conflict branch (migrate.go:83-87) is never exercised. With the keychain scan disabled in tests and the legacy file parsed into a map[string]string (one value per field), it is structurally impossible for the suite to produce two differing legacy values for the same key. This is the exact path the PR body emphasizes ("--overwrite cannot pick a winner here either — the user must"). It needs a unit test on migrateLegacyOverwrite/conflict with an injected multi-candidate group (differing values) asserting it errors regardless of --overwrite and never prints a value. Currently the only distinct>1 enforcement is unverified.
  2. No test asserts the headline §1.11 runtime invariant. The PR’s primary BREAKING claim is that SLACK_API_TOKEN/SLACK_USER_TOKEN are no longer read at runtime. No test sets those env vars and asserts a command still fails with a missing-credential error (grep for them in *_test.go → zero hits). root_test.go:resolveErr proves an empty keyring fails but never proves the env var is ignored. Add a test: empty keyring + SLACK_API_TOKEN=xoxb-… set → client.New() still returns ErrMissingBotToken.
  3. Darwin legacy-Keychain migration path is 0% covered (keychainRead/keychainDelete, migrate.go:271-301, incl. the securityErrItemNotFound==44 idempotent-delete logic). Understandably hard to exercise hermetically, but the keychainDelete exit-code branch is exactly the kind of "silently leave a legacy secret behind after migration" risk the comment warns about. Consider factoring the exit-code classification into a pure helper and unit-testing it, or at minimum a runtime.GOOS=="darwin"-gated integration test.

Minor

  • spliceMigration is only tested for object / array / empty-object bodies. The non-object wrap branch (migration.go:65) also catches a JSON scalar/string body; no test feeds PrintJSON("a string") or a number with a pending block to confirm the {"_migration":…,"data":…} shape there. Cheap to add.
  • legacyCredentialsPath 50% / discover 63%: the XDG_CONFIG_HOME==""~/.config fallback is never taken (Setup always sets XDG_CONFIG_HOME). The Windows-no-%APPDATA% behavior is asserted only by a code comment, not a test.
  • RecordMigration marshal-error branch (migration.go:30, "must not break the actual command") is untested — would need an unmarshalable value.
  • keychain.set 66.7% / get/del partial: the error-wrap paths (store %s at %s: %w, read %s from %s: %w) are not asserted to be value-free; they’re structurally safe but an explicit no-leak assertion on a forced backend error would lock the §1.12 invariant down.

Nit

  • migrate.go:178 _ = target is the intentional "never printed" marker — fine, but the conflict no-leak guarantee for the keyring-target value relies on TestMigrateConflictFailsLoudWithoutLeaking which does cover it; good. No action.
  • Several defer func() { _ = st.Close() }() swallow Close errors in tests — acceptable for test teardown.

Verdict

Must-fix before merge: none are hard blockers, but Major #1 (untested legacy-vs-legacy conflict) and Major #2 (no test for the headline runtime-env-not-read invariant) are the two I would want addressed since they are the highest-stakes claims in the PR and are currently asserted only by code/comments, not tests. Major #3 and the Minors are reasonable as fast-follows given the hermeticity constraints.

- Extract the pure §1.8 resolver planMigration() out of
  migrateLegacyOverwrite (no I/O; injected current-value lookup) so every
  branch is unit-testable. TestPlanMigration now covers the
  legacy-vs-legacy disagreement branch (asserting --overwrite still cannot
  resolve it, and the error names all sources and leaks no value), plus
  legacy-vs-target / equal-target / identical-duplicate paths.
- TestRuntimeEnvNotReadAsCredential locks the headline §1.11 invariant:
  SLACK_API_TOKEN/SLACK_USER_TOKEN set, empty keyring -> reads still fail
  missing (env never consulted at runtime).

[INT-437]
@rianjs
Copy link
Copy Markdown
Collaborator Author

rianjs commented May 17, 2026

Thanks — addressed both Majors and replying on the noted gaps (commit 49aef89):

  • Major Epic: Phase 1 - Foundation & Infrastructure #1 (legacy-vs-legacy conflict untested): Extracted the resolver into a pure planMigration() (no I/O; injected current-value lookup). New TestPlanMigration directly exercises the len(distinct)>1 branch with two differing legacy values for one key — asserting it's a conflict for both overwrite=false and overwrite=true (overwrite cannot pick), that the message names every source, and that NoLeakAssertion finds neither value. Also covers legacy-vs-target / equal-target / identical-duplicate.
  • Major Epic: Phase 2 - Testing Infrastructure #2 (§1.11 env invariant unproven): TestRuntimeEnvNotReadAsCredential sets both SLACK_API_TOKEN/SLACK_USER_TOKEN over an empty hermetic keyring and asserts BotToken()/UserToken() still return ErrMissing* and Has*Token() stay false — env is provably never consulted at runtime.
  • Noted: darwin Keychain path / exit-44 idempotent delete is 0% covered. This is a deliberate, accepted limitation: it requires a real macOS login Keychain (security shell-out) which would make the suite non-hermetic and machine/CI-dependent — exactly what §1.12's test obligation forbids. The logic is small, reviewed, and the decision layer it feeds (planMigration) is now fully covered with synthetic keychain-shaped candidates. A real-Keychain check belongs in the manual integration-tests.md matrix, not the unit suite.

@rianjs
Copy link
Copy Markdown
Collaborator Author

rianjs commented May 17, 2026

Findings

No findings.

The planMigration extraction looks behavior-preserving: conflicts are resolved before any write/cleanup path, --overwrite still only resolves legacy-vs-target conflicts, equal target values remain cleanup-only/no-signal, and duplicate identical legacy values produce one write with all cleanups retained. The runtime env test also covers the important §1.11 invariant directly.

Verification:

  • GOCACHE=/private/tmp/slck-go-cache go test ./internal/keychain ./internal/noleak ./internal/output ./internal/cmd/root passed.
  • GOCACHE=/private/tmp/slck-go-cache go test -race ./internal/keychain -run 'TestPlanMigration|TestRuntimeEnvNotReadAsCredential|TestMigrate' -count=1 passed.
  • go test ./... was blocked by this sandbox’s local listener restriction: httptest: failed to listen on a port: bind: operation not permitted.

Comment thread internal/client/client.go
func NewBotClient() (*Client, error) {
token, err := keychain.GetAPIToken()
st, err := keychain.Open()
if err != nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium (harness-engineering:harness-architecture-reviewer): keychain.Open() is called inside NewBotClient() and NewUserClient(), which runs the one-time legacy migration on every API call. A migration conflict (ErrMigrationConflict) will surface from arbitrary commands like slck channels list, not just setup commands, producing a confusing error with no clear remediation path visible from the failing command. Migration should either be pre-checked at startup in root's PersistentPreRunE, or these client constructors should use a migration-already-ran sentinel so the side effect is bounded to the first invocation.

Reply to this thread when addressed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By design, not changing. The §1.8 one-time migration is idempotent: once legacy originals are gone, discover() finds nothing and Open() is a cheap no-op — it does not "run migration on every API call" in steady state. A legacy/keyring conflict surfacing from channels list is the intended fail-closed behavior (§1.4/§1.8): slck must not operate with ambiguous credentials. The remediation path is visible and now unblocked — slck config show (diagnostic), slck config clear / slck delete-token (both OpenNoMigrate as of ea11ef5), or slck init --overwrite. A startup PersistentPreRunE pre-check would add a second migration entry point for no behavioral gain.

hasUserToken := keychain.HasStoredUserToken()
st, err := keychain.Open()
if err != nil {
return err
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium (harness-engineering:harness-enforcement-reviewer): runDeleteToken calls keychain.Open() which runs the one-time legacy migration. If a migration conflict exists (ErrMigrationConflict), delete-token is blocked — leaving no escape hatch short of manually editing the keyring. The clear.go command explicitly uses OpenNoMigrate() for this reason. delete-token has the same remediation role and should use OpenNoMigrate() as well.

Reply to this thread when addressed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ea11ef5 — delete-token now uses OpenNoMigrate(), matching config clear. It is a §1.8 remediation path ("delete the conflicting key, then re-run"); running migration first would surface ErrMigrationConflict and block the very command meant to resolve it.

st, err := keychain.OpenRef(opts.ref)
if err != nil {
return err
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium (harness-engineering:harness-knowledge-reviewer): OpenRef(opts.ref) is called with runMigration=true. When a non-default --ref is supplied, the one-time legacy migration runs against the custom ref's service/profile rather than the canonical default ref, potentially writing migrated credentials into the wrong service/profile — the opposite of what a user specifying --ref intends. OpenRef should pass runMigration=false, or the --ref flag should be explicitly documented as 'skips migration'.

Reply to this thread when addressed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ea11ef5 — OpenRef now passes runMigration=false. The one-time §1.8 migration only ever targets the canonical configured ref; running it against an arbitrary --ref could write the default ref's legacy data under the wrong service/profile. set-credential is pure ingress; migration still runs on the next init / first API call.

Ref string `json:"credential_ref"`
Backend string `json:"backend"`
BackendSource string `json:"backend_source"`
PassphraseSource string `json:"passphrase_source,omitempty"`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Low (harness-engineering:harness-enforcement-reviewer): runShow calls keychain.Open() which triggers the one-time legacy migration. If a migration conflict exists, the diagnostic command 'config show' fails with ErrMigrationConflict rather than displaying current credential state — giving users no diagnostic path to understand which keys are in which location before remediating. Consider using OpenNoMigrate() in runShow.

Reply to this thread when addressed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ea11ef5 — config show now uses OpenNoMigrate(). It is the §1.11 item 3 diagnostic and must remain usable during an unresolved conflict so the user can see which keys are where before remediating.

switch opts.key {
case keychain.KeyBotToken:
err = st.SetBotToken(value)
case keychain.KeyUserToken:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Low (harness-engineering:harness-enforcement-reviewer): readValue() with --from-env silently returns an empty string when the named env var is unset, then falls through to the generic 'empty secret value rejected' error. Unlike init.go's resolveBot() which reports '--bot-token-from-env %s is empty or unset' (naming the var), the user has no way to know which env var was the problem. Add a targeted error naming the variable before the generic empty guard.

Reply to this thread when addressed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ea11ef5 — readValue with --from-env now returns "--from-env is empty or unset" (parity with init resolveBot/resolveUser). The variable name is not secret; the value is never echoed (§1.12).

}
return ""
return openWith(cfg, false, true)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Low (harness-engineering:harness-architecture-reviewer): OpenRef (used by slck set-credential --ref) calls openWith(..., runMigration=true), so specifying an alternate --ref on a fresh install will trigger migration against the DEFAULT ref's legacy data before writing to the requested ref. If migration surfaces a conflict, set-credential fails with an error that makes no sense in the context of a simple credential-write operation. OpenRef should use runMigration=false.

Reply to this thread when addressed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ea11ef5 (same change as the setcred.go:71 thread) — OpenRef now uses runMigration=false, so an alternate --ref no longer triggers migration against the default ref's legacy data.

return err
}
botToken, err := opts.resolveBot()
if err != nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Low (harness-engineering:harness-knowledge-reviewer): In non-interactive mode (--bot-token-from-env, --user-token-from-env, --bot-token-stdin), the overwrite-existing-credentials prompt is skipped and tokens are written with WithOverwrite() without any feedback. A CI pipeline that accidentally re-runs slck init will silently replace existing credentials. Consider emitting an 'Overwriting existing credential at ' notice for non-interactive overwrites.

Reply to this thread when addressed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ea11ef5 — non-interactive ingress now emits a stderr notice ("Notice: overwriting existing credential(s) at ") before replacing a stored credential, so an accidental CI re-run of slck init is visible. The ref is not secret; the value is never printed.

}
if err := s.cs.Delete(s.profile, key); err != nil && !errors.Is(err, credstore.ErrNotFound) {
return fmt.Errorf("delete %s at %s: %w", key, s.ref, err)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Low (harness-engineering:harness-self-documenting-code-reviewer): del() uses an Exists pre-check to make Delete idempotent, but this is a TOCTOU pattern: a concurrent writer can add the key after Exists returns false, and a concurrent deleter can remove it after Exists returns true. The Delete call already handles ErrNotFound, so the Exists check is redundant. The idempotency contract is already satisfied by the errors.Is(err, ErrNotFound) guard on Delete alone. Remove the Exists pre-check.

Reply to this thread when addressed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not redundant — keeping the Exists pre-check. The premise ("Delete already handles ErrNotFound") does not hold for the file backend: credstore's file backend returns a raw os "not found" error from Delete, not credstore.ErrNotFound, so errors.Is(err, ErrNotFound) does NOT catch it. This was established by a TDD-driven test (del() was genuinely non-idempotent on the file backend before the Exists guard). The TOCTOU concern is academic for a single-process CLI mutating its own keyring under the active profile. The rationale is documented at the call site (keychain.go:177-179).

Comment thread internal/config/config.go

// Path is the config.yml location.
func Path() string { return filepath.Join(Dir(), configFileName) }

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Low (harness-engineering:harness-architecture-reviewer): home, _ := os.UserHomeDir() silently discards the error. On a misconfigured system where UserHomeDir() fails, home is the empty string and Dir() returns /.config/slack-chat-api, which is outside the user's home and will likely fail with a permission error during Save(). The error should be propagated or produce an explicit error message.

Reply to this thread when addressed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not changing. home, _ := os.UserHomeDir() matches the established convention across this codebase (e.g. migrate.go legacyCredentialsPath); UserHomeDir only fails when neither $HOME nor the OS user db is available, an exotic state in which Save() already fails loudly with a permission error naming the path. Threading an error through Dir()/Path() ripples into many call sites for an unreachable-in-practice case — non-surgical scope creep relative to this ticket (Code Quality Lens #3).

if botToken == "" && userToken == "" {
output.Println("No tokens provided. Setup cancelled.")
return nil
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Low (harness-engineering:harness-architecture-reviewer): workspace = info.TeamID stores the machine-readable team ID (e.g. T04AB1234) in config.yml while verify() prints the human-readable info.Team to the user, so the stored value and the displayed value are different fields. The current intent is ambiguous — if config show is meant to be human-readable, info.Team is the right choice; if used programmatically, TeamID is correct but should be documented.

Reply to this thread when addressed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ea11ef5 — added a comment documenting the intent: workspace stores info.TeamID (the stable machine identifier, durable across workspace renames) as a non-secret correlation field; verify() prints info.Team only for human feedback. This is deliberate, not an inconsistency.

Comment thread go.mod

go 1.24
go 1.24.0

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Low (harness-engineering:harness-architecture-reviewer): cli-common is pinned at a pseudo-version (v0.0.0-20260516182733-b753d5c62d14) rather than a tagged semver release. This library owns all OS keyring I/O and defines the AllowedKeys allowlist — it is the core security component of this PR. A pseudo-version gives no semantic contract: a breaking change or security fix in the upstream library will not be flagged by go get -u tooling. The library should be tagged before this PR merges, or the go.mod should document why a tag is not yet available.

Reply to this thread when addressed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferred deliberately, not blocking this PR. The pseudo-version is intentional during active Phase B credstore co-development — cli-common's API is still being shaped by each pilot CLI (slck is the first). Tagging a semver release is tracked at the epic level (INT-310) and cli-common will be pinned to a tag before Phase B closes, after the API has stabilized across gro/atlassian. Pinning to a premature tag now would force a churn of bump PRs across every CLI mid-migration.

Copy link
Copy Markdown

@monit-reviewer monit-reviewer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR Review

Reviewed commit: 49aef89

Summary

Reviewer Findings
harness-engineering:harness-architecture-reviewer 5
harness-engineering:harness-enforcement-reviewer 3
harness-engineering:harness-knowledge-reviewer 3
harness-engineering:harness-self-documenting-code-reviewer 1
harness-engineering:harness-architecture-reviewer (5 findings)

⚠️ Should Fix - internal/client/client.go:45

keychain.Open() is called inside NewBotClient() and NewUserClient(), which runs the one-time legacy migration on every API call. A migration conflict (ErrMigrationConflict) will surface from arbitrary commands like slck channels list, not just setup commands, producing a confusing error with no clear remediation path visible from the failing command. Migration should either be pre-checked at startup in root's PersistentPreRunE, or these client constructors should use a migration-already-ran sentinel so the side effect is bounded to the first invocation.

💡 Suggestion - internal/keychain/keychain.go:85

OpenRef (used by slck set-credential --ref) calls openWith(..., runMigration=true), so specifying an alternate --ref on a fresh install will trigger migration against the DEFAULT ref's legacy data before writing to the requested ref. If migration surfaces a conflict, set-credential fails with an error that makes no sense in the context of a simple credential-write operation. OpenRef should use runMigration=false.

💡 Suggestion - internal/config/config.go:63

home, _ := os.UserHomeDir() silently discards the error. On a misconfigured system where UserHomeDir() fails, home is the empty string and Dir() returns /.config/slack-chat-api, which is outside the user's home and will likely fail with a permission error during Save(). The error should be propagated or produce an explicit error message.

💡 Suggestion - internal/cmd/initcmd/init.go:110

workspace = info.TeamID stores the machine-readable team ID (e.g. T04AB1234) in config.yml while verify() prints the human-readable info.Team to the user, so the stored value and the displayed value are different fields. The current intent is ambiguous — if config show is meant to be human-readable, info.Team is the right choice; if used programmatically, TeamID is correct but should be documented.

💡 Suggestion - go.mod:4

cli-common is pinned at a pseudo-version (v0.0.0-20260516182733-b753d5c62d14) rather than a tagged semver release. This library owns all OS keyring I/O and defines the AllowedKeys allowlist — it is the core security component of this PR. A pseudo-version gives no semantic contract: a breaking change or security fix in the upstream library will not be flagged by go get -u tooling. The library should be tagged before this PR merges, or the go.mod should document why a tag is not yet available.

harness-engineering:harness-enforcement-reviewer (3 findings)

⚠️ Should Fix - internal/cmd/config/delete_token.go:52

runDeleteToken calls keychain.Open() which runs the one-time legacy migration. If a migration conflict exists (ErrMigrationConflict), delete-token is blocked — leaving no escape hatch short of manually editing the keyring. The clear.go command explicitly uses OpenNoMigrate() for this reason. delete-token has the same remediation role and should use OpenNoMigrate() as well.

💡 Suggestion - internal/cmd/config/show.go:35

runShow calls keychain.Open() which triggers the one-time legacy migration. If a migration conflict exists, the diagnostic command 'config show' fails with ErrMigrationConflict rather than displaying current credential state — giving users no diagnostic path to understand which keys are in which location before remediating. Consider using OpenNoMigrate() in runShow.

💡 Suggestion - internal/cmd/setcred/setcred.go:77

readValue() with --from-env silently returns an empty string when the named env var is unset, then falls through to the generic 'empty secret value rejected' error. Unlike init.go's resolveBot() which reports '--bot-token-from-env %s is empty or unset' (naming the var), the user has no way to know which env var was the problem. Add a targeted error naming the variable before the generic empty guard.

harness-engineering:harness-knowledge-reviewer (3 findings)

⚠️ Should Fix - internal/cmd/setcred/setcred.go:71

OpenRef(opts.ref) is called with runMigration=true. When a non-default --ref is supplied, the one-time legacy migration runs against the custom ref's service/profile rather than the canonical default ref, potentially writing migrated credentials into the wrong service/profile — the opposite of what a user specifying --ref intends. OpenRef should pass runMigration=false, or the --ref flag should be explicitly documented as 'skips migration'.

⚠️ Should Fix - internal/cmd/initcmd/init_test.go:1

No test exercises the --overwrite flag path (OpenForMigrationOverwrite). This is the only resolution path for an ErrMigrationConflict that forces a legacy credential over an existing keyring entry. A regression in that code path could cause silent credential corruption with no test catching it.

💡 Suggestion - internal/cmd/initcmd/init.go:100

In non-interactive mode (--bot-token-from-env, --user-token-from-env, --bot-token-stdin), the overwrite-existing-credentials prompt is skipped and tokens are written with WithOverwrite() without any feedback. A CI pipeline that accidentally re-runs slck init will silently replace existing credentials. Consider emitting an 'Overwriting existing credential at ' notice for non-interactive overwrites.

harness-engineering:harness-self-documenting-code-reviewer (1 findings)

💡 Suggestion - internal/keychain/keychain.go:185

del() uses an Exists pre-check to make Delete idempotent, but this is a TOCTOU pattern: a concurrent writer can add the key after Exists returns false, and a concurrent deleter can remove it after Exists returns true. The Delete call already handles ErrNotFound, so the Exists check is redundant. The idempotency contract is already satisfied by the errors.Is(err, ErrNotFound) guard on Delete alone. Remove the Exists pre-check.

4 info-level observations excluded. Run with --verbose to include.

7 PR discussion threads considered.


Completed in 4m 21s | $1.79 | sonnet | daemon 0.2.120 | Glorfindel
Field Value
Model sonnet
Reviewers hybrid-synthesis, documentation:docs-reviewer, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine claude · sonnet
Reviewed by pr-review-daemon · monit-pr-reviewer
Duration 4m 21s wall · 13m 27s compute (Reviewers: 3m 06s · Synthesis: 1m 12s)
Cost $1.79
Tokens 298.5k in / 52.4k out
Turns 7

Per-workstream usage

Workstream Model In Out Cache read Cache create Cost
hybrid-synthesis sonnet 33.6k 3.4k 0 33.6k (1h) $0.19
documentation:docs-reviewer haiku 10.9k 10.7k 0 10.9k (1h) $0.07
harness-engineering:harness-architecture-reviewer sonnet 52.3k 6.2k 2.1k 50.1k (1h) $0.33
harness-engineering:harness-enforcement-reviewer sonnet 52.3k 9.9k 2.1k 50.1k (1h) $0.38
harness-engineering:harness-knowledge-reviewer sonnet 52.3k 7.8k 2.1k 50.1k (1h) $0.35
harness-engineering:harness-self-documenting-code-reviewer sonnet 48.5k 8.4k 2.1k 46.3k (1h) $0.34
security:security-code-auditor haiku 48.7k 6.0k 0 48.7k (1h) $0.13

Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.

Note: Posted 11 inline comments. 1 finding remained in the summary because GitHub could not attach them to the current diff.

… hatches, ingress)

- delete-token / config show now use OpenNoMigrate so they remain usable
  during an unresolved §1.8 conflict (same remediation role as config clear);
  config show is the §1.11 item 3 diagnostic and must not be blocked
- OpenRef (set-credential --ref) no longer runs migration: the one-time §1.8
  migration only ever targets the canonical configured ref, never an
  arbitrary --ref (which would discover the default ref's legacy data)
- set-credential --from-env now names the empty/unset variable (parity with
  init resolveBot/resolveUser); never echoes a value (§1.12)
- init: non-interactive ingress emits a stderr notice before overwriting an
  existing credential so an accidental CI re-run is visible
- document why workspace stores info.TeamID (stable id) not info.Team
- add end-to-end init --overwrite tests: migration conflict fails loud
  without leaking, and --overwrite forces the legacy value

[INT-437]
@rianjs
Copy link
Copy Markdown
Collaborator Author

rianjs commented May 17, 2026

Codex consolidated final review (post-daemon-fix batch)

Findings: none.

The daemon-fix batch preserves the intended architecture and §1.x boundaries. Open() remains the only migration-running default path, OpenNoMigrate() now correctly covers config show, delete-token, and clear, and OpenRef() is pure ingress with no arbitrary-ref migration side effects. The no-leak/ingress changes also look correct.

The declined daemon items are sound: per-call Open() migration is the fail-closed contract, the Exists pre-check is justified for backend parity, the config home handling matches the repo’s convention, and the pseudo-version rationale is acceptable for this phase.

Verification run:
go test ./internal/keychain ./internal/cmd/setcred ./internal/cmd/root ./internal/noleak ./internal/output
go test ./internal/cmd/config -run 'TestRunShow|TestRunDelete|TestRunClear|TestSetToken'
go test ./internal/cmd/initcmd -run 'TestRunInit_(MigrationConflict|OverwriteResolvesMigrationConflict|Stdin_NoVerify|Wrong)'

A broader initcmd subset hit the known sandbox localhost-bind restriction from httptest, not a code failure.

@rianjs rianjs merged commit d9f88fd into main May 17, 2026
2 checks passed
@rianjs rianjs deleted the feat/INT-437-credstore-migration branch May 17, 2026 11:59
rianjs added a commit that referenced this pull request May 19, 2026
…nt [INT-446] (#164)

## Problem

Released `slck` macOS binaries fail closed on every credential op.
`.goreleaser.yaml` set `CGO_ENABLED=0` and the release job ran on
`ubuntu-latest`; `99designs/keyring`'s Keychain backend is `//go:build
darwin && cgo`, so with cgo off it was never compiled in. On macOS
`credstore` auto-selects the Keychain backend then `keyring.Open`
returns `ErrNoAvailImpl` → fail-closed. Every Mac user who upgraded
since the credstore migration (#158) is broken. Verified **macOS-only**
(Windows wincred and Linux godbus secret-service are pure Go, fine under
`CGO_ENABLED=0`).

## Fix (pipeline-only — no Go source change)

- **`.goreleaser.yaml`**: split builds — `slck-darwin` (`CGO_ENABLED=1`,
fully-keyed per-arch `CC=xcrun clang -arch …` overrides for
`darwin_amd64_v1` / `darwin_arm64_v8.0`) + `slck-unix-win`
(`CGO_ENABLED=0` static). `nfpms.ids: [slck-unix-win]` (not the
deprecated `builds`). **Both macOS arches (Intel + Apple Silicon) still
produced** — the split is by GOOS, not arch.
- **`release.yml`**: goreleaser job → pinned `macos-15` (cgo+darwin
can't cross-compile from Linux). Restructured to **verify before
publish**: `goreleaser check` → `release --snapshot` (no publish) →
pre-publish gate → real `release --clean --release-notes`. The gate
parses `dist/artifacts.json` and asserts: arm64 binary functionally
reports `backend=keychain backend_source=auto
credential_ref=slack-chat-api/default` (isolated HOME/XDG, `-u
SLACK_CHAT_API_KEYRING_BACKEND`); amd64 binary links
`Security.framework` (necessary cgo signal for the slice the runner
can't execute); both darwin archives present exactly once. A
backend-less darwin binary can no longer be published silently.
- Hardened the `update-homebrew` checksum step (`set -euo pipefail` +
non-empty SHA assertions) so a renamed/missing archive can't publish a
broken cask.

## Non-goals

- No cgo on linux/windows (glibc portability regression, zero benefit —
pure-Go backends).
- No runtime env credential fallback (would unwind the §1.11
keyring-only invariant).
- The pre-existing `workflow_dispatch` `TAG: github.ref_name ||
inputs.tag` bug is out of scope (only affects the dispatch path; this
fix uses the tag-push path) — separate ticket.

## Validation

`goreleaser check` passes; both cgo cross-builds succeed locally (arm64
native + amd64 cross, correct Mach-O archs, amd64 links
Security.framework); the gate's functional assertion reproduced on real
hardware (`backend=keychain/auto`).

## Release mechanics

`auto-release.yml` Gate-1 is a path filter (`**.go|go.mod|go.sum`); a
pipeline-only diff matches none, so this is a `ci:` commit that does not
auto-release. After merge the corrected release is cut by a deliberate
annotated tag push (`release.yml` `on: push: tags: v*`).

Pilot for the cross-CLI CGO regression (Jira INT-446, child of INT-310);
verified on a non-dev Mac before any fan-out to nrq/gro/atlassian.

Closes #163
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: migrate credential storage to cli-common/credstore (Phase B pilot)

2 participants