Skip to content

Install loop atomicity: mid-step interruption leaves canonicals without symlinks #27

@edheltzel

Description

@edheltzel

What happened

During an ./install.sh run, the Commands step copied the canonical slash-command files to ~/.agents/Recall/claude/commands/Recall/ but never created the symlinks at ~/.claude/commands/Recall/. The ~/.claude/commands/Recall/ directory didn't exist at all — even mkdir -p from the top of the block didn't survive.

Evidence:

  • ✓ Canonicals present at ~/.agents/Recall/claude/commands/Recall/ (8 files, dated 21:14)
  • ✗ Target dir ~/.claude/commands/Recall/ not present
  • ✓ Guide step (which runs immediately before Commands in install.sh) completed successfully — ~/.claude/Recall_GUIDE.md symlink intact

Most likely cause: the install was interrupted between recall_copy_canonical and recall_link inside the per-file loop in the Commands step (install.sh:165-186), or something cleared the directory after the install completed.

What's already in place

After this incident we added (in c46cc90):

  • recall_verify_install in lib/install-lib.sh — final post-install gate that walks every expected symlink (hooks, slash commands, guides, extract_prompt.md) and fails loudly if any are missing or mislinked. Wired into both install.sh's self-check and update.sh:step_verify.
  • Doctor probes for slash commands in src/commands/doctor.tsmem doctor discovers slash commands dynamically from the canonical dir and reports drift. mem doctor --fix repairs the symlinks per the collision rule.

What this issue tracks

The verification gate catches the failure mode AFTER the install completes, but the root cause is that per-step state isn't atomic: a step can copy half its outputs, exit, and leave the install in a partial state. The next install run finds existing canonicals (skips the copy) but still needs to redo the symlink half — which the current loop does, idempotently.

Things worth exploring:

  • A --resume flag that explicitly re-runs every step, treating idempotency as the source of truth instead of the "step already done" early-return shortcuts.
  • An after-step verification hook so each step asserts its own postcondition before the next step starts (catches the bug mid-install instead of at the end).
  • A clearer story for what to do when verification fails — currently the user gets "missing: " and a recovery hint, but no guidance on whether to re-run install vs. mem doctor --fix vs. manual.

Repro

Hard to reproduce reliably — depends on interruption timing. The verification gate is what we have to lean on for now.

Related:

  • install.sh:165-186 (Commands step)
  • lib/install-lib.sh:recall_verify_install
  • src/commands/doctor.ts:buildSymlinkProbes (slash command probes)

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions