Skip to content

Add Verus/Kani proofs and ExecPlan for LSH invariants (7.2.4/7.2.5)#194

Merged
leynos merged 8 commits intomainfrom
add-verus-proofs-lsh-new-invariants-5sefjo
Apr 17, 2026
Merged

Add Verus/Kani proofs and ExecPlan for LSH invariants (7.2.4/7.2.5)#194
leynos merged 8 commits intomainfrom
add-verus-proofs-lsh-new-invariants-5sefjo

Conversation

@leynos
Copy link
Copy Markdown
Owner

@leynos leynos commented Apr 8, 2026

Summary

  • Adds a concrete Verus/Kani proof workflow for the clone-detector LSH invariants and accompanying tests; wires ExecPlan for 7.2.4/7.2.5 and updates documentation and tooling. This implements Stages B–I while preserving the runtime API.

What this PR adds

  • Verus proof: verus/clone_detector_lsh_config.rs (semantic model of LshConfig::new)
  • Kani harness: crates/whitaker_clones_core/src/index/kani.rs (crate-private, bounded smoke harness)
  • Proof tooling and wiring:
    • Makefile targets: verus-clone-detector, kani-clone-detector (in addition to existing verus and kani scaffolding)
    • Script scaffolding: scripts/install-kani.sh, scripts/run-kani.sh, scripts/run-verus.sh updated for grouped proof entry points
  • ExecPlan documentation and wiring:
    • ExecPlan: docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md
  • Documentation and test surface:
    • Documentation updates: docs/whitaker-clone-detector-design.md, docs/roadmap.md updates, and docs/developers-guide.md
    • Tests: extended unit tests and BDD coverage in min_hash_lsh feature area (zero bands, zero rows, invalid product, overflow edge)
    • Feature tests updated to cover new edge cases
  • Artefacts this plan creates:
    • Verus proof: verus/clone_detector_lsh_config.rs
    • Kani harness: crates/whitaker_clones_core/src/index/kani.rs
    • Makefile and scripts: verus and Kani targets, install/run helpers
    • ExecPlan: docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md
    • Documentation updates: docs/whitaker-clone-detector-design.md, docs/roadmap.md, docs/developers-guide.md updates
    • Tests: extended unit tests and BDD coverage in min_hash_lsh feature area

Scope and constraints

  • Strict focus on roadmap items 7.2.4 and 7.2.5. Do not mark 7.2.6–7.2.8 as done in this change.
  • Proof tooling remains sidecar-based (Verus for semantic invariants; Kani for bounded checks) per ADR 002.
  • Preserve the existing runtime API shape; only crate-private proof seams may be introduced if needed.
  • Build on the existing clone-detector crate (crates/whitaker_clones_core). Do not move LshConfig into a separate crate.
  • If Kani Harnesses use #[cfg(kani)], register that cfg in the crate's unexpected_cfgs allowlist.
  • Keep each Rust source file focused and importable; avoid large monolithic files.
  • Workspace-wide lint/version constraints maintained (e.g., rstest-bdd version).

Implementation plan (highlights)

  • Stage B: Add clone-detector proof sidecar tooling and Makefile targets
    • Extend scripts/run-verus.sh to group proofs by domain (clone-detector group)
    • Add install-kani.sh and run-kani.sh for Kani harnesses
    • Update Makefile with verus-clone-detector and kani-clone-detector targets
  • Stage C: Extend unit tests (ZeroBands, ZeroRows, InvalidBandRowProduct, overflow edge)
  • Stage D: Extend rstest-bdd coverage for min_hash_lsh config outcomes
  • Stage E: Add Verus proof for LshConfig::new (verus/clone_detector_lsh_config.rs)
  • Stage F: Add minimal Kani smoke harness (crates/whitaker_clones_core/src/index/kani.rs)
  • Stage G: Update clone-detector design/roadmap docs
  • Stage H: Gatekeeping and quality checks (fmt, lint, tests, docs)
  • Stage I: Finalize ExecPlan after implementation

Validation plan (evidence to collect)

  • Run gates:
    • make fmt, make markdownlint, make nixie
    • make verus-clone-detector
    • make kani-clone-detector
    • make check-fmt, make lint, make test
  • Optional broader proof smoke: make verus, make kani

Acceptance criteria

  • The ExecPlan document is added and clearly references 7.2.4 and 7.2.5.
  • Stage B–I milestones are defined and traceable in the plan.
  • Provisions for new proof tooling, targets, and tests are described (without changing runtime API yet).
  • Design decisions are captured in the ExecPlan and related docs updated.

Rationale

This change provides a concrete path to Verus-based proofs for LshConfig::new with a minimal Kani smoke harness, preserving API stability and keeping proof tooling separate from normal cargo/test flows as guided by ADR 002 and the roadmap.

Related guidance

  • ADR 002 on proof-workflow separation (Verus for semantic invariants; Kani for bounded checks).
  • Existing LshConfig runtime contract constraints: bands > 0, rows > 0, bands * rows == MINHASH_SIZE (128).

Progress

  • Stage A–I completed for this turn; gates to be re-run in CI.

Surprises & Discoveries

  • The repository already includes a pinned Verus sidecar workflow; Kani harnesses are introduced here as a first-stage smoke gate.

Task

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 8, 2026

Summary

This PR introduces formalised proof tooling for LshConfig::new invariants as part of roadmap items 7.2.4 and 7.2.5. It adds a Verus semantic model (verus/clone_detector_lsh_config.rs) specifying acceptance/rejection logic and a Kani bounded-model-checking harness suite (crates/whitaker_clones_core/src/index/kani.rs), along with corresponding Make targets, shell scripts, and documentation updates.

Key changes

Proof artefacts:

  • verus/clone_detector_lsh_config.rs: Spec-only Verus model defining LshConfigOutcome enum and branch-structured proof lemmas for constructor invariants (zero bands/rows rejection, checked_mul-based overflow handling, and bands * rows == MINHASH_SIZE acceptance).
  • crates/whitaker_clones_core/src/index/kani.rs: Three Kani harnesses—smoke test, symbolic bounded harness (bands ≤ 128, rows ≤ 128), and overflow-product symbolic harness—executing the real LshConfig::new with targeted assertions for each code path.

Build and scripting:

  • Makefile targets verus-clone-detector and kani-clone-detector added; kani description updated.
  • Scripts scripts/run-kani.sh and scripts/run-verus.sh refactored to support dispatched execution modes (clone-detector, decomposition, all).
  • scripts/install-verus.sh enhanced to handle ANSI sequences when parsing toolchain requirements.
  • Cargo.toml workspace lint configuration updated to recognise cfg(kani).

Documentation and testing:

  • New ExecPlan (docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md) documents scope, deliverables, constraints (sidecar-based per ADR 002), and validation gates for items 7.2.4 and 7.2.5.
  • docs/whitaker-clone-detector-design.md extended with implementation-decision section describing Verus/Kani split and external build entry points.
  • docs/developers-guide.md adds "Proof workflows" section documenting Make targets, wrapper scripts, and cache/idempotence behaviour.
  • docs/roadmap.md marks items 7.2.4 and 7.2.5 complete.
  • Test coverage extended: unit tests and rstest-bdd scenarios in min_hash_lsh.feature and min_hash_lsh_behaviour.rs cover zero-bands, zero-rows, invalid product, and symbolic overflow cases.

Review findings

Automated review identified gaps:

  • Verus spec: Functions as a semantic model without linkage to the real implementation (no external_fn_specification), so proofs do not establish conformance; several lemmas are tautological.
  • Kani symbolic harness: Constraints (bands ≤ 128, rows ≤ 128) prevent exercising the checked_mul overflow branch for inputs exceeding MINHASH_SIZE.

The PR author has completed documentation updates; recommendations for Kani overflow harness extension and Verus implementation linkage remain as follow-up items.

Scope

Proof tooling remains opt-in via sidecar scripts and Make targets; runtime API shape, LshConfig location, and workspace lint constraints unchanged per ADR 002. Implementation stages B–I are documented in the ExecPlan; validation includes formatting, linting, unit/BDD tests, and proof runs.

Walkthrough

This pull request adds formal verification infrastructure for the LshConfig::new constructor using both Verus (theorem proving) and Kani (bounded model checking). It introduces three Kani runtime harnesses, a comprehensive Verus specification module, build tooling orchestration via scripts and Make targets, extended test coverage, and supporting documentation marking roadmap items 7.2.4 and 7.2.5 complete.

Changes

Cohort / File(s) Summary
Build configuration and tooling
Cargo.toml, Makefile, scripts/install-verus.sh, scripts/run-kani.sh, scripts/run-verus.sh
Added kani conditional compilation flag recognition to lint configuration; extended Makefile with verus-clone-detector and kani-clone-detector phony targets; refactored run-kani.sh to support clone-detector mode dispatching with dedicated clone-detector harness group and decomposition/common harness separation; added group-based proof-file selection to run-verus.sh for all|decomposition|clone-detector modes; fixed ANSI escape sequence parsing in install-verus.sh.
Kani proof harnesses
crates/whitaker_clones_core/src/index/kani.rs, crates/whitaker_clones_core/src/index/mod.rs
Added three Kani proof functions (verify_lsh_config_new_smoke, verify_lsh_config_new_symbolic, verify_lsh_config_new_overflow_product) verifying LshConfig::new constructor acceptance/rejection contract under smoke test, symbolic input, and integer overflow scenarios; introduced conditional kani module registration.
Verus proof specification
verus/clone_detector_lsh_config.rs
Introduced comprehensive formal specification module defining LshConfigOutcome enum and eight proof lemmas establishing that LshConfig::new accepts exactly when bands > 0 && rows > 0 && bands * rows == 128, and rejects zero-band, zero-row, invalid-product, or overflow cases accordingly.
Test coverage extensions
crates/whitaker_clones_core/src/index/tests.rs, crates/whitaker_clones_core/tests/features/min_hash_lsh.feature, crates/whitaker_clones_core/tests/min_hash_lsh_behaviour.rs
Extended parameterised unit tests with valid (32, 4) case and invalid (3, 42) and (usize::MAX, 2) cases; added two BDD scenarios covering ZeroRows and InvalidBandRowProduct error paths; refactored expected_error function to parse parameterised InvalidBandRowProduct(<bands>,<rows>) error names and extract band/row values.
Documentation and roadmap
docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md, docs/roadmap.md, docs/whitaker-clone-detector-design.md, docs/developers-guide.md
Added comprehensive ExecPlan documenting 7.2.4/7.2.5 deliverables, proof tooling constraints, and validation gates; marked roadmap items 7.2.4 and 7.2.5 as complete; introduced "Implementation decisions" section in design document describing Verus/Kani proof split for constructor invariants; added "Proof workflows" section in developers guide covering Make targets, wrapper scripts, and current proof strategy.

Possibly related PRs

Poem

With Verus proofs and Kani's might, 🔬
The constructor's logic burns so bright,
Zero bands? Rejected outright,
Band-row product matching MINHASH's height—
Overflow caught—verified just right!

🚥 Pre-merge checks | ✅ 6 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Testing ⚠️ Warning Core invariants tested via unit and BDD tests in CI, but shell scripts and proof tooling infrastructure lack automated test coverage. Implement BATS tests for shell scripts and integrate Kani/Verus proof execution into CI as required or documented optional verification steps.
✅ Passed checks (6 passed)
Check name Status Explanation
Title check ✅ Passed The title directly describes the main change: adding Verus/Kani proofs and an ExecPlan for LSH invariants (items 7.2.4/7.2.5), which aligns with the comprehensive changeset spanning proof files, harnesses, tooling, tests, and documentation.
Description check ✅ Passed The description comprehensively outlines the scope, implementation stages, artefacts, validation plan, and acceptance criteria—all directly related to the changeset of proof workflows, tooling updates, test extensions, and documentation additions.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
User-Facing Documentation ✅ Passed Proof workflows and Make targets are developer-focused infrastructure appropriately documented in developers-guide.md, not user-facing functionality requiring users-guide.md updates.
Developer Documentation ✅ Passed PR documentation comprehensively covers all new tooling, build requirements, and architectural boundaries with dedicated sections.
Module-Level Documentation ✅ Passed All new modules carry comprehensive module-level docstrings that clearly explain their purpose, utility, and function as required.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch add-verus-proofs-lsh-new-invariants-5sefjo

Comment @coderabbitai help to get the list of available commands and usage tips.

codescene-delta-analysis[bot]

This comment was marked as outdated.

@sourcery-ai
Copy link
Copy Markdown

sourcery-ai bot commented Apr 8, 2026

Reviewer's Guide

Adds Verus and Kani sidecar proof workflows for the clone-detector’s LshConfig::new invariants, including new proof and harness files, grouped proof targets/scripts, extended unit/BDD tests for edge cases, and documentation/roadmap updates to reflect roadmap items 7.2.4 and 7.2.5.

Sequence diagram for kani-clone-detector bounded LshConfig::new verification

sequenceDiagram
  actor Dev
  participant Make as Makefile
  participant RunKani as scripts_run_kani_sh
  participant InstallKani as scripts_install_kani_sh
  participant CargoKani as cargo_kani
  participant Harness as index_kani_rs
  participant Lsh as LshConfig_new

  Dev->>Make: make kani-clone-detector
  Make->>RunKani: ./scripts/run-kani.sh clone-detector
  RunKani->>InstallKani: resolve and install cargo-kani
  InstallKani-->>RunKani: kani_bin path
  RunKani->>CargoKani: cargo-kani --manifest-path ... 
  CargoKani->>Harness: run verify_lsh_config_new_smoke
  Harness->>Lsh: LshConfig::new(32, 4)
  Lsh-->>Harness: Ok(LshConfig)
  Harness-->>CargoKani: assertions over config
  CargoKani->>Harness: run verify_lsh_config_new_symbolic
  Harness->>Harness: choose bands, rows with assumptions
  Harness->>Lsh: LshConfig::new(bands, rows)
  Lsh-->>Harness: Ok or Err(IndexError)
  Harness-->>CargoKani: assertions over result and error shapes
  CargoKani-->>RunKani: verification success/failure
  RunKani-->>Dev: exit status
Loading

Class diagram for LshConfig, IndexError, and new Kani harness module

classDiagram
  class LshConfig {
    +bands() usize
    +rows() usize
    +new(bands usize, rows usize) Result~LshConfig, IndexError~
  }

  class IndexError {
    <<enum>>
    ZeroBands
    ZeroRows
    InvalidBandRowProduct
    EmptyFingerprintSet
  }

  class MINHASH_SIZE {
    <<const>>
    +VALUE usize
  }

  class index_kani {
    <<module>>
    +verify_lsh_config_new_smoke()
    +verify_lsh_config_new_symbolic()
  }

  LshConfig --> IndexError : returns
  LshConfig ..> MINHASH_SIZE : uses

  index_kani ..> LshConfig : calls_new_and_accessors
  index_kani ..> IndexError : pattern_matches
  index_kani ..> MINHASH_SIZE : uses
Loading

File-Level Changes

Change Details Files
Group Verus proof entry points and add a dedicated clone-detector proof file for LshConfig::new invariants.
  • Refactor run-verus.sh to support proof file groups (decomposition, clone-detector, all) and selectable groups via CLI argument.
  • Default Verus runs now include the new clone-detector proof file alongside existing decomposition proofs.
  • Introduce verus/clone_detector_lsh_config.rs that semantically models LshConfig::new and proves acceptance only for positive bands/rows whose product equals MINHASH_SIZE, plus rejection lemmas for zero and invalid non-zero products.
scripts/run-verus.sh
verus/clone_detector_lsh_config.rs
Introduce Kani sidecar workflow and harnesses for LshConfig::new, wired through scripts and Makefile targets.
  • Add scripts/install-kani.sh to install a pinned kani-verifier into a cache dir and expose cargo-kani path, running kani setup idempotently.
  • Add scripts/run-kani.sh wrapper that runs the clone-detector Kani harnesses in the whitaker_clones_core crate, with a default-unwind bound.
  • Create crates/whitaker_clones_core/src/index/kani.rs with a smoke harness and a symbolic harness around LshConfig::new, asserting alignment with the documented contract and error variants.
  • Expose the kani module behind cfg(kani) from the index mod and allow cfg(kani) via the workspace unexpected_cfgs lint configuration.
  • Add Makefile targets verus-clone-detector, kani, and kani-clone-detector that call the new scripts with appropriate groups.
scripts/install-kani.sh
scripts/run-kani.sh
crates/whitaker_clones_core/src/index/kani.rs
crates/whitaker_clones_core/src/index/mod.rs
Cargo.toml
Makefile
Tighten and extend tests and BDD scenarios to cover LshConfig::new edge cases and richer error expectations.
  • Extend unit tests for invalid LSH configs to cover non-zero invalid product (3,42) and an overflow edge (usize::MAX, 2), and add a new valid case (32,4).
  • Allow parsing of parameterised InvalidBandRowProduct(bands,rows) errors in the MinHash LSH behaviour test helper instead of a single fixed instance.
  • Add new rstest-bdd scenarios for zero rows and invalid non-zero product in the behaviour test file, and reference them from the feature file with appropriate Given/Then steps.
crates/whitaker_clones_core/src/index/tests.rs
crates/whitaker_clones_core/tests/min_hash_lsh_behaviour.rs
crates/whitaker_clones_core/tests/features/min_hash_lsh.feature
Document the proof-workflow split and mark roadmap items 7.2.4 and 7.2.5 as complete, plus harden Verus installer output parsing.
  • Add an ExecPlan document that explains scope, constraints, staged implementation, risks, decisions, and validation for adding clone-detector proof sidecars and the LshConfig::new proof.
  • Update whitaker-clone-detector-design.md with a new section capturing implementation decisions for 7.2.4/7.2.5, including Verus vs Kani responsibilities, sidecar targets, LshConfig contract modeling, Kani harness placement, and overflow handling.
  • Mark roadmap items 7.2.4 and 7.2.5 as done in docs/roadmap.md once implementation is in place.
  • Strip ANSI escape codes from Verus installer output before extracting the suggested rustup toolchain version in scripts/install-verus.sh.
docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md
docs/whitaker-clone-detector-design.md
docs/roadmap.md
scripts/install-verus.sh

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

codescene-delta-analysis[bot]

This comment was marked as outdated.

codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos leynos changed the title Plan Verus proofs for LshConfig::new invariants (7.2.4/7.2.5) Add Verus proof and Kani harness for LshConfig::new invariants Apr 9, 2026
@leynos leynos marked this pull request as ready for review April 10, 2026 01:04
@coderabbitai coderabbitai bot added the Roadmap label Apr 10, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 862087e28e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/run-kani.sh Outdated
Copy link
Copy Markdown

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 3 issues

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="scripts/run-kani.sh" line_range="9-15" />
<code_context>
+KANI_BIN=$("${SCRIPT_DIR}/install-kani.sh")
+MANIFEST_PATH="${REPO_ROOT}/crates/whitaker_clones_core/Cargo.toml"
+
+run_clone_detector_harnesses() {
+    "${KANI_BIN}" \
+        --manifest-path "${MANIFEST_PATH}" \
+        --default-unwind 4 \
+        --harness verify_lsh_config_new_smoke \
+        --harness verify_lsh_config_new_symbolic
+}
+
+if [ $# -eq 0 ]; then
+    run_clone_detector_harnesses
+    exit 0
+fi
+
+case "$1" in
+    clone-detector)
+        shift
+        run_clone_detector_harnesses "$@"
+        ;;
+    *)
</code_context>
<issue_to_address>
**suggestion:** The helper ignores forwarded arguments, which is slightly misleading given the `"$@"` in the call site.

`run_clone_detector_harnesses` takes no parameters, but in the `clone-detector` branch you call it as `run_clone_detector_harnesses "$@"`, which is a no-op and potentially confusing. Either remove `"$@"` at the call site, or update `run_clone_detector_harnesses` to accept and forward options (e.g., extra `kani` flags) if that’s the intended design.

```suggestion
run_clone_detector_harnesses() {
    # Forward any additional arguments as extra Kani flags.
    "${KANI_BIN}" \
        --manifest-path "${MANIFEST_PATH}" \
        --default-unwind 4 \
        --harness verify_lsh_config_new_smoke \
        --harness verify_lsh_config_new_symbolic \
        "$@"
}
```
</issue_to_address>

### Comment 2
<location path="docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md" line_range="22" />
<code_context>
+This work matters because roadmap item 7.2.2 already made `LshConfig` a hard
+runtime boundary for MinHash and locality-sensitive hashing (LSH). If that
+boundary drifts, every later proof and every later token-pass candidate check
+inherits the bug. ADR 002 explicitly assigns this constructor to Verus and
+reserves Kani for bounded behavioural checks over the same clone-detector
+crate, so 7.2.4 and 7.2.5 establish the proof split that later items 7.2.6,
</code_context>
<issue_to_address>
**issue (review_instructions):** ADR is introduced as an acronym without being expanded on first use.

Please expand ADR on its first appearance, for example, "Architecture Decision Record (ADR) 002 explicitly assigns this constructor to Verus…" to satisfy the acronym definition requirement.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.md`

**Instructions:**
Define uncommon acronyms on first use.

</details>
</issue_to_address>

### Comment 3
<location path="docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md" line_range="182" />
<code_context>
+  `crates/whitaker_clones_core/tests/features/min_hash_lsh.feature` already
+  exercises zero bands. The implementation should extend these tests rather
+  than replace them.
+- `Makefile` currently exposes only a global `verus` target. If we want fast
+  clone-detector-only iteration without regressing the existing proof workflow,
+  the cleanest approach is to add clone-detector-specific targets alongside the
</code_context>
<issue_to_address>
**issue (review_instructions):** The sentence uses the first-person plural pronoun "we", which the style guide forbids.

To comply with the pronoun rule, consider rephrasing this line, for example: "If fast clone-detector-only iteration is desired, the cleanest approach is to add clone-detector-specific targets alongside the current umbrella target."

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.md`

**Instructions:**
Avoid 2nd person or 1st person pronouns ("I", "you", "we").

</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread scripts/run-kani.sh
Comment thread docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md Outdated
Comment thread docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md`:
- Line 321: Replace the two-word phrase "check list" with the single word
"checklist" in the sentence that references the crate's `unexpected_cfgs` check
list so it reads "crate's `unexpected_cfgs` checklist"; locate the occurrence
near the text mentioning `unexpected_cfgs` and update the spelling accordingly.
- Line 311: Replace the British spelling "specialised" with the Oxford English
Dictionary preferred "-ize" form "specialized" in the sentence referencing
`unexpected_cfgs` considerations (the phrase "`unexpected_cfgs` considerations
when introducing specialised build modes."). Update that exact token so the doc
uses "specialized" while keeping the surrounding wording intact.

In `@scripts/run-kani.sh`:
- Around line 22-26: The case branch calls run_clone_detector_harnesses with an
unused argument passthrough ("$@"); remove the "$@" so the call is simply
run_clone_detector_harnesses. Update the case block where
run_clone_detector_harnesses is invoked to stop forwarding parameters and ensure
no other callers rely on arguments being passed from this branch.

In `@scripts/run-verus.sh`:
- Line 33: The array assignment using command substitution for PROOF_FILES is
unsafe and triggers SC2207; replace both occurrences where PROOF_FILES is set
from proof_files_for_group (the global case and the case inside the
option-handling branch) to use mapfile -t with process substitution: run
proof_files_for_group through process substitution and load into PROOF_FILES
with mapfile -t to avoid word-splitting and globbing issues (use mapfile -t
PROOF_FILES < <(proof_files_for_group all) for the top-level assignment and
mapfile -t PROOF_FILES < <(proof_files_for_group "$1") for the branch).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b90765d1-53b6-481a-a9cc-16fc2fb38383

📥 Commits

Reviewing files that changed from the base of the PR and between 221fdbd and 862087e.

📒 Files selected for processing (15)
  • Cargo.toml
  • Makefile
  • crates/whitaker_clones_core/src/index/kani.rs
  • crates/whitaker_clones_core/src/index/mod.rs
  • crates/whitaker_clones_core/src/index/tests.rs
  • crates/whitaker_clones_core/tests/features/min_hash_lsh.feature
  • crates/whitaker_clones_core/tests/min_hash_lsh_behaviour.rs
  • docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md
  • docs/roadmap.md
  • docs/whitaker-clone-detector-design.md
  • scripts/install-kani.sh
  • scripts/install-verus.sh
  • scripts/run-kani.sh
  • scripts/run-verus.sh
  • verus/clone_detector_lsh_config.rs

Comment thread docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md Outdated
Comment thread docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md Outdated
Comment thread scripts/run-verus.sh Outdated
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos leynos changed the title Add Verus proof and Kani harness for LshConfig::new invariants Plan Verus/Kani proofs and tooling for LshConfig::new Apr 10, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md`:
- Around line 213-216: Replace the phrase "sidecar proof tooling independent
from ordinary Cargo development" with "sidecar proof tooling independent of
ordinary Cargo development" in the sentence that mentions Makefile targets and
wrapper scripts and references ADR 002 (the sentence containing "sidecar proof
tooling independent from ordinary Cargo"); ensure the docs/markdown uses
en-GB-oxendict spelling and save the updated line.

In `@scripts/run-kani.sh`:
- Around line 27-31: The case branch for "clone-detector" drops all remaining
CLI flags by calling run_clone_detector_harnesses without arguments; change the
invocation in that branch to forward the shifted arguments to
run_clone_detector_harnesses (i.e., call run_clone_detector_harnesses "$@") so
the function (defined as run_clone_detector_harnesses) receives and can forward
the flags to the Kani harness execution.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 20c34ef5-409e-406d-aaa3-8c5cec9faef8

📥 Commits

Reviewing files that changed from the base of the PR and between 862087e and d372910.

📒 Files selected for processing (3)
  • docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md
  • scripts/run-kani.sh
  • scripts/run-verus.sh

Comment thread docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md
Comment thread scripts/run-kani.sh
@leynos leynos changed the title Plan Verus/Kani proofs and tooling for LshConfig::new Add Verus clone-detector proof and Kani harness for LshConfig::new Apr 10, 2026
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Apr 11, 2026

@coderabbitai Have the following now been resolved?

Annul any requirements that violate the en-GB-oxendict spelling (-ize / -yse / -our) conventions (for example a request to replace "normalize" with "normalise" or "artefact" with "artifact"), or where the requirement unnecessarily increases cyclomatic complexity.

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Developer Documentation ⚠️ Warning New tooling and build requirements introduced in this PR are not documented in docs/developers-guide.md, the primary reference for developers. Update docs/developers-guide.md to add a new section documenting the proof workflow tooling, including Makefile targets, scripts, Kani installation requirements, and examples.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 11, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if docs/developers-guide.md mentions kani, verus, or proof workflow tooling
echo "=== Searching for kani/verus/proof references in docs/developers-guide.md ==="
fd -i 'developers-guide.md' --exec grep -in 'kani\|verus\|proof\|run-kani\|run-verus\|install-kani\|kani-clone\|verus-clone' {} \;

echo "=== Last 60 lines of docs/developers-guide.md ==="
fd -i 'developers-guide.md' --exec tail -n 60 {} \;

Length of output: 4266


@leynos: Yes — the warning is resolved. docs/developers-guide.md now contains a "Proof workflows" section (line 64 onwards) that covers all the required material:

  • Makefile targetsmake verus, make verus-clone-detector, make kani, and make kani-clone-detector are all documented with a short prose explanation of what each group currently runs.
  • Scriptsscripts/install-verus.sh, scripts/run-verus.sh, scripts/install-kani.sh, and scripts/run-kani.sh are each described, including their cache locations and idempotency guarantee.
  • Kani requirements — a dedicated subsection explains the host-toolchain dependency, the automatic setup performed by scripts/install-kani.sh, and where the installed binary lives for direct inspection.
  • Examples — direct invocation examples for both the Make targets and the underlying wrapper scripts are provided.

Nothing further is needed for this check.

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Apr 11, 2026

@coderabbitai are you satisfied that the introduced proof is substantive, rigorous, and well-founded?

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 11, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== verus/clone_detector_lsh_config.rs ==="
fd -g 'clone_detector_lsh_config.rs' --exec cat {} \;

echo ""
echo "=== crates/.../index/kani.rs ==="
fd -g 'kani.rs' --exec cat {} \;

echo ""
echo "=== crates/.../index/lsh.rs (LshConfig::new implementation) ==="
fd -g 'lsh.rs' --exec cat {} \;

echo ""
echo "=== crates/.../index/error.rs ==="
fd -g 'error.rs' -p 'whitaker_clones_core' --exec cat {} \;

Length of output: 7587


🏁 Script executed:

#!/bin/bash
echo "=== LshConfig struct and new() implementation ==="
rg -n 'struct LshConfig|impl LshConfig|fn new' --type rust -A 20 -B 2

echo ""
echo "=== IndexError enum ==="
rg -n 'enum IndexError' --type rust -A 30

echo ""
echo "=== MINHASH_SIZE constant ==="
rg -n 'MINHASH_SIZE' --type rust -C 2

Length of output: 146186


@leynos: Having read both the proof artefacts in full alongside the real LshConfig::new implementation and validate_product, here is my assessment.


Kani — substantive and well-founded ✅

verify_lsh_config_new_symbolic calls the real LshConfig::new with symbolic usize inputs and exhausts all match arms, including the EmptyFingerprintSet unreachability claim. Within its input bounds (bands ≤ 128, rows ≤ 128) the harness provides strong bounded-model-checking guarantees that the implementation satisfies its contract.

One genuine gap: inputs where bands > 128 or rows > 128 are excluded by the kani::assume guards. The overflow branch inside validate_product (where checked_mul returns None) is therefore never exercised. A harness without those upper-bound assumptions — relying on the model checker to verify the None branch via checked_mul — would close that gap.


Verus — self-referential and implementation-free ⚠️

The Verus proof is not well-founded as a proof of the implementation. Three specific concerns:

  1. No linkage to the real code. There is no external_fn_specification, external_body, or equivalent mechanism connecting lsh_config_accepts to LshConfig::new. The proof verifies only the internal consistency of an abstract spec; it makes no claim about the actual Rust function.

  2. Tautological lemmas. lemma_acceptance_matches_documented_contract asserts:

    lsh_config_accepts(bands, rows) <==> (bands > 0 && rows > 0 && bands * rows == minhash_size())
    

    Since lsh_config_accepts is defined as exactly that RHS, this is X <==> X. Verus discharges it with an empty proof body — no proof work is done. Similarly, lemma_zero_bands_rejected and lemma_zero_rows_rejected are immediately true from the definition; substituting 0 for bands or rows falsifies the > 0 guard instantly.

  3. The substantive lemmas are finite arithmetic checks. lemma_exact_product_is_accepted and lemma_invalid_non_zero_product_is_rejected use by (compute) to verify that 32*4 == 128, 1*128 == 128, 16*16 ≠ 128, and 3*42 ≠ 128. These are correct but trivial ground-term computations.

The PR documentation does describe the Verus artefact as a "semantic model" rather than an implementation proof, so this is a known design choice rather than an oversight. However, it is worth being explicit: the Verus file documents the intended contract; it does not verify that LshConfig::new upholds it.


Summary

Artefact Calls real code Symbolic Proof content Gap
Verus Tautological over spec definition No implementation linkage
Kani smoke Single concrete path Narrow
Kani symbolic Full match exhaustion over [0,128]² Overflow inputs uncovered

The Kani harness is the only artefact that actually verifies the implementation, and it does so substantively within its declared bounds. Two actions would strengthen the overall proof story:

  1. Add a Kani harness for unconstrained bands and rows (removing the <= MINHASH_SIZE assumptions) to cover the checked_mul overflow path in validate_product.
  2. Either extend the Verus proof to link the spec to the implementation via external_fn_specification, or document clearly in the design notes that the Verus file is a contract record rather than an implementation proof (the latter already appears partially in the design doc but could be stated more plainly alongside the tautology concern).

@leynos leynos changed the title Add Verus clone-detector proof and Kani harness for LshConfig::new Plan Verus Proofs Workflow and Tests Apr 11, 2026
codescene-delta-analysis[bot]

This comment was marked as outdated.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/developers-guide.md`:
- Around line 125-127: Replace the phrase "cached install" with "cached
installation" in the sentence "The first proof run may take longer while
toolchains and verifier binaries are downloaded; later runs reuse the cached
install." to correct the noun usage and conform to en-GB-oxendict
spelling/grammar in docs; update the sentence to read "...later runs reuse the
cached installation." ensuring consistency in docs/*.md.

In `@docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md`:
- Line 7: Update the top matter "Status" field from "IN PROGRESS" to "COMPLETED"
so the document header matches the body (all stages checked in the stages
section and the retrospective records finished outcomes); specifically change
the Status value on the header line that currently reads "Status: IN PROGRESS"
to "Status: COMPLETED".
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7f051490-5299-4e5f-bac2-bb955a8cb246

📥 Commits

Reviewing files that changed from the base of the PR and between d372910 and 6ae12a1.

📒 Files selected for processing (6)
  • crates/whitaker_clones_core/src/index/kani.rs
  • docs/developers-guide.md
  • docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md
  • docs/whitaker-clone-detector-design.md
  • scripts/run-kani.sh
  • verus/clone_detector_lsh_config.rs

Comment thread docs/developers-guide.md Outdated
Comment thread docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md Outdated
@devboxerhub devboxerhub bot force-pushed the add-verus-proofs-lsh-new-invariants-5sefjo branch from 6ae12a1 to 5fb3af2 Compare April 13, 2026 00:17
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos leynos changed the title Plan Verus Proofs Workflow and Tests Add Verus/Kani proof workflow, harnesses, and ExecPlan Apr 13, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md (1)

7-7: ⚠️ Potential issue | 🟡 Minor

Set the ExecPlan status to COMPLETED.

Update Line 7 to match the completed stages and retrospective.

Triage: [type:docstyle]

📝 Apply status fix
-Status: IN PROGRESS
+Status: COMPLETED

As per coding guidelines, documentation in docs/ must remain accurate and
current when implementation state changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md` at line
7, Change the ExecPlan front-matter/status line from "Status: IN PROGRESS" to
"Status: COMPLETED" and update the stage and retrospective text on that document
to reflect the completed work; locate the "Status" header near the top of the
document (line containing "Status: IN PROGRESS") and replace it with "Status:
COMPLETED", then ensure any subsequent sections that list stages, completed
steps, or the retrospective accurately reflect completion.
docs/developers-guide.md (1)

125-127: ⚠️ Potential issue | 🟡 Minor

Replace “cached install” with “cached installation”.

Fix the noun usage at Line 127.

Triage: [type:grammar]

📝 Apply wording fix
-The installer scripts are idempotent. The first proof run may take longer while
-toolchains and verifier binaries are downloaded; later runs reuse the cached
-install.
+The installer scripts are idempotent. The first proof run may take longer while
+toolchains and verifier binaries are downloaded; later runs reuse the cached
+installation.

As per coding guidelines, docs/**/*.md content must use en-GB-oxendict
spelling and grammar.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/developers-guide.md` around lines 125 - 127, Replace the phrase "cached
install" with "cached installation" in the sentence starting "The first proof
run may take longer..." so the line reads "...later runs reuse the cached
installation."; ensure the docs/developers-guide.md change follows
en-GB-oxendict spelling/grammar conventions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/whitaker_clones_core/tests/min_hash_lsh_behaviour.rs`:
- Around line 50-60: The parsed tokens from arguments.split_once(',') (the local
variables bands and rows) should be trimmed before calling parse::<usize>() to
avoid failures from surrounding whitespace; update the code that constructs
bands and rows (the let bands = ... and let rows = ... expressions) to call
.trim() on each token prior to parse and keep the existing map_err messages
unchanged.

In `@scripts/run-verus.sh`:
- Line 33: Replace use of the Bash-4-only mapfile plus process substitution with
a Bash-3.2-compatible loader: add a helper function named
set_proof_files_for_group() that initializes PROOF_FILES=() and populates it by
piping proof_files_for_group "$1" into a while IFS= read -r proof_file; do
PROOF_FILES+=("${proof_file}"); done loop, then replace the two occurrences of
"mapfile -t PROOF_FILES < <(proof_files_for_group ...)" with calls to
set_proof_files_for_group all and set_proof_files_for_group "$1" respectively so
PROOF_FILES is populated without using mapfile or process substitution.

---

Duplicate comments:
In `@docs/developers-guide.md`:
- Around line 125-127: Replace the phrase "cached install" with "cached
installation" in the sentence starting "The first proof run may take longer..."
so the line reads "...later runs reuse the cached installation."; ensure the
docs/developers-guide.md change follows en-GB-oxendict spelling/grammar
conventions.

In `@docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md`:
- Line 7: Change the ExecPlan front-matter/status line from "Status: IN
PROGRESS" to "Status: COMPLETED" and update the stage and retrospective text on
that document to reflect the completed work; locate the "Status" header near the
top of the document (line containing "Status: IN PROGRESS") and replace it with
"Status: COMPLETED", then ensure any subsequent sections that list stages,
completed steps, or the retrospective accurately reflect completion.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 615f7307-61b4-45e3-a228-6f499d89bcae

📥 Commits

Reviewing files that changed from the base of the PR and between 6ae12a1 and 5fb3af2.

📒 Files selected for processing (16)
  • Cargo.toml
  • Makefile
  • crates/whitaker_clones_core/src/index/kani.rs
  • crates/whitaker_clones_core/src/index/mod.rs
  • crates/whitaker_clones_core/src/index/tests.rs
  • crates/whitaker_clones_core/tests/features/min_hash_lsh.feature
  • crates/whitaker_clones_core/tests/min_hash_lsh_behaviour.rs
  • docs/developers-guide.md
  • docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md
  • docs/roadmap.md
  • docs/whitaker-clone-detector-design.md
  • scripts/install-kani.sh
  • scripts/install-verus.sh
  • scripts/run-kani.sh
  • scripts/run-verus.sh
  • verus/clone_detector_lsh_config.rs

Comment thread crates/whitaker_clones_core/tests/min_hash_lsh_behaviour.rs
Comment thread scripts/run-verus.sh Outdated
@devboxerhub devboxerhub bot force-pushed the add-verus-proofs-lsh-new-invariants-5sefjo branch from 5fb3af2 to 4a2d47a Compare April 14, 2026 20:32
codescene-delta-analysis[bot]

This comment was marked as outdated.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
scripts/run-kani.sh (1)

57-58: ⚠️ Potential issue | 🔴 Critical

Remove the duplicate closing brace on line 58.

Line 58 contains an extra } that causes a shell parse error. The function run_clone_detector_harnesses is already closed by the brace on line 57.

🐛 Proposed fix
     done
 }
-}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/run-kani.sh` around lines 57 - 58, There is a duplicate closing brace
after the run_clone_detector_harnesses function causing a shell parse error;
remove the extra `}` that immediately follows the existing closing brace for
run_clone_detector_harnesses so the function is closed only once and the script
parses correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/whitaker-clone-detector-design.md`:
- Line 550: Edit the sentence fragment that currently reads "Together these
harnesses assert the" to insert a comma after "Together" so it reads "Together,
these harnesses assert the" to improve readability; locate the string "Together
these harnesses assert the" in the document and update it accordingly.

In `@Makefile`:
- Line 1: Remove the duplicate 'kani' entry from the .PHONY declaration: edit
the .PHONY line that currently lists targets including 'kani' twice and delete
the redundant 'kani' token so each phony target appears only once (look for the
.PHONY line containing help all clean test build release lint fmt check-fmt ...
kani kani-clone-detector and remove the extra 'kani').

---

Duplicate comments:
In `@scripts/run-kani.sh`:
- Around line 57-58: There is a duplicate closing brace after the
run_clone_detector_harnesses function causing a shell parse error; remove the
extra `}` that immediately follows the existing closing brace for
run_clone_detector_harnesses so the function is closed only once and the script
parses correctly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 398b1895-7c56-4ae4-abf1-4cc73752c81b

📥 Commits

Reviewing files that changed from the base of the PR and between 5fb3af2 and 4a2d47a.

📒 Files selected for processing (16)
  • Cargo.toml
  • Makefile
  • crates/whitaker_clones_core/src/index/kani.rs
  • crates/whitaker_clones_core/src/index/mod.rs
  • crates/whitaker_clones_core/src/index/tests.rs
  • crates/whitaker_clones_core/tests/features/min_hash_lsh.feature
  • crates/whitaker_clones_core/tests/min_hash_lsh_behaviour.rs
  • docs/developers-guide.md
  • docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md
  • docs/roadmap.md
  • docs/whitaker-clone-detector-design.md
  • installer/src/main.rs
  • scripts/install-verus.sh
  • scripts/run-kani.sh
  • scripts/run-verus.sh
  • verus/clone_detector_lsh_config.rs

Comment thread docs/whitaker-clone-detector-design.md Outdated
Comment thread Makefile Outdated
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos leynos changed the title Add Verus/Kani proof workflow, harnesses, and ExecPlan Add Verus/Kani proofs and ExecPlan for LSH invariants (7.2.4/7.2.5) Apr 15, 2026
leynos and others added 8 commits April 15, 2026 16:23
… roadmap 7.2.4-7.2.5

Add a detailed, living ExecPlan document outlining the implementation, constraints, risks, progress stages, and acceptance criteria for roadmap items 7.2.4 and 7.2.5. This plan covers establishing reusable proof plumbing for the clone-detector token pipeline, proving constructor invariants for LshConfig::new, integrating Verus and Kani sidecar proof tooling, and augmenting test and documentation coverage. It serves as the authoritative blueprint governing the scoped proof split and workflow before implementation begins.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
…plicit bounds and cfg handling

- Introduce a minimal Kani smoke harness for LshConfig with symbolic but bounded inputs.
- Follow Chutoro patterns: keep harnesses close to code, guard with #[cfg(kani)], and document usage.
- Update unexpected_cfgs allowlist for cfg(kani) to prevent lint warnings.
- Extend Makefile and scripts to run practical Kani targets, leaving room for future exhaustive checks.
- Add detailed docs on harness design, unwinding limits, and state-explosion mitigation strategies.

This supports roadmap item 7.2.4 by providing a practical local smoke gate for Kani proofs without yet implementing heavier exhaustive runs planned for later roadmap items.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
…:new

- Added a Verus proof module `verus/clone_detector_lsh_config.rs` that semantically verifies the contract of `LshConfig::new` ensuring positive bands and rows with product equal to MINHASH_SIZE.
- Introduced a Kani harness in `crates/whitaker_clones_core/src/index/kani.rs` for bounded symbolic testing of `LshConfig::new` under `cfg(kani)`.
- Extended rspec-based test scenarios and unit tests to cover zero rows, invalid non-zero products, and overflow cases.
- Added supporting Makefile targets and scripts (`make kani-clone-detector`, `make verus-clone-detector`, install/run scripts) for sidecar proof workflows.
- Updated documentation and roadmap to reflect implementation progress on clone-detector proof workflows and LshConfig invariants.
- Improved shared tooling in `scripts/install-verus.sh` to strip ANSI escape codes for stable Verus installation.

This change establishes a formal verification foundation for clone-detector LSH configuration, enabling future proofs and bounded runtime checks.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
…iteration

- Added `make verus-clone-detector` and `make kani-clone-detector` targets
  for targeted proof runs without regressing existing workflow.
- Refactored `scripts/run-kani.sh` to run clone-detector harnesses in a loop,
  improving maintainability.
- Updated `scripts/run-verus.sh` to read proof files efficiently with `mapfile`.
- Fixed shared Verus installer to strip ANSI escape codes from toolchain hints,
  ensuring consistent fallback handling across targets.
- Adjusted documentation in execplans to align with new proof sidecar workflow and
  clarify architecture decisions.

These changes enhance developer efficiency by supporting focused proof
workflow iterations and improving tooling robustness.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
…ing scripts

- Document new Verus and Kani proof make targets and usage
- Explain repository-managed proof tooling and scripts
- Provide instructions for Kani installation and execution
- Include example commands for running proofs
- Fix minor typo in execplans markdown regarding Cargo

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
…s LshConfig proofs

- Added `verify_lsh_config_new_overflow_product` Kani harness to drive overflow path in LshConfig::new.
- Updated Kani harness set to include the overflow-specific proof.
- Extended Verus proof to mirror runtime branch order including overflow rejection.
- Documented the distinction between Verus model and runtime implementation in developer guide.
- Clarified that Verus proves an implementation-shaped model, while Kani executes the real constructor code.
- Improved proofs for zero bands, zero rows, exact product acceptance, invalid product rejection, and overflow in Verus.

This enhances formal coverage of LshConfig constructor ensuring overflow paths are verified without trusted assumptions.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
Apply the remaining proof-workflow review fixes by trimming parsed LSH\ninputs, updating proof documentation wording and status, and\nreplacing Bash-4-only mapfile usage in the Verus wrapper.\n\nAlso make the installer bin tests deterministic again by routing\ntheir Dylint dependency checks through injected test-only install\noptions instead of the live repository-first installer path. This\nkeeps the validated worktree stable before rebasing onto main.
The script scripts/run-kani.sh had an unnecessary closing brace '}' removed to fix shell syntax correctness.

Additionally, a minor punctuation fix was applied in docs/whitaker-clone-detector-design.md, and the Makefile's .PHONY declaration was cleaned up by removing duplicate 'kani' target.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
@devboxerhub devboxerhub bot force-pushed the add-verus-proofs-lsh-new-invariants-5sefjo branch from 512d978 to 6705a99 Compare April 15, 2026 16:28
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@verus/clone_detector_lsh_config.rs`:
- Around line 8-12: The sidecar currently proves lsh_config_new_result but is
not linked to the runtime constructor LshConfig::new (nor to validate_product),
so the proof is disconnected; either add Verus linkage by declaring an
external_fn_specification or external_body that connects lsh_config_new_result
to LshConfig::new (and any validate_product calls used by the constructor),
ensuring the sidecar’s spec references the same function signature and types, or
alternatively demote the module to a pure contract record by removing the lemma
and using a record/abstract spec that does not pretend to verify the concrete
constructor; update references to LshConfig::new, validate_product, and
lsh_config_new_result accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 03e899d7-ff8a-423f-aa73-5836d91e47af

📥 Commits

Reviewing files that changed from the base of the PR and between 4a2d47a and 6705a99.

📒 Files selected for processing (15)
  • Cargo.toml
  • Makefile
  • crates/whitaker_clones_core/src/index/kani.rs
  • crates/whitaker_clones_core/src/index/mod.rs
  • crates/whitaker_clones_core/src/index/tests.rs
  • crates/whitaker_clones_core/tests/features/min_hash_lsh.feature
  • crates/whitaker_clones_core/tests/min_hash_lsh_behaviour.rs
  • docs/developers-guide.md
  • docs/execplans/7-2-4-verus-proofs-for-lsh-config-new-invariants.md
  • docs/roadmap.md
  • docs/whitaker-clone-detector-design.md
  • scripts/install-verus.sh
  • scripts/run-kani.sh
  • scripts/run-verus.sh
  • verus/clone_detector_lsh_config.rs

Comment thread verus/clone_detector_lsh_config.rs
@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Apr 17, 2026

@coderabbitai Please suggest a fix for the following concern and provide an AI coding agent prompt for the fix:

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Testing ⚠️ Warning Core invariants tested via unit and BDD tests in CI, but shell scripts and proof tooling infrastructure lack automated test coverage. Implement BATS tests for shell scripts and integrate Kani/Verus proof execution into CI as required or documented optional verification steps.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 17, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== CI workflow files ==="
fd -e yml -e yaml -p '.github/workflows' --exec cat {} \;

echo ""
echo "=== Existing BATS usage ==="
fd -e bats -e bash -g '*.bats' --exec echo "Found: {}" \;
rg -l 'bats' --type yaml --type bash 2>/dev/null || true
fd -g 'bats' 2>/dev/null | head -20

echo ""
echo "=== scripts directory tree ==="
tree scripts/ 2>/dev/null || ls -la scripts/

echo ""
echo "=== run-kani.sh ==="
cat scripts/run-kani.sh 2>/dev/null || fd -g 'run-kani.sh' --exec cat {} \;

echo ""
echo "=== run-verus.sh ==="
cat scripts/run-verus.sh 2>/dev/null || fd -g 'run-verus.sh' --exec cat {} \;

echo ""
echo "=== install-kani.sh ==="
cat scripts/install-kani.sh 2>/dev/null || fd -g 'install-kani.sh' --exec cat {} \;

echo ""
echo "=== Makefile (relevant targets) ==="
grep -n 'kani\|verus\|bats\|test' Makefile 2>/dev/null | head -40

echo ""
echo "=== Any existing test/ or tests/ at repo root ==="
ls -la tests/ 2>/dev/null || echo "no root tests dir"
fd -g 'test*' --max-depth 2 -t d 2>/dev/null | head -10

Length of output: 12779


@leynos: Here is a concrete fix plan and a ready-to-use AI coding agent prompt.


Fix overview

The warning has two distinct sub-problems:

Sub-problem Fix
Shell scripts lack automated tests Add BATS unit tests under tests/scripts/ that stub the heavy toolchain binaries and verify the scripts' argument-dispatch and error-handling logic
Proof tooling has no CI gate Add an opt-in GitHub Actions workflow (triggered by workflow_dispatch and a weekly schedule) that installs the sidecar tools and runs make kani-clone-detector and make verus-clone-detector

What BATS tests should cover

tests/scripts/run-verus.bats

  • proof_files_for_group decomposition emits the two decomposition proof files
  • proof_files_for_group clone-detector emits verus/clone_detector_lsh_config.rs
  • proof_files_for_group all emits all three files
  • An unknown group name exits with status 1 and an error message on stderr
  • Passing clone-detector as the first positional argument restricts PROOF_FILES to the clone-detector set before invoking the stub verus binary
  • Passing an explicit file path bypasses group selection

tests/scripts/run-kani.bats

  • No arguments → both run_decomposition_harnesses and run_clone_detector_harnesses are called
  • clone-detector → only the three clone-detector harnesses are invoked, each with --harness <name>
  • decomposition / common → only the decomposition path is taken
  • Unknown first argument → treated as a decomposition harness filter (falls through to default case)

tests/scripts/install-kani.bats

  • When the kani-driver binary already exists at the expected cache path, curl and tar are not called (idempotency)
  • platform_asset_suffix emits the correct suffix for each uname -s/uname -m combination
  • An unsupported platform exits 1 with a diagnostic on stderr

AI coding agent prompt

## Task: Add BATS tests for proof-tooling shell scripts and a CI proof workflow

### Context
Repository: whitaker (Rust workspace)
Branch: PR `#194` branch

The following shell scripts under `scripts/` have no automated test coverage:
- `scripts/run-kani.sh`   — dispatches to clone-detector or decomposition Kani harness groups
- `scripts/run-verus.sh`  — dispatches to proof-file groups via `proof_files_for_group`
- `scripts/install-kani.sh` — idempotent download/cache of a pinned Kani release

There is no CI gate that runs `make kani-clone-detector` or `make verus-clone-detector`.

### Spelling/style constraints
- Use en-GB-oxendict spelling throughout all prose and comments (-ise/-yse/-our).
- Do NOT increase cyclomatic complexity unnecessarily.

### Requirements

#### 1. BATS test infrastructure

1.1  Add a `tests/scripts/` directory with a `README.md` explaining that BATS must be
     installed (`brew install bats-core` / `apt-get install bats`) to run these tests.

1.2  Add `tests/scripts/run-verus.bats` covering:
     - `proof_files_for_group decomposition` prints exactly the two decomposition `.rs` paths.
     - `proof_files_for_group clone-detector` prints exactly `verus/clone_detector_lsh_config.rs`.
     - `proof_files_for_group all` prints all three paths.
     - An unknown group name exits with status 1 and writes to stderr.
     - Invoking `scripts/run-verus.sh clone-detector` with a stub `verus` binary (one that
       records its arguments and exits 0) calls that stub with only the clone-detector proof
       file and no other files.
     - Invoking `scripts/run-verus.sh <explicit-path>` passes that exact path to the stub.

1.3  Add `tests/scripts/run-kani.bats` covering:
     - No arguments: the stub `cargo-kani` is called for each of the three clone-detector
       harnesses AND for the decomposition group.
     - `clone-detector`: only the three named `--harness` invocations occur.
     - `decomposition` or `common`: only the decomposition path is taken.
     - An unknown first argument is treated as a decomposition harness filter.

1.4  Add `tests/scripts/install-kani.bats` covering:
     - When `kani-driver` already exists at the expected cache path, neither `curl` nor `tar`
       is executed (idempotency).
     - `platform_asset_suffix` emits `x86_64-unknown-linux-gnu` on Linux x86_64.
     - An unsupported `uname` combination exits 1.

     Use BATS helper libraries `bats-support` and `bats-assert` (submodule or vendored under
     `tests/scripts/test_helper/`). Stub external tools by prepending a `$BATS_TEST_TMPDIR/bin`
     directory containing shell stubs onto `PATH`; do not mock via `eval` or global overrides.

#### 2. Makefile target

Add a `test-scripts` phony target to `Makefile` that runs:

```make
test-scripts: ## Run BATS tests for shell scripts
	`@command` -v bats >/dev/null || { echo "Install bats-core to run script tests"; exit 1; }
	bats tests/scripts/
```

Add `test-scripts` to the `.PHONY` list and call it from inside the `test` target when
`command -v bats` succeeds (non-fatal: print a notice and skip if BATS is absent).

#### 3. CI workflow — optional proof gate

Add `.github/workflows/proofs.yml`:

```yaml
name: Proofs
on:
  workflow_dispatch:
  schedule:
    - cron: '0 3 * * 1'   # Every Monday at 03:00 UTC
```

The workflow should have a single job `proof` that:
- Runs on `ubuntu-latest`.
- Checks out the repository.
- Installs a stable Rust toolchain with `rustup`.
- Runs `make kani-clone-detector` (which internally calls `scripts/install-kani.sh` then
  `scripts/run-kani.sh clone-detector`).
- Runs `make verus-clone-detector` (which internally calls `scripts/install-verus.sh` then
  `scripts/run-verus.sh clone-detector`).
- Caches both `~/.cache/whitaker/kani` and `~/.cache/whitaker/verus` using
  `actions/cache` keyed on the pinned version strings (`KANI_VERSION`, the Verus version)
  so repeated runs do not re-download.

The workflow must NOT be part of the required status checks — it is an optional verification
step. Document this in `docs/developers-guide.md` under the existing "Proof workflows" section
by adding a short paragraph noting the `Proofs` workflow and how to trigger it manually.

#### 4. Quality gates
- All new `.bats` files must pass `shellcheck` (warn level).
- All new `.sh` additions/edits must pass `shellcheck -S warning`.
- BATS tests must be self-contained: no network calls, no Kani/Verus binaries required.

@leynos leynos merged commit aeec8b8 into main Apr 17, 2026
5 checks passed
@leynos leynos deleted the add-verus-proofs-lsh-new-invariants-5sefjo branch April 17, 2026 11:51
leynos added a commit that referenced this pull request Apr 19, 2026
…194)

* docs(execplans): add comprehensive execplan for clone-detector proofs roadmap 7.2.4-7.2.5

Add a detailed, living ExecPlan document outlining the implementation, constraints, risks, progress stages, and acceptance criteria for roadmap items 7.2.4 and 7.2.5. This plan covers establishing reusable proof plumbing for the clone-detector token pipeline, proving constructor invariants for LshConfig::new, integrating Verus and Kani sidecar proof tooling, and augmenting test and documentation coverage. It serves as the authoritative blueprint governing the scoped proof split and workflow before implementation begins.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* feat(proofs): add practical Kani harnesses for clone-detector with explicit bounds and cfg handling

- Introduce a minimal Kani smoke harness for LshConfig with symbolic but bounded inputs.
- Follow Chutoro patterns: keep harnesses close to code, guard with #[cfg(kani)], and document usage.
- Update unexpected_cfgs allowlist for cfg(kani) to prevent lint warnings.
- Extend Makefile and scripts to run practical Kani targets, leaving room for future exhaustive checks.
- Add detailed docs on harness design, unwinding limits, and state-explosion mitigation strategies.

This supports roadmap item 7.2.4 by providing a practical local smoke gate for Kani proofs without yet implementing heavier exhaustive runs planned for later roadmap items.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* feat(clone-detector): add Verus proof and Kani harness for LshConfig::new

- Added a Verus proof module `verus/clone_detector_lsh_config.rs` that semantically verifies the contract of `LshConfig::new` ensuring positive bands and rows with product equal to MINHASH_SIZE.
- Introduced a Kani harness in `crates/whitaker_clones_core/src/index/kani.rs` for bounded symbolic testing of `LshConfig::new` under `cfg(kani)`.
- Extended rspec-based test scenarios and unit tests to cover zero rows, invalid non-zero products, and overflow cases.
- Added supporting Makefile targets and scripts (`make kani-clone-detector`, `make verus-clone-detector`, install/run scripts) for sidecar proof workflows.
- Updated documentation and roadmap to reflect implementation progress on clone-detector proof workflows and LshConfig invariants.
- Improved shared tooling in `scripts/install-verus.sh` to strip ANSI escape codes for stable Verus installation.

This change establishes a formal verification foundation for clone-detector LSH configuration, enabling future proofs and bounded runtime checks.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* feat(clone-detector): add specific Verus and Kani targets for faster iteration

- Added `make verus-clone-detector` and `make kani-clone-detector` targets
  for targeted proof runs without regressing existing workflow.
- Refactored `scripts/run-kani.sh` to run clone-detector harnesses in a loop,
  improving maintainability.
- Updated `scripts/run-verus.sh` to read proof files efficiently with `mapfile`.
- Fixed shared Verus installer to strip ANSI escape codes from toolchain hints,
  ensuring consistent fallback handling across targets.
- Adjusted documentation in execplans to align with new proof sidecar workflow and
  clarify architecture decisions.

These changes enhance developer efficiency by supporting focused proof
workflow iterations and improving tooling robustness.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* docs(proof-workflows): add documentation for proof workflows and tooling scripts

- Document new Verus and Kani proof make targets and usage
- Explain repository-managed proof tooling and scripts
- Provide instructions for Kani installation and execution
- Include example commands for running proofs
- Fix minor typo in execplans markdown regarding Cargo

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* feat(kaniproofs): add overflow-specific Kani harness and tighten Verus LshConfig proofs

- Added `verify_lsh_config_new_overflow_product` Kani harness to drive overflow path in LshConfig::new.
- Updated Kani harness set to include the overflow-specific proof.
- Extended Verus proof to mirror runtime branch order including overflow rejection.
- Documented the distinction between Verus model and runtime implementation in developer guide.
- Clarified that Verus proves an implementation-shaped model, while Kani executes the real constructor code.
- Improved proofs for zero bands, zero rows, exact product acceptance, invalid product rejection, and overflow in Verus.

This enhances formal coverage of LshConfig constructor ensuring overflow paths are verified without trusted assumptions.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* Fix proof follow-ups and installer test hooks

Apply the remaining proof-workflow review fixes by trimming parsed LSH\ninputs, updating proof documentation wording and status, and\nreplacing Bash-4-only mapfile usage in the Verus wrapper.\n\nAlso make the installer bin tests deterministic again by routing\ntheir Dylint dependency checks through injected test-only install\noptions instead of the live repository-first installer path. This\nkeeps the validated worktree stable before rebasing onto main.

* style(scripts): remove extraneous closing brace in run-kani.sh

The script scripts/run-kani.sh had an unnecessary closing brace '}' removed to fix shell syntax correctness.

Additionally, a minor punctuation fix was applied in docs/whitaker-clone-detector-design.md, and the Makefile's .PHONY declaration was cleaned up by removing duplicate 'kani' target.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

---------

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant