feat(web-ui): per-gate evidence display and run history (#567) by frankbria · Pull Request #575 · frankbria/codeframe

frankbria · 2026-04-09T18:14:42Z

Summary

Closes #567

Backend: proof_runs table added to SQLite ledger; save_run, list_runs, and get_run_evidence functions populate and query it. runner.py now records start/end time, duration, and pass/fail for each run.
Two new API endpoints: GET /api/v2/proof/runs (list last N runs) and GET /api/v2/proof/runs/{run_id}/evidence (per-gate evidence with artifact file content).
GateEvidencePanel: expandable per-gate row showing raw artifact text in a scrollable <pre> block; truncates at 200 lines with "Show full output" toggle.
RunHistoryPanel: "Recent Runs" table showing last 5 runs with timestamp, pass/fail badge, duration, triggered-by; clicking a row sets the selected run.
Both new components are wired into the PROOF9 page (RunHistoryPanel) and the requirement detail page (GateEvidencePanel for latest run evidence).

Test plan

uv run pytest tests/ui/test_proof_v2.py — 47 tests, all pass (14 new)
cd web-ui && npm test — 703 tests, all pass (16 new across 2 new files)
cd web-ui && npm run build — clean build, no TypeScript errors
uv run ruff check — no lint issues

Summary by CodeRabbit

New Features
- Proof run history view: recent executions with timestamps, status, duration, trigger source, and pagination/limit
- Run details view: per-gate evidence with captured artifact text (truncation + "Show full output")
- UI components: collapsible Gate Evidence panel and Run History panel; select runs to load evidence
API
- Added read endpoints for listing runs and fetching per-run evidence
Tests
- Added integration and UI tests covering endpoints and new components
Docs
- Updated roadmap and architecture docs to reflect shipped UI and API features

) - Backend: add proof_runs table to ledger (save_run, list_runs, get_run_evidence) - Backend: populate proof_runs on every run in runner.py - Backend: two new endpoints — GET /api/v2/proof/runs and GET /api/v2/proof/runs/{id}/evidence - Frontend: ProofRunSummary, ProofEvidenceWithContent, ProofRunDetail types - Frontend: GateEvidencePanel — expandable per-gate artifact text, 200-line truncation with toggle - Frontend: RunHistoryPanel — last 5 runs table with pass/fail badge, click to select - Frontend: wire RunHistoryPanel into /proof page, GateEvidencePanel into /proof/[req_id] page - Tests: 14 new backend tests (47 total, all pass); 2 new frontend test files (703 total, all pass)

coderabbitai · 2026-04-09T18:15:07Z

Walkthrough

Adds persistent run records and per-run evidence viewing: new ProofRun model, proof_runs table and ledger APIs; runner persists run metadata (timestamps, duration, outcome); new API endpoints to list runs and fetch run evidence; UI components/pages to show recent runs and expandable per-gate artifact text.

Changes

Cohort / File(s)	Summary
Backend models & ledger `codeframe/core/proof/models.py`, `codeframe/core/proof/ledger.py`	Add `ProofRun` dataclass; create `proof_runs` table; adjust lazy-init to include `proof_runs`; add `save_run`, `get_run`, `list_runs`, `get_run_evidence`.
Backend runner `codeframe/core/proof/runner.py`	Capture timezone-aware `started_at`/`completed_at`, compute `duration_ms`, compute `overall_passed`, and persist run via `ledger.save_run`.
API router & tests `codeframe/ui/routers/proof_v2.py`, `tests/ui/test_proof_v2.py`	Add `GET /api/v2/proof/runs` and `GET /api/v2/proof/runs/{run_id}/evidence`, new response models and artifact-text helper; add integration tests for listing runs and run-evidence responses.
Frontend types & client `web-ui/src/types/index.ts`, `web-ui/src/lib/api.ts`	Add `ProofRunSummary`, `ProofRunDetail`, `ProofEvidenceWithContent` types; add `proofApi.listRuns()` and `proofApi.getRunDetail()`.
Frontend components & exports `web-ui/src/components/proof/GateEvidencePanel.tsx`, `web-ui/src/components/proof/RunHistoryPanel.tsx`, `web-ui/src/components/proof/index.ts`	Add `GateEvidencePanel` (expandable per-gate artifact text with 200-line truncation) and `RunHistoryPanel` (recent runs list, selection/highlight); re-export both.
Frontend pages & wiring `web-ui/src/app/proof/page.tsx`, `web-ui/src/app/proof/[req_id]/page.tsx`	Integrate `RunHistoryPanel`, add selected-run state and SWR fetch for run detail; fetch latest-run evidence for per-req display and render `GateEvidencePanel`.
Frontend tests & mocks `web-ui/src/__tests__/components/proof/*.test.tsx`, `web-ui/src/__tests__/components/proof/ProofPage.test.tsx`	Add tests for `GateEvidencePanel`, `RunHistoryPanel`; extend Jest mocks to include added proof components.
Docs & roadmap `CLAUDE.md`, `docs/PHASE_2_CLI_API_MAPPING.md`, `docs/PHASE_3_UI_ARCHITECTURE.md`, `docs/PRODUCT_ROADMAP.md`	Document new PROOF9 commands, UI views, and mark Phase 3.5B items as shipped / update roadmap and architecture notes.

Sequence Diagram

sequenceDiagram
    participant User
    participant Client
    participant API
    participant Runner
    participant Ledger
    participant DB

    User->>Client: Trigger run (POST /api/v2/proof/run)
    Client->>API: POST /api/v2/proof/run
    API->>Runner: start run
    Runner->>Runner: execute gates (capture started_at)
    Runner->>Runner: compute completed_at, duration_ms, overall_passed
    Runner->>Ledger: save_run(workspace, ProofRun)
    Ledger->>DB: INSERT/REPLACE proof_runs
    DB-->>Ledger: OK
    Ledger-->>Runner: OK
    Runner-->>API: run complete
    API-->>Client: run-complete response

    User->>Client: Request run history
    Client->>API: GET /api/v2/proof/runs?limit=5
    API->>Ledger: list_runs(workspace, limit)
    Ledger->>DB: SELECT ... ORDER BY started_at DESC
    DB-->>Ledger: runs
    Ledger-->>API: runs
    API-->>Client: runs

    User->>Client: Select past run
    Client->>API: GET /api/v2/proof/runs/{run_id}/evidence
    API->>Ledger: get_run_evidence(workspace, run_id)
    Ledger->>DB: SELECT from proof_evidence WHERE run_id, workspace_id
    DB-->>Ledger: evidence rows
    Ledger-->>API: evidence
    API-->>Client: run detail (with artifact_text)
    Client->>UI: render GateEvidencePanel

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

feat(proof): PROOF9 quality memory system (#422) #447: Prior proof subsystem changes — directly related to adding run persistence and runner integration.
feat(api): PROOF9 REST API router — /api/v2/proof endpoints #459: Adds earlier proof_v2 router endpoints that this PR extends with run-list and run-evidence handlers.
feat(web-ui): Run Gates button + live gate progress for PROOF9 (#566) #574: Introduced in-memory run cache and related run-status endpoints used here as fallback for missing ledger metadata.

Poem

🐰 I hop, I stamp each run with care,

Five bright histories kept in my lair,
Gates unfold logs in collapsible rows,
Truncated lines peek where curiosity goes,
Click to reveal — the full output glows!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'feat(web-ui): per-gate evidence display and run history (`#567`)' directly and clearly summarizes the main changes, matching the core objectives of adding evidence display and run history features.
Linked Issues check	✅ Passed	All requirements from `#567` are met: per-gate evidence display with 200-line truncation implemented in GateEvidencePanel; run history panel showing last 5 runs with pass/fail badges and timestamps implemented in RunHistoryPanel; backend endpoints GET /api/v2/proof/runs and GET /api/v2/proof/runs/{run_id}/evidence implemented with proper database persistence; integration into PROOF9 page completed.
Out of Scope Changes check	✅ Passed	All changes directly support the `#567` requirements: backend ledger APIs for run persistence, ProofRun model, runner modifications for tracking, frontend components for evidence and history display, integration into pages, and documentation updates. No extraneous changes detected outside the stated objectives.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/567-proof-evidence-run-history

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

claude · 2026-04-09T18:17:49Z

Review: feat(web-ui): per-gate evidence display and run history (#567)

This is a solid Phase 3.5B delivery. The backend/frontend split is clean, test coverage is thorough (14 new backend + 16 new frontend tests), and the SWR-based data fetching integrates well with the existing proof page patterns. A few issues worth addressing before merge:

Security — Path traversal in `_read_artifact_text` (blocker)

proof_v2.py:

def _read_artifact_text(artifact_path: str) -> Optional[str]:
    from pathlib import Path
    try:
        p = Path(artifact_path)
        if p.exists():
            return p.read_text(errors="replace")
        return None
    except Exception:
        return None

The artifact_path comes from the database (not directly from user input), but an attacker with write access to the evidence table could store a path like /etc/passwd or any sensitive file. At minimum, validate the path is inside the workspace directory before reading:

def _read_artifact_text(artifact_path: str, workspace_root: Path) -> Optional[str]:
    try:
        p = Path(artifact_path).resolve()
        if not str(p).startswith(str(workspace_root.resolve())):
            return None
        if p.exists():
            return p.read_text(errors="replace")
        return None
    except Exception:
        return None

Also: no size cap. A 100MB artifact file will be read entirely into memory and sent to the browser. Consider capping at e.g. 500KB.

Missing `get_run` ledger function

get_run_evidence_endpoint does this to look up a single run:

runs = list_runs(workspace, limit=100)
run = next((r for r in runs if r.run_id == run_id), None)

This is an O(N) scan capped at 100. A run outside that window returns 404 even if it exists. Add a get_run(workspace, run_id) -> Optional[ProofRun] to ledger.py with a direct WHERE run_id = ? AND workspace_id = ? query.

`workspacePath` type mismatch in `RunHistoryPanel`

The prop is typed as string:

interface RunHistoryPanelProps {
  workspacePath: string;

But the caller passes string | null:

const [workspacePath, setWorkspacePath] = useState<string | null>(null);
// ...
<RunHistoryPanel workspacePath={workspacePath} .../>

The SWR null-key guard handles the runtime case, but the prop type should be string | null to match reality. The build probably passes only because TypeScript isn't strict enough here, or the null check is elsewhere in the parent's render path.

Inline imports

from pathlib import Path is inside _read_artifact_text's function body. The cache-fallback block also has:

import time as _time
from datetime import datetime as _dt, timezone as _tz

Move these to the module-level imports. The aliased names (_dt, _tz) suggest they were added defensively to avoid shadowing — not needed at module level since datetime and timezone aren't shadowed elsewhere in the file.

`triggered_by` always hardcoded to `"human"`

runner.py always passes triggered_by="human". The roadmap mentions auto-triggered runs (4B: post-merge glitch loop). Consider making run_proof() accept a triggered_by: str = "human" parameter now so the API and batch runner can set it correctly when that time comes. Low cost to do it right upfront.

Minor: `_run_cache` fallback adds complexity for an edge case that shouldn't occur

save_run is called synchronously at the end of run_proof() before returning. By the time the client receives the run_id from POST /api/v2/proof/run and makes a follow-up GET /runs/{run_id}/evidence, the record is already in the DB. The cache fallback path:

Returns started_at == completed_at (misleading — duration would be 0)
Has artifact_path="", artifact_checksum="", timestamp="" for all evidence items

If this fallback was needed for a real race condition, the empty strings would cause downstream issues. If it's not needed, removing it simplifies the endpoint considerably. Worth clarifying the intent.

Nit: `test_list_runs_empty_initially` may be fragile

This test asserts the list is empty before any run. If the test DB is shared across test methods within the class (depends on fixture scope), a prior test that triggered a run would cause this to fail. Worth verifying the fixture gives a fresh DB per test, or using @pytest.mark.usefixtures("clean_workspace") explicitly.

What's working well

_ensure_tables logic fix is correct — always closes the connection, checks both tables
ProofRunDetailResponse inheritance from ProofRunSummaryResponse avoids duplication
Frontend components handle all states cleanly (loading, error, empty, data)
Keyboard navigation on run rows (role="button", tabIndex, onKeyDown) is solid
truncateLines with "Show full output" toggle is a good UX pattern
Test coverage hits all the important paths including empty state, shape validation, 404 for unknown run

The security issue with _read_artifact_text is the only hard blocker. The missing get_run function is worth fixing before merge to avoid the 100-run window limitation.

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (1)

codeframe/core/proof/ledger.py (1)

74-85: Index the new run-history tables.

The new UI calls into list_runs() and get_run_evidence() for interactive views, but this migration only creates the tables. Without supporting indexes, SQLite will eventually full-scan and sort proof_runs / proof_evidence on every history or evidence request. Adding indexes on the lookup keys now will keep these panels responsive as data grows.

Suggested migration additions

     cursor.execute("""
         CREATE TABLE IF NOT EXISTS proof_runs (
             run_id TEXT NOT NULL,
             workspace_id TEXT NOT NULL,
             started_at TEXT NOT NULL,
             completed_at TEXT,
             triggered_by TEXT NOT NULL DEFAULT 'human',
             overall_passed INTEGER NOT NULL DEFAULT 0,
             duration_ms INTEGER,
             PRIMARY KEY (run_id, workspace_id)
         )
     """)
+
+    cursor.execute(
+        "CREATE INDEX IF NOT EXISTS idx_proof_runs_workspace_started_at "
+        "ON proof_runs (workspace_id, started_at DESC)"
+    )
+    cursor.execute(
+        "CREATE INDEX IF NOT EXISTS idx_proof_evidence_workspace_run "
+        "ON proof_evidence (workspace_id, run_id)"
+    )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/proof/ledger.py` around lines 74 - 85, The migration creates
proof_runs but lacks indexes, so update the migration SQL to add CREATE INDEX IF
NOT EXISTS statements for the lookup/sort keys used by list_runs() and
get_run_evidence(): add indexes on proof_runs(workspace_id),
proof_runs(workspace_id, started_at) (or started_at DESC for sort), and
proof_runs(run_id, workspace_id) as well as matching indexes on proof_evidence
(e.g., proof_evidence(workspace_id) and proof_evidence(run_id, workspace_id)).
Put these CREATE INDEX IF NOT EXISTS statements right after the table creations
so queries in list_runs() and get_run_evidence() avoid full-table scans as data
grows.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@codeframe/core/proof/ledger.py`:
- Around line 385-410: Add a precise run lookup helper get_run(workspace,
run_id) that mirrors list_runs' behavior but queries proof_runs WHERE run_id = ?
AND workspace_id = ? using get_db_connection and _ensure_tables; fetchone, close
the connection, and if a row exists return a ProofRun constructed the same way
as in list_runs (parsing started_at/completed_at with datetime.fromisoformat,
casting overall_passed to bool, preserving duration_ms and triggered_by),
otherwise return None to allow exact historical lookups without paging.

In `@codeframe/core/proof/runner.py`:
- Around line 135-137: overall_passed currently uses "bool(results) and
all(...)" which treats runs with no checks as failed; change the logic so
empty-result runs are considered passed by computing overall_passed = all(passed
for gate_results in results.values() for _, passed in gate_results) if results
else True, updating the variable assignment in runner.py where overall_passed is
defined so it treats an empty results dict as True.
- Around line 72-73: The early return when there are no open requirements skips
persisting the run, so ensure ledger.save_run(...) is invoked for user-triggered
runs before that return; move or add a call to ledger.save_run using the same
run metadata (e.g., started_at, finished_at, run_id/identifier, user trigger
flag and outcome) immediately prior to the "no open requirements" return in the
function that defines started_at, so every execution (including the no-run case)
is recorded in history.

In `@codeframe/ui/routers/proof_v2.py`:
- Around line 526-535: The router currently reads entire artifact files via
_read_artifact_text and serializes full content in get_run_evidence_endpoint,
which can blow memory for large logs; move the artifact resolution and preview
logic into a core function (e.g., create a core API like
core.artifacts.get_artifact_preview(path, max_lines=200) and
core.artifacts.get_artifact_full(path)) and change the router to call those thin
core functions so the router no longer reads files directly; ensure the core
function returns a bounded preview (e.g., first N lines or bytes) by default and
provide a separate router endpoint (e.g., /evidence/{id}/artifact/full) that
calls the core get_artifact_full to fetch the complete content on explicit user
action.

In `@web-ui/src/app/proof/`[req_id]/page.tsx:
- Around line 112-115: latestEvidence is currently set to the entire run-wide
latestRunDetail.evidence, which can include gates from other requirements;
update the useMemo for latestEvidence to filter latestRunDetail.evidence by the
route param req_id so only evidence for this requirement is shown (e.g., change
the computation in the useMemo that references latestRunDetail and
latestRunDetail?.evidence to filter items where the evidence's requirement
identifier matches req_id — check properties like requirementId, requirement_id
or req_id on the evidence object and use the correct one).

In `@web-ui/src/app/proof/page.tsx`:
- Around line 106-107: The main proof view isn’t using selectedRunId — wire
selectedRunId (and setSelectedRunId/RunHistoryPanel) into the main gate
rendering flow: add a useEffect that listens to selectedRunId and when non-null
calls the existing run-results loader (or create a fetchLoadRunResults function)
to retrieve historical gate results, store them in the state the gate renderer
uses (e.g., historicalRunResults or currentDisplayedRun), and have the
gate/gate-progress components render that state instead of the live run; also
support clearing selectedRunId to switch back to live run rendering.

---

Nitpick comments:
In `@codeframe/core/proof/ledger.py`:
- Around line 74-85: The migration creates proof_runs but lacks indexes, so
update the migration SQL to add CREATE INDEX IF NOT EXISTS statements for the
lookup/sort keys used by list_runs() and get_run_evidence(): add indexes on
proof_runs(workspace_id), proof_runs(workspace_id, started_at) (or started_at
DESC for sort), and proof_runs(run_id, workspace_id) as well as matching indexes
on proof_evidence (e.g., proof_evidence(workspace_id) and proof_evidence(run_id,
workspace_id)). Put these CREATE INDEX IF NOT EXISTS statements right after the
table creations so queries in list_runs() and get_run_evidence() avoid
full-table scans as data grows.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: eca6840a-2ed3-4e00-b8b0-c5ff559aa49d

📥 Commits

Reviewing files that changed from the base of the PR and between c4d9697 and 695d3ab.

📒 Files selected for processing (15)

codeframe/core/proof/ledger.py
codeframe/core/proof/models.py
codeframe/core/proof/runner.py
codeframe/ui/routers/proof_v2.py
tests/ui/test_proof_v2.py
web-ui/src/__tests__/components/proof/GateEvidencePanel.test.tsx
web-ui/src/__tests__/components/proof/ProofPage.test.tsx
web-ui/src/__tests__/components/proof/RunHistoryPanel.test.tsx
web-ui/src/app/proof/[req_id]/page.tsx
web-ui/src/app/proof/page.tsx
web-ui/src/components/proof/GateEvidencePanel.tsx
web-ui/src/components/proof/RunHistoryPanel.tsx
web-ui/src/components/proof/index.ts
web-ui/src/lib/api.ts
web-ui/src/types/index.ts

codeframe/core/proof/ledger.py

codeframe/core/proof/runner.py

coderabbitai · 2026-04-09T18:22:46Z

codeframe/ui/routers/proof_v2.py

+def _read_artifact_text(artifact_path: str) -> Optional[str]:
+    """Read artifact file content, returning None if the file is missing."""
+    from pathlib import Path
+    try:
+        p = Path(artifact_path)
+        if p.exists():
+            return p.read_text(errors="replace")
+        return None
+    except Exception:
+        return None


⚠️ Potential issue | 🟠 Major

Avoid inlining full artifact files into this response.

_read_artifact_text() reads every artifact completely, and get_run_evidence_endpoint() serializes that whole payload for each evidence row. Large pytest/coverage logs will make this endpoint slow and memory-heavy even though the UI initially shows only a 200-line preview. Move artifact-content resolution into core and return a bounded preview by default, with an explicit follow-up fetch for “Show full output”.

As per coding guidelines, "FastAPI routers in codeframe/ui/routers/ must be thin adapters over core functionality; logic must live in core modules, not routers".

Also applies to: 590-600

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@codeframe/ui/routers/proof_v2.py` around lines 526 - 535, The router currently reads entire artifact files via _read_artifact_text and serializes full content in get_run_evidence_endpoint, which can blow memory for large logs; move the artifact resolution and preview logic into a core function (e.g., create a core API like core.artifacts.get_artifact_preview(path, max_lines=200) and core.artifacts.get_artifact_full(path)) and change the router to call those thin core functions so the router no longer reads files directly; ensure the core function returns a bounded preview (e.g., first N lines or bytes) by default and provide a separate router endpoint (e.g., /evidence/{id}/artifact/full) that calls the core get_artifact_full to fetch the complete content on explicit user action.

web-ui/src/app/proof/[req_id]/page.tsx

web-ui/src/app/proof/page.tsx

- ledger.py: add get_run() for O(1) run lookup by run_id (avoids scan) - runner.py: persist run record even when no open requirements exist - runner.py: fix overall_passed logic (empty runs → True, not False) - proof_v2.py: truncate artifact text to 200 lines server-side - proof_v2.py: use get_run() instead of list_runs() scan in evidence endpoint - [req_id]/page.tsx: filter latestEvidence by req_id (was returning all gates) - proof/page.tsx: fetch and render selected run evidence via GateEvidencePanel

claude · 2026-04-09T18:52:50Z

Follow-up Review

Two items from my prior review are resolved:

get_run ledger function added with a direct WHERE run_id = ? AND workspace_id = ? query. Good.
Line truncation _read_artifact_text now caps at 200 lines. Good.

The main blocker and a couple of smaller issues remain.

Security: Path traversal still unaddressed (blocker)

_read_artifact_text still has no workspace root check. An artifact path stored in the DB (e.g. /etc/shadow) can be read and returned to the browser:

def _read_artifact_text(artifact_path: str, workspace: Workspace, max_lines: int = _ARTIFACT_LINE_LIMIT) -> Optional[str]:
    try:
        p = Path(artifact_path).resolve()
        root = Path(workspace.repo_path).resolve()
        if not str(p).startswith(str(root)):
            return None
        ...

Pass workspace into the function and resolve both paths before comparing. The callers in the endpoint already have the workspace object available.

Also: p.read_text() still loads the full file before slicing. For a large artifact the line cap does not help memory usage. Reading iteratively with p.open() and stopping after max_lines is safer.

Inline imports

These remain inside function bodies:

from pathlib import Path inside _read_artifact_text
import time as _time and from datetime import datetime as _dt, timezone as _tz inside the cache fallback block

Move to module-level. The _dt/_tz aliases are not needed since datetime/timezone are not shadowed elsewhere in the file.

workspacePath type mismatch in RunHistoryPanel

The prop interface declares workspacePath: string but the parent passes string | null. The SWR null-key guard handles the runtime case, but the declared type should be string | null so TypeScript catches future misuse at the call site.

Security fix is the only hard blocker. The other two are quick cleanups worth doing before merge.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@codeframe/core/proof/runner.py`:
- Around line 83-93: The run history is misattributed because triggered_by is
hardcoded to "human"; update the run_proof function signature to accept a
triggered_by parameter (default "human") and use that parameter when
constructing both ProofRun instances (the one created in run_proof and the
second one later around the other ProofRun(...) construction), ensuring the
passed-in value is threaded into ledger.save_run calls; also update all call
sites of run_proof to pass an appropriate triggered_by value where automation
may invoke it.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 39d37b1f-49b1-4180-8517-508c408c735f

📥 Commits

Reviewing files that changed from the base of the PR and between 695d3ab and 6f76a30.

📒 Files selected for processing (5)

codeframe/core/proof/ledger.py
codeframe/core/proof/runner.py
codeframe/ui/routers/proof_v2.py
web-ui/src/app/proof/[req_id]/page.tsx
web-ui/src/app/proof/page.tsx

✅ Files skipped from review due to trivial changes (1)

codeframe/ui/routers/proof_v2.py

🚧 Files skipped from review as they are similar to previous changes (2)

web-ui/src/app/proof/page.tsx
codeframe/core/proof/ledger.py

coderabbitai · 2026-04-09T18:54:55Z

codeframe/core/proof/runner.py

+        ledger.save_run(
+            workspace,
+            ProofRun(
+                run_id=run_id,
+                workspace_id=workspace.id,
+                started_at=started_at,
+                completed_at=completed_at,
+                triggered_by="human",
+                overall_passed=True,
+                duration_ms=int((completed_at - started_at).total_seconds() * 1000),
+            ),


⚠️ Potential issue | 🟠 Major

triggered_by is hardcoded to "human" in both persisted run paths.

If run_proof is ever invoked by automation, run history will be misattributed. Consider threading a triggered_by parameter through run_proof and using it in both ProofRun(...) constructions (Line 90 and Line 157).

Suggested fix

def run_proof( workspace: Workspace, *, full: bool = False, gate_filter: Optional[Gate] = None, run_id: Optional[str] = None, + triggered_by: str = "human", ) -> dict[str, list[tuple[Gate, bool]]]: @@ ProofRun( run_id=run_id, workspace_id=workspace.id, started_at=started_at, completed_at=completed_at, - triggered_by="human", + triggered_by=triggered_by, overall_passed=True, duration_ms=int((completed_at - started_at).total_seconds() * 1000), ), @@ ProofRun( run_id=run_id, workspace_id=workspace.id, started_at=started_at, completed_at=completed_at, - triggered_by="human", + triggered_by=triggered_by, overall_passed=overall_passed, duration_ms=duration_ms, ),

Also applies to: 150-160

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@codeframe/core/proof/runner.py` around lines 83 - 93, The run history is misattributed because triggered_by is hardcoded to "human"; update the run_proof function signature to accept a triggered_by parameter (default "human") and use that parameter when constructing both ProofRun instances (the one created in run_proof and the second one later around the other ProofRun(...) construction), ensuring the passed-in value is threaded into ledger.save_run calls; also update all call sites of run_proof to pass an appropriate triggered_by value where automation may invoke it.

…story

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/PHASE_3_UI_ARCHITECTURE.md`:
- Around line 362-364: Update the docs so GateEvidencePanel is described as
rendering per-gate evidence rows (pass/fail + artifact text) rather than hosting
run-state progress; remove or relocate mention of GateProgressRow (pending →
running → passed/failed) from GateEvidencePanel and assign that responsibility
to the run-progress components (e.g., GateRunPanel and GateRunBanner), ensuring
the doc now states GateRunPanel/GateRunBanner own progress-state UI while
GateEvidencePanel owns evidence/artifact display.
- Line 353: The fenced code block that documents the component tree (starting
with "ProofPage (/proof)" and containing nodes like ProofHeader and
RunGatesButton) is missing a language tag and triggers markdownlint MD040;
update the opening fence to include a language (e.g., change "```" to "```text")
so the tree block is recognized as plain text, leaving the block content and
closing "```" unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4ac59335-64db-4149-af0d-961a02b180d4

📥 Commits

Reviewing files that changed from the base of the PR and between 6f76a30 and 70accfb.

📒 Files selected for processing (4)

CLAUDE.md
docs/PHASE_2_CLI_API_MAPPING.md
docs/PHASE_3_UI_ARCHITECTURE.md
docs/PRODUCT_ROADMAP.md

✅ Files skipped from review due to trivial changes (3)

docs/PRODUCT_ROADMAP.md
docs/PHASE_2_CLI_API_MAPPING.md
CLAUDE.md

coderabbitai · 2026-04-09T19:05:47Z

docs/PHASE_3_UI_ARCHITECTURE.md

+- Run history panel with outcome and duration
+
+**Component Hierarchy:**
+```


⚠️ Potential issue | 🟡 Minor

Add a language tag to the fenced block to satisfy markdownlint.

Line 353 opens a fenced code block without a language, triggering MD040. Use text (or another appropriate language) for the tree block.

Proposed fix

-``` +```text ProofPage (/proof) ├── ProofHeader │ ├── RunGatesButton → POST /api/v2/proof/run ... -``` +```

🧰 Tools

🪛 markdownlint-cli2 (0.22.0)

[warning] 353-353: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/PHASE_3_UI_ARCHITECTURE.md` at line 353, The fenced code block that documents the component tree (starting with "ProofPage (/proof)" and containing nodes like ProofHeader and RunGatesButton) is missing a language tag and triggers markdownlint MD040; update the opening fence to include a language (e.g., change "```" to "```text") so the tree block is recognized as plain text, leaving the block content and closing "```" unchanged.

coderabbitai · 2026-04-09T19:05:47Z

docs/PHASE_3_UI_ARCHITECTURE.md

+├── GateEvidencePanel          ← new (Phase 3.5B)
+│   ├── GateProgressRow[] (pending → running → passed/failed)
+│   └── EvidenceArtifactDisplay (artifact text, scrollable)


⚠️ Potential issue | 🟡 Minor

Align GateEvidencePanel docs with actual component responsibilities.

Line 362–364 describes GateEvidencePanel as hosting GateProgressRow[] with pending → running → passed/failed, but current implementation of web-ui/src/components/proof/GateEvidencePanel.tsx renders per-gate evidence rows (pass/fail + artifact text) and not running-state progress. Please move progress-row ownership to the run-progress components (e.g., GateRunPanel/GateRunBanner) in this doc to avoid architectural drift.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/PHASE_3_UI_ARCHITECTURE.md` around lines 362 - 364, Update the docs so GateEvidencePanel is described as rendering per-gate evidence rows (pass/fail + artifact text) rather than hosting run-state progress; remove or relocate mention of GateProgressRow (pending → running → passed/failed) from GateEvidencePanel and assign that responsibility to the run-progress components (e.g., GateRunPanel and GateRunBanner), ensuring the doc now states GateRunPanel/GateRunBanner own progress-state UI while GateEvidencePanel owns evidence/artifact display.

claude · 2026-04-09T19:06:47Z

Review: feat(web-ui): per-gate evidence display and run history (#567)

This closes Phase 3.5B cleanly. The full stack is coherent — DB persistence, two new API endpoints, two new React components, and solid test coverage (14 new backend tests, 16 new frontend tests). The component design for GateEvidencePanel and RunHistoryPanel is well-scoped and the SWR integration is consistent with the rest of the web UI.

I've reviewed the latest commit against both the code and CodeRabbit's prior passes. Here's what I think is still worth addressing.

Security — path traversal in `_read_artifact_text`

def _read_artifact_text(artifact_path: str, ...) -> Optional[str]:
    p = Path(artifact_path)
    if not p.exists():
        return None
    lines = p.read_text(errors="replace").splitlines(keepends=True)
    return "".join(lines[:max_lines])

artifact_path is read from the SQLite ledger and served over HTTP. If the database were ever populated with a crafted path (e.g., /etc/passwd or ~/.ssh/id_rsa), this endpoint would serve that content to anyone with API access. The fix is a single containment check:

workspace_root = Path(workspace.repo_path).resolve()
p = Path(artifact_path).resolve()
if not p.is_relative_to(workspace_root):
    return None

CodeRabbit also noted this helper belongs in core rather than the router layer — both concerns are worth addressing together.

Still open from CodeRabbit's second review — `triggered_by` hardcoded

runner.py still passes triggered_by="human" unconditionally, including when the run is started via the API (which can be automated). The fix is a parameter with a sensible default:

def run_proof(..., triggered_by: str = "human") -> dict:
    ...
    ProofRun(..., triggered_by=triggered_by, ...)

Call sites that don't care get the default; the API router can pass "api" or surface it as a query param if needed. This is a low-effort change that makes the run history accurate.

Still open from CodeRabbit's first review — missing indexes

The proof_runs table has a composite PK on (run_id, workspace_id) but list_runs orders by started_at DESC and filters on workspace_id — a full-scan + sort on every history panel load. Same issue for get_run_evidence, which filters proof_evidence by (run_id, workspace_id). These two indexes cover both queries:

CREATE INDEX IF NOT EXISTS idx_proof_runs_ws_started
    ON proof_runs (workspace_id, started_at DESC);

CREATE INDEX IF NOT EXISTS idx_proof_evidence_ws_run
    ON proof_evidence (workspace_id, run_id);

Inline imports inside the cache fallback path

In get_run_evidence_endpoint, the DB-miss fallback contains:

import time as _time
ts = cached.get("_ts", _time.time())
from datetime import datetime as _dt, timezone as _tz
ts_str = _dt.fromtimestamp(ts, tz=_tz.utc).isoformat()

datetime and timezone are already importable from module-level imports — they don't need to be re-imported inline. time can similarly be moved to a top-level import. These deferred imports add noise and are flagged by some linters.

Test assertion gap in ordering test

def test_list_runs_ordered_newest_first(self, test_client):
    ...
    runs = test_client.get("/api/v2/proof/runs").json()
    if len(runs) >= 2:           # silently no-ops if fixture only produces 1 run
        assert runs[0]["started_at"] >= runs[1]["started_at"]

The conditional if makes the ordering assertion a no-op when the fixture produces only one run. Use assert len(runs) >= 2 instead so the check is guaranteed to execute.

Minor: `GateEvidenceRow` key includes list index

<GateEvidenceRow key={`${ev.run_id}:${ev.gate}:${i}`} ev={ev} />

Using i (position) means React will re-mount rows on any reorder. ev.req_id is already available and makes the key stable: ${ev.run_id}:${ev.req_id}:${ev.gate}.

Minor: JSX indentation in `proof/page.tsx`

RunHistoryPanel is placed one indent level shallower than its sibling elements. Not a functional issue, but the formatter will catch it.

Summary: The implementation is solid and Phase 3.5B is complete. The path traversal issue in _read_artifact_text is the only item I would call a blocker — the rest are improvements or cleanup. The triggered_by parameter and the missing indexes are the most valuable follow-ups.

coderabbitai bot reviewed Apr 9, 2026

View reviewed changes

docs: sync documentation for Phase 3.5B — evidence display and run hi…

70accfb

…story

coderabbitai bot reviewed Apr 9, 2026

View reviewed changes

frankbria merged commit 0c12d51 into main Apr 9, 2026
16 checks passed

Conversation

frankbria commented Apr 9, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated Code Review Effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

claude bot commented Apr 9, 2026

Review: feat(web-ui): per-gate evidence display and run history (#567)

Security — Path traversal in _read_artifact_text (blocker)

Missing get_run ledger function

workspacePath type mismatch in RunHistoryPanel

Inline imports

triggered_by always hardcoded to "human"

Minor: _run_cache fallback adds complexity for an edge case that shouldn't occur

Nit: test_list_runs_empty_initially may be fragile

What's working well

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

claude bot commented Apr 9, 2026

Follow-up Review

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot commented Apr 9, 2026

Review: feat(web-ui): per-gate evidence display and run history (#567)

Security — path traversal in _read_artifact_text

Still open from CodeRabbit's second review — triggered_by hardcoded

Still open from CodeRabbit's first review — missing indexes

Inline imports inside the cache fallback path

Test assertion gap in ordering test

Minor: GateEvidenceRow key includes list index

Minor: JSX indentation in proof/page.tsx

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

frankbria commented Apr 9, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 9, 2026 •

edited

Loading

Security — Path traversal in `_read_artifact_text` (blocker)

Missing `get_run` ledger function

`workspacePath` type mismatch in `RunHistoryPanel`

`triggered_by` always hardcoded to `"human"`

Minor: `_run_cache` fallback adds complexity for an edge case that shouldn't occur

Nit: `test_list_runs_empty_initially` may be fragile

Security — path traversal in `_read_artifact_text`

Still open from CodeRabbit's second review — `triggered_by` hardcoded

Minor: `GateEvidenceRow` key includes list index

Minor: JSX indentation in `proof/page.tsx`