feat(web-ui): per-gate evidence display and run history (#567)#575
- Backend: add `proof_runs` table to ledger (`save_run`, `list_runs`, `get_run_evidence`)
- Backend: populate `proof_runs` on every run in `runner.py`
- Backend: two new endpoints — `GET /api/v2/proof/runs` and `GET /api/v2/proof/runs/{id}/evidence`
- Frontend: `ProofRunSummary`, `ProofEvidenceWithContent`, `ProofRunDetail` types
- Frontend: `GateEvidencePanel` — expandable per-gate artifact text, 200-line truncation with toggle
- Frontend: `RunHistoryPanel` — last-5-runs table with pass/fail badge, click to select
- Frontend: wire `RunHistoryPanel` into the `/proof` page, `GateEvidencePanel` into the `/proof/[req_id]` page
- Tests: 14 new backend tests (47 total, all pass); 2 new frontend test files (703 total, all pass)
**Walkthrough**

Adds persistent run records and per-run evidence viewing: a new `ProofRun` model, `proof_runs` table, and ledger APIs; the runner persists run metadata (timestamps, duration, outcome); new API endpoints list runs and fetch run evidence; UI components/pages show recent runs and expandable per-gate artifact text.
**Sequence Diagram**

```mermaid
sequenceDiagram
    participant User
    participant Client
    participant API
    participant Runner
    participant Ledger
    participant DB
    User->>Client: Trigger run (POST /api/v2/proof/run)
    Client->>API: POST /api/v2/proof/run
    API->>Runner: start run
    Runner->>Runner: execute gates (capture started_at)
    Runner->>Runner: compute completed_at, duration_ms, overall_passed
    Runner->>Ledger: save_run(workspace, ProofRun)
    Ledger->>DB: INSERT/REPLACE proof_runs
    DB-->>Ledger: OK
    Ledger-->>Runner: OK
    Runner-->>API: run complete
    API-->>Client: run-complete response
    User->>Client: Request run history
    Client->>API: GET /api/v2/proof/runs?limit=5
    API->>Ledger: list_runs(workspace, limit)
    Ledger->>DB: SELECT ... ORDER BY started_at DESC
    DB-->>Ledger: runs
    Ledger-->>API: runs
    API-->>Client: runs
    User->>Client: Select past run
    Client->>API: GET /api/v2/proof/runs/{run_id}/evidence
    API->>Ledger: get_run_evidence(workspace, run_id)
    Ledger->>DB: SELECT from proof_evidence WHERE run_id, workspace_id
    DB-->>Ledger: evidence rows
    Ledger-->>API: evidence
    API-->>Client: run detail (with artifact_text)
    Client->>UI: render GateEvidencePanel
```
**Estimated Code Review Effort:** 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (warning)
**Review: feat(web-ui): per-gate evidence display and run history (#567)**

This is a solid Phase 3.5B delivery. The backend/frontend split is clean, test coverage is thorough (14 new backend + 16 new frontend tests), and the SWR-based data fetching integrates well with the existing proof page patterns. A few issues are worth addressing before merge.

Security — path traversal in `_read_artifact_text`
Actionable comments posted: 6
🧹 Nitpick comments (1)
codeframe/core/proof/ledger.py (1)
**74-85: Index the new run-history tables.**

The new UI calls into `list_runs()` and `get_run_evidence()` for interactive views, but this migration only creates the tables. Without supporting indexes, SQLite will eventually full-scan and sort `proof_runs`/`proof_evidence` on every history or evidence request. Adding indexes on the lookup keys now will keep these panels responsive as data grows.

Suggested migration additions
```diff
 cursor.execute("""
     CREATE TABLE IF NOT EXISTS proof_runs (
         run_id TEXT NOT NULL,
         workspace_id TEXT NOT NULL,
         started_at TEXT NOT NULL,
         completed_at TEXT,
         triggered_by TEXT NOT NULL DEFAULT 'human',
         overall_passed INTEGER NOT NULL DEFAULT 0,
         duration_ms INTEGER,
         PRIMARY KEY (run_id, workspace_id)
     )
 """)
+
+cursor.execute(
+    "CREATE INDEX IF NOT EXISTS idx_proof_runs_workspace_started_at "
+    "ON proof_runs (workspace_id, started_at DESC)"
+)
+cursor.execute(
+    "CREATE INDEX IF NOT EXISTS idx_proof_evidence_workspace_run "
+    "ON proof_evidence (workspace_id, run_id)"
+)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@codeframe/core/proof/ledger.py` around lines 74 - 85, The migration creates proof_runs but lacks indexes, so update the migration SQL to add CREATE INDEX IF NOT EXISTS statements for the lookup/sort keys used by list_runs() and get_run_evidence(): add indexes on proof_runs(workspace_id), proof_runs(workspace_id, started_at) (or started_at DESC for sort), and proof_runs(run_id, workspace_id) as well as matching indexes on proof_evidence (e.g., proof_evidence(workspace_id) and proof_evidence(run_id, workspace_id)). Put these CREATE INDEX IF NOT EXISTS statements right after the table creations so queries in list_runs() and get_run_evidence() avoid full-table scans as data grows.
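As a quick sanity check of the indexing advice, the suggested index can be exercised against an in-memory SQLite database; the table definition below mirrors the migration's `proof_runs` table, and `EXPLAIN QUERY PLAN` shows whether the `list_runs()`-style query uses the index instead of a full scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Table as created by the migration.
cur.execute("""
    CREATE TABLE IF NOT EXISTS proof_runs (
        run_id TEXT NOT NULL,
        workspace_id TEXT NOT NULL,
        started_at TEXT NOT NULL,
        completed_at TEXT,
        triggered_by TEXT NOT NULL DEFAULT 'human',
        overall_passed INTEGER NOT NULL DEFAULT 0,
        duration_ms INTEGER,
        PRIMARY KEY (run_id, workspace_id)
    )
""")

# Suggested covering index for the list_runs() access pattern.
cur.execute(
    "CREATE INDEX IF NOT EXISTS idx_proof_runs_workspace_started_at "
    "ON proof_runs (workspace_id, started_at DESC)"
)

# The plan should name the index rather than report a table scan.
plan = cur.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM proof_runs WHERE workspace_id = ? "
    "ORDER BY started_at DESC LIMIT 5",
    ("ws-1",),
).fetchall()
print(plan)
```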
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@codeframe/core/proof/ledger.py`:
- Around line 385-410: Add a precise run lookup helper get_run(workspace,
run_id) that mirrors list_runs' behavior but queries proof_runs WHERE run_id = ?
AND workspace_id = ? using get_db_connection and _ensure_tables; fetchone, close
the connection, and if a row exists return a ProofRun constructed the same way
as in list_runs (parsing started_at/completed_at with datetime.fromisoformat,
casting overall_passed to bool, preserving duration_ms and triggered_by),
otherwise return None to allow exact historical lookups without paging.
In `@codeframe/core/proof/runner.py`:
- Around line 135-137: overall_passed currently uses "bool(results) and
all(...)" which treats runs with no checks as failed; change the logic so
empty-result runs are considered passed by computing overall_passed = all(passed
for gate_results in results.values() for _, passed in gate_results) if results
else True, updating the variable assignment in runner.py where overall_passed is
defined so it treats an empty results dict as True.
- Around line 72-73: The early return when there are no open requirements skips
persisting the run, so ensure ledger.save_run(...) is invoked for user-triggered
runs before that return; move or add a call to ledger.save_run using the same
run metadata (e.g., started_at, finished_at, run_id/identifier, user trigger
flag and outcome) immediately prior to the "no open requirements" return in the
function that defines started_at, so every execution (including the no-run case)
is recorded in history.
In `@codeframe/ui/routers/proof_v2.py`:
- Around line 526-535: The router currently reads entire artifact files via
_read_artifact_text and serializes full content in get_run_evidence_endpoint,
which can blow memory for large logs; move the artifact resolution and preview
logic into a core function (e.g., create a core API like
core.artifacts.get_artifact_preview(path, max_lines=200) and
core.artifacts.get_artifact_full(path)) and change the router to call those thin
core functions so the router no longer reads files directly; ensure the core
function returns a bounded preview (e.g., first N lines or bytes) by default and
provide a separate router endpoint (e.g., /evidence/{id}/artifact/full) that
calls the core get_artifact_full to fetch the complete content on explicit user
action.
In `@web-ui/src/app/proof/[req_id]/page.tsx`:
- Around line 112-115: latestEvidence is currently set to the entire run-wide
latestRunDetail.evidence, which can include gates from other requirements;
update the useMemo for latestEvidence to filter latestRunDetail.evidence by the
route param req_id so only evidence for this requirement is shown (e.g., change
the computation in the useMemo that references latestRunDetail and
latestRunDetail?.evidence to filter items where the evidence's requirement
identifier matches req_id — check properties like requirementId, requirement_id
or req_id on the evidence object and use the correct one).
In `@web-ui/src/app/proof/page.tsx`:
- Around line 106-107: The main proof view isn’t using selectedRunId — wire
selectedRunId (and setSelectedRunId/RunHistoryPanel) into the main gate
rendering flow: add a useEffect that listens to selectedRunId and when non-null
calls the existing run-results loader (or create a fetchLoadRunResults function)
to retrieve historical gate results, store them in the state the gate renderer
uses (e.g., historicalRunResults or currentDisplayedRun), and have the
gate/gate-progress components render that state instead of the live run; also
support clearing selectedRunId to switch back to live run rendering.
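The empty-run fix described for `runner.py` above reduces to a small helper; `results` here follows the `dict[str, list[tuple[gate, passed]]]` shape the runner uses (gate names simplified to strings for illustration):

```python
def compute_overall_passed(results: dict[str, list[tuple[str, bool]]]) -> bool:
    """A run with no checks counts as passed; otherwise every gate must pass."""
    if not results:
        return True
    return all(
        passed
        for gate_results in results.values()
        for _, passed in gate_results
    )

print(compute_overall_passed({}))                                        # True
print(compute_overall_passed({"REQ-1": [("tests", True)]}))              # True
print(compute_overall_passed({"REQ-1": [("tests", True), ("lint", False)]}))  # False
```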
---
Nitpick comments:
In `@codeframe/core/proof/ledger.py`:
- Around line 74-85: The migration creates proof_runs but lacks indexes, so
update the migration SQL to add CREATE INDEX IF NOT EXISTS statements for the
lookup/sort keys used by list_runs() and get_run_evidence(): add indexes on
proof_runs(workspace_id), proof_runs(workspace_id, started_at) (or started_at
DESC for sort), and proof_runs(run_id, workspace_id) as well as matching indexes
on proof_evidence (e.g., proof_evidence(workspace_id) and proof_evidence(run_id,
workspace_id)). Put these CREATE INDEX IF NOT EXISTS statements right after the
table creations so queries in list_runs() and get_run_evidence() avoid
full-table scans as data grows.
ℹ️ Review info — Configuration: Organization UI · Profile: CHILL · Plan: Pro · Run ID: eca6840a-2ed3-4e00-b8b0-c5ff559aa49d
📒 Files selected for processing (15)
- codeframe/core/proof/ledger.py
- codeframe/core/proof/models.py
- codeframe/core/proof/runner.py
- codeframe/ui/routers/proof_v2.py
- tests/ui/test_proof_v2.py
- web-ui/src/__tests__/components/proof/GateEvidencePanel.test.tsx
- web-ui/src/__tests__/components/proof/ProofPage.test.tsx
- web-ui/src/__tests__/components/proof/RunHistoryPanel.test.tsx
- web-ui/src/app/proof/[req_id]/page.tsx
- web-ui/src/app/proof/page.tsx
- web-ui/src/components/proof/GateEvidencePanel.tsx
- web-ui/src/components/proof/RunHistoryPanel.tsx
- web-ui/src/components/proof/index.ts
- web-ui/src/lib/api.ts
- web-ui/src/types/index.ts
codeframe/ui/routers/proof_v2.py (Outdated)
```python
def _read_artifact_text(artifact_path: str) -> Optional[str]:
    """Read artifact file content, returning None if the file is missing."""
    from pathlib import Path
    try:
        p = Path(artifact_path)
        if p.exists():
            return p.read_text(errors="replace")
        return None
    except Exception:
        return None
```
Avoid inlining full artifact files into this response.
_read_artifact_text() reads every artifact completely, and get_run_evidence_endpoint() serializes that whole payload for each evidence row. Large pytest/coverage logs will make this endpoint slow and memory-heavy even though the UI initially shows only a 200-line preview. Move artifact-content resolution into core and return a bounded preview by default, with an explicit follow-up fetch for “Show full output”.
As per coding guidelines, "FastAPI routers in codeframe/ui/routers/ must be thin adapters over core functionality; logic must live in core modules, not routers".
Also applies to: 590-600
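A minimal sketch of the proposed core helpers. The names `get_artifact_preview` and `get_artifact_full` come from the review comment's suggestion; the 200-line default mirrors the UI's truncation limit. This is an illustration of the bounded-read idea, not the landed implementation:

```python
from pathlib import Path
from typing import Optional


def get_artifact_preview(path: str, max_lines: int = 200) -> Optional[str]:
    """Return at most max_lines lines of the artifact without loading the
    whole file into memory; None if the file is missing or unreadable."""
    try:
        lines = []
        with open(path, errors="replace") as f:
            for i, line in enumerate(f):
                if i >= max_lines:
                    break
                lines.append(line)
        return "".join(lines)
    except OSError:
        return None


def get_artifact_full(path: str) -> Optional[str]:
    """Return the complete artifact content, for an explicit user request."""
    try:
        return Path(path).read_text(errors="replace")
    except OSError:
        return None
```

The router then stays a thin adapter: the default evidence endpoint calls `get_artifact_preview`, and a separate "full output" endpoint calls `get_artifact_full` only on user action.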
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@codeframe/ui/routers/proof_v2.py` around lines 526 - 535, The router
currently reads entire artifact files via _read_artifact_text and serializes
full content in get_run_evidence_endpoint, which can blow memory for large logs;
move the artifact resolution and preview logic into a core function (e.g.,
create a core API like core.artifacts.get_artifact_preview(path, max_lines=200)
and core.artifacts.get_artifact_full(path)) and change the router to call those
thin core functions so the router no longer reads files directly; ensure the
core function returns a bounded preview (e.g., first N lines or bytes) by
default and provide a separate router endpoint (e.g.,
/evidence/{id}/artifact/full) that calls the core get_artifact_full to fetch the
complete content on explicit user action.
- ledger.py: add `get_run()` for O(1) run lookup by run_id (avoids scan)
- runner.py: persist run record even when no open requirements exist
- runner.py: fix `overall_passed` logic (empty runs → True, not False)
- proof_v2.py: truncate artifact text to 200 lines server-side
- proof_v2.py: use `get_run()` instead of `list_runs()` scan in evidence endpoint
- [req_id]/page.tsx: filter `latestEvidence` by `req_id` (was returning all gates)
- proof/page.tsx: fetch and render selected run evidence via `GateEvidencePanel`
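The `get_run()` lookup added in ledger.py is not shown in this thread; a self-contained sketch of the shape described in the review prompt, with a stand-in `ProofRun` dataclass (the real model lives in `codeframe/core/proof/models.py`, and the real function uses the ledger's connection helpers):

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ProofRun:
    run_id: str
    workspace_id: str
    started_at: datetime
    completed_at: Optional[datetime]
    triggered_by: str
    overall_passed: bool
    duration_ms: Optional[int]


def get_run(conn: sqlite3.Connection, workspace_id: str, run_id: str) -> Optional[ProofRun]:
    """Exact lookup by (run_id, workspace_id) — hits the primary key, no paging."""
    row = conn.execute(
        "SELECT run_id, workspace_id, started_at, completed_at, "
        "triggered_by, overall_passed, duration_ms "
        "FROM proof_runs WHERE run_id = ? AND workspace_id = ?",
        (run_id, workspace_id),
    ).fetchone()
    if row is None:
        return None
    return ProofRun(
        run_id=row[0],
        workspace_id=row[1],
        started_at=datetime.fromisoformat(row[2]),
        completed_at=datetime.fromisoformat(row[3]) if row[3] else None,
        triggered_by=row[4],
        overall_passed=bool(row[5]),
        duration_ms=row[6],
    )
```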
**Follow-up Review**

Two items from my prior review are resolved. The main blocker and a couple of smaller issues remain.

Security: path traversal still unaddressed (blocker)
```python
def _read_artifact_text(
    artifact_path: str,
    workspace: Workspace,
    max_lines: int = _ARTIFACT_LINE_LIMIT,
) -> Optional[str]:
    try:
        p = Path(artifact_path).resolve()
        root = Path(workspace.repo_path).resolve()
        if not str(p).startswith(str(root)):
            return None
        ...
```

Pass the workspace through.

Also: inline imports. These remain inside function bodies; move them to module level.

The prop interface declares

The security fix is the only hard blocker. The other two are quick cleanups worth doing before merge.
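One caveat on the containment check sketched above: a plain `str.startswith` prefix test would also accept a sibling directory such as `<root>-evil`. On Python 3.9+, `Path.is_relative_to` is a safer test; a standalone sketch of just the containment logic (function name is illustrative):

```python
from pathlib import Path


def is_contained(artifact_path: str, root: str) -> bool:
    """True only if artifact_path resolves to a location under root."""
    p = Path(artifact_path).resolve()
    r = Path(root).resolve()
    return p.is_relative_to(r)


print(is_contained("/tmp/ws/logs/pytest.txt", "/tmp/ws"))   # True
print(is_contained("/tmp/ws/../etc/passwd", "/tmp/ws"))     # False
```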
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@codeframe/core/proof/runner.py`:
- Around line 83-93: The run history is misattributed because triggered_by is
hardcoded to "human"; update the run_proof function signature to accept a
triggered_by parameter (default "human") and use that parameter when
constructing both ProofRun instances (the one created in run_proof and the
second one later around the other ProofRun(...) construction), ensuring the
passed-in value is threaded into ledger.save_run calls; also update all call
sites of run_proof to pass an appropriate triggered_by value where automation
may invoke it.
ℹ️ Review info — Configuration: Organization UI · Profile: CHILL · Plan: Pro · Run ID: 39d37b1f-49b1-4180-8517-508c408c735f
📒 Files selected for processing (5)
- codeframe/core/proof/ledger.py
- codeframe/core/proof/runner.py
- codeframe/ui/routers/proof_v2.py
- web-ui/src/app/proof/[req_id]/page.tsx
- web-ui/src/app/proof/page.tsx
✅ Files skipped from review due to trivial changes (1)
- codeframe/ui/routers/proof_v2.py
🚧 Files skipped from review as they are similar to previous changes (2)
- web-ui/src/app/proof/page.tsx
- codeframe/core/proof/ledger.py
```python
ledger.save_run(
    workspace,
    ProofRun(
        run_id=run_id,
        workspace_id=workspace.id,
        started_at=started_at,
        completed_at=completed_at,
        triggered_by="human",
        overall_passed=True,
        duration_ms=int((completed_at - started_at).total_seconds() * 1000),
    ),
```
triggered_by is hardcoded to "human" in both persisted run paths.
If run_proof is ever invoked by automation, run history will be misattributed. Consider threading a triggered_by parameter through run_proof and using it in both ProofRun(...) constructions (Line 90 and Line 157).
Suggested fix

```diff
 def run_proof(
     workspace: Workspace,
     *,
     full: bool = False,
     gate_filter: Optional[Gate] = None,
     run_id: Optional[str] = None,
+    triggered_by: str = "human",
 ) -> dict[str, list[tuple[Gate, bool]]]:
 @@
         ProofRun(
             run_id=run_id,
             workspace_id=workspace.id,
             started_at=started_at,
             completed_at=completed_at,
-            triggered_by="human",
+            triggered_by=triggered_by,
             overall_passed=True,
             duration_ms=int((completed_at - started_at).total_seconds() * 1000),
         ),
 @@
         ProofRun(
             run_id=run_id,
             workspace_id=workspace.id,
             started_at=started_at,
             completed_at=completed_at,
-            triggered_by="human",
+            triggered_by=triggered_by,
             overall_passed=overall_passed,
             duration_ms=duration_ms,
         ),
```

Also applies to: 150-160
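The threading pattern itself is just a keyword argument with a backward-compatible default: existing human-triggered call sites keep working unchanged while automation passes its own label. A standalone illustration (simplified signature, not the real `run_proof`):

```python
def run_proof(workspace: str, *, triggered_by: str = "human") -> dict:
    """Sketch: the caller's identity flows into every persisted run record."""
    record = {"workspace": workspace, "triggered_by": triggered_by}
    return record


print(run_proof("ws-1"))                      # defaults to triggered_by="human"
print(run_proof("ws-1", triggered_by="ci"))   # automation callers label themselves
```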
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@codeframe/core/proof/runner.py` around lines 83 - 93, The run history is
misattributed because triggered_by is hardcoded to "human"; update the run_proof
function signature to accept a triggered_by parameter (default "human") and use
that parameter when constructing both ProofRun instances (the one created in
run_proof and the second one later around the other ProofRun(...) construction),
ensuring the passed-in value is threaded into ledger.save_run calls; also update
all call sites of run_proof to pass an appropriate triggered_by value where
automation may invoke it.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/PHASE_3_UI_ARCHITECTURE.md`:
- Around line 362-364: Update the docs so GateEvidencePanel is described as
rendering per-gate evidence rows (pass/fail + artifact text) rather than hosting
run-state progress; remove or relocate mention of GateProgressRow (pending →
running → passed/failed) from GateEvidencePanel and assign that responsibility
to the run-progress components (e.g., GateRunPanel and GateRunBanner), ensuring
the doc now states GateRunPanel/GateRunBanner own progress-state UI while
GateEvidencePanel owns evidence/artifact display.
- Line 353: The fenced code block that documents the component tree (starting
with "ProofPage (/proof)" and containing nodes like ProofHeader and
RunGatesButton) is missing a language tag and triggers markdownlint MD040;
update the opening fence to include a language (e.g., change "```" to "```text")
so the tree block is recognized as plain text, leaving the block content and
closing "```" unchanged.
ℹ️ Review info — Configuration: Organization UI · Profile: CHILL · Plan: Pro · Run ID: 4ac59335-64db-4149-af0d-961a02b180d4
📒 Files selected for processing (4)
- CLAUDE.md
- docs/PHASE_2_CLI_API_MAPPING.md
- docs/PHASE_3_UI_ARCHITECTURE.md
- docs/PRODUCT_ROADMAP.md
✅ Files skipped from review due to trivial changes (3)
- docs/PRODUCT_ROADMAP.md
- docs/PHASE_2_CLI_API_MAPPING.md
- CLAUDE.md
````markdown
- Run history panel with outcome and duration

**Component Hierarchy:**
```
````
Add a language tag to the fenced block to satisfy markdownlint.
Line 353 opens a fenced code block without a language, triggering MD040. Use `text` (or another appropriate language) for the tree block.
Proposed fix

````diff
-```
+```text
 ProofPage (/proof)
 ├── ProofHeader
 │   ├── RunGatesButton → POST /api/v2/proof/run
 ...
-```
+```
````

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 353-353: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/PHASE_3_UI_ARCHITECTURE.md` at line 353, The fenced code block that
documents the component tree (starting with "ProofPage (/proof)" and containing
nodes like ProofHeader and RunGatesButton) is missing a language tag and
triggers markdownlint MD040; update the opening fence to include a language
(e.g., change "```" to "```text") so the tree block is recognized as plain text,
leaving the block content and closing "```" unchanged.
```text
├── GateEvidencePanel ← new (Phase 3.5B)
│   ├── GateProgressRow[] (pending → running → passed/failed)
│   └── EvidenceArtifactDisplay (artifact text, scrollable)
```
Align GateEvidencePanel docs with actual component responsibilities.
Line 362–364 describes GateEvidencePanel as hosting GateProgressRow[] with pending → running → passed/failed, but current implementation of web-ui/src/components/proof/GateEvidencePanel.tsx renders per-gate evidence rows (pass/fail + artifact text) and not running-state progress. Please move progress-row ownership to the run-progress components (e.g., GateRunPanel/GateRunBanner) in this doc to avoid architectural drift.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/PHASE_3_UI_ARCHITECTURE.md` around lines 362 - 364, Update the docs so
GateEvidencePanel is described as rendering per-gate evidence rows (pass/fail +
artifact text) rather than hosting run-state progress; remove or relocate
mention of GateProgressRow (pending → running → passed/failed) from
GateEvidencePanel and assign that responsibility to the run-progress components
(e.g., GateRunPanel and GateRunBanner), ensuring the doc now states
GateRunPanel/GateRunBanner own progress-state UI while GateEvidencePanel owns
evidence/artifact display.
**Review: feat(web-ui): per-gate evidence display and run history (#567)**

This closes Phase 3.5B cleanly. The full stack is coherent — DB persistence, two new API endpoints, two new React components, and solid test coverage (14 new backend tests, 16 new frontend tests).

I've reviewed the latest commit against both the code and CodeRabbit's prior passes. Here's what I think is still worth addressing.

Security — path traversal in `_read_artifact_text`
Summary
Closes #567
- `proof_runs` table added to SQLite ledger; `save_run`, `list_runs`, and `get_run_evidence` functions populate and query it.
- `runner.py` now records start/end time, duration, and pass/fail for each run.
- New endpoints: `GET /api/v2/proof/runs` (list last N runs) and `GET /api/v2/proof/runs/{run_id}/evidence` (per-gate evidence with artifact file content).
- `GateEvidencePanel`: expandable per-gate row showing raw artifact text in a scrollable `<pre>` block; truncates at 200 lines with a "Show full output" toggle.
- `RunHistoryPanel`: "Recent Runs" table showing last 5 runs with timestamp, pass/fail badge, duration, and triggered-by; clicking a row sets the selected run.
- Wired into the proof page (`RunHistoryPanel`) and the requirement detail page (`GateEvidencePanel` for latest run evidence).

Test plan

- `uv run pytest tests/ui/test_proof_v2.py` — 47 tests, all pass (14 new)
- `cd web-ui && npm test` — 703 tests, all pass (16 new across 2 new files)
- `cd web-ui && npm run build` — clean build, no TypeScript errors
- `uv run ruff check` — no lint issues

Summary by CodeRabbit
- New Features
- API
- Tests
- Docs