Merged
121 commits
03807e4
docs: map existing codebase
jerry609 Mar 15, 2026
911f50c
docs(08): research phase agent event vocabulary
jerry609 Mar 15, 2026
35f81d0
docs(phase-8): add research and validation strategy
jerry609 Mar 15, 2026
5435267
docs(08): create phase plan for Agent Event Vocabulary
jerry609 Mar 15, 2026
4639a7e
test(08-01): add failing tests for event vocabulary and helpers
jerry609 Mar 15, 2026
4ed4b36
feat(08-01): add EventType vocabulary, lifecycle/tool-call helpers, m…
jerry609 Mar 15, 2026
4737c4e
feat(08-02): add TypeScript types, parsers, and Zustand store with tests
jerry609 Mar 15, 2026
effc29c
docs(08-01): complete agent-event-vocabulary plan
jerry609 Mar 15, 2026
48415b8
docs: start milestone v1.2 DeepCode Agent Dashboard
jerry609 Mar 15, 2026
bdc987f
feat(08-02): add SSE hook, display components, and test harness page
jerry609 Mar 15, 2026
28abfa5
docs(08-02): complete frontend agent event consumer layer plan
jerry609 Mar 15, 2026
8b5e119
docs(phase-08): complete phase execution
jerry609 Mar 15, 2026
e209e98
docs(phase-09): research three-panel dashboard
jerry609 Mar 15, 2026
561c099
docs: complete project research
jerry609 Mar 15, 2026
6055dbb
docs(phase-09): add validation strategy
jerry609 Mar 15, 2026
e2ca1d9
docs(09): create phase plan for three-panel dashboard
jerry609 Mar 15, 2026
4252482
feat(09-01): add FileTouchedEntry type, parseFileTouched parser, and …
jerry609 Mar 15, 2026
f6b837d
feat(09-01): add EventType.FILE_CHANGE to Python schema and wire SSE …
jerry609 Mar 15, 2026
08dca86
docs(09-01): complete file-tracking data layer plan
jerry609 Mar 15, 2026
7cdc641
feat(09-02): add TasksPanel, FileListPanel, InlineDiffPanel + AgentSt…
jerry609 Mar 15, 2026
2588799
docs: define milestone v1.2 requirements
jerry609 Mar 15, 2026
7a5c7ef
feat(09-02): create /agent-dashboard page with SplitPanels and add si…
jerry609 Mar 15, 2026
0cc26f0
docs(09-02): complete three-panel agent dashboard plan
jerry609 Mar 15, 2026
e9f8e56
docs(09-02): mark human-verify checkpoint approved, finalize plan sum…
jerry609 Mar 15, 2026
2fb0f53
docs: create milestone v1.2 roadmap (6 phases)
jerry609 Mar 15, 2026
da85e0d
docs(phase-09): complete phase execution
jerry609 Mar 15, 2026
39cc9ca
docs(phase-10): research agent board kanban and codex bridge
jerry609 Mar 15, 2026
151ff6c
docs(phase-10): add validation strategy
jerry609 Mar 15, 2026
0fc81d9
docs(10): create phase plan — 3 plans in 2 waves
jerry609 Mar 15, 2026
be5d5e4
test(10-01): add failing tests for CODEX_* EventType constants, overf…
jerry609 Mar 15, 2026
0754e4c
feat(10-02): add CodexDelegationEntry type, parseCodexDelegation pars…
jerry609 Mar 15, 2026
4a3ec76
feat(10-01): add CODEX_* EventType constants, _emit_codex_event helpe…
jerry609 Mar 15, 2026
c681388
docs(10-01): complete Codex delegation events plan summary
jerry609 Mar 15, 2026
2c64f4e
feat(10-02): add KanbanBoard component with agent identity badges and…
jerry609 Mar 15, 2026
865f5ca
docs(10-02): complete KanbanBoard and Codex delegation events plan
jerry609 Mar 15, 2026
3375979
feat(10-03): add codex-worker sub-agent and dashboard Panels/Kanban t…
jerry609 Mar 15, 2026
c8388fc
docs(10-03): complete codex-worker sub-agent and dashboard view toggl…
jerry609 Mar 15, 2026
02f0a39
docs(11): research DAG visualization phase
jerry609 Mar 15, 2026
8bc1aa8
docs(phase-11): add research and validation strategy
jerry609 Mar 15, 2026
2c00df6
docs(11-dag-visualization): create phase plan
jerry609 Mar 15, 2026
8d85427
feat(11-01): add ScoreEdgeEntry types, parseScoreEdge parser, addScor…
jerry609 Mar 15, 2026
354de12
feat(11-01): add DAG node/edge builder functions with TDD
jerry609 Mar 15, 2026
7fbcaf6
docs(11-01): complete DAG data layer plan summary and state updates
jerry609 Mar 15, 2026
62e84c5
feat(11-02): wire AgentDagPanel into agent-dashboard as third view mode
jerry609 Mar 15, 2026
b6d8e53
feat: unify studio agent workspace
jerry609 Mar 15, 2026
4687d27
feat(web): tighten studio console and board workspace
jerry609 Mar 15, 2026
0963861
fix(web): soften studio workspace palette
jerry609 Mar 15, 2026
054cd8a
fix(web): simplify studio center surface
jerry609 Mar 15, 2026
0de2269
fix(web): move studio activity stats to monitor
jerry609 Mar 15, 2026
1afaec5
fix(web): move studio summary into left rail
jerry609 Mar 15, 2026
2415ada
fix(web): compact studio left rail summary
jerry609 Mar 15, 2026
d3a5880
feat: wire studio runtime and codex delegation
jerry609 Mar 15, 2026
8f3b5ce
fix: align studio chat with claude code cli
jerry609 Mar 15, 2026
f48ac09
feat: add claude code command runner
jerry609 Mar 15, 2026
7912ee5
refactor: inline studio command runner
jerry609 Mar 16, 2026
0801caf
feat: align studio composer with codepilot
jerry609 Mar 16, 2026
9885369
chore: patch next swc lockfile
jerry609 Mar 16, 2026
3b81acf
feat: support cursor-based studio slash commands
jerry609 Mar 16, 2026
a94e986
feat: streamline studio claude code chat ui
jerry609 Mar 16, 2026
570d1e0
fix: align studio shell with codepilot
jerry609 Mar 16, 2026
b2bd6a1
feat: align studio chat with codepilot patterns
jerry609 Mar 16, 2026
ef034ed
fix: stream studio command output in chat timeline
jerry609 Mar 16, 2026
51eaf39
fix: render studio outputs as markdown in chat
jerry609 Mar 16, 2026
7bbb421
fix: allow studio workspaces under home documents
jerry609 Mar 16, 2026
681d6f2
fix: tighten studio slash and workspace interactions
jerry609 Mar 16, 2026
42252eb
fix: align studio mode and workspace preflight
jerry609 Mar 16, 2026
6f427a1
feat: tighten studio chat telemetry and runtime defaults
jerry609 Mar 16, 2026
6fa3d81
feat: inspect studio worker runs
jerry609 Mar 16, 2026
4098c2c
refactor: unify studio monitor surface
jerry609 Mar 16, 2026
eeccea1
feat: add studio worker drill-in
jerry609 Mar 16, 2026
895b3e4
feat: link chat summaries to workers
jerry609 Mar 16, 2026
2232c36
feat: expose studio worker bridge readiness
jerry609 Mar 16, 2026
588a733
feat: add studio approval resume flow
jerry609 Mar 16, 2026
c050bbb
feat: add structured studio bridge results
jerry609 Mar 17, 2026
ac9a76d
feat: link studio bridge results to monitor runs
jerry609 Mar 17, 2026
05fb08d
feat: add managed worker session controls
jerry609 Mar 17, 2026
b61d41e
feat: connect worker details to chat threads
jerry609 Mar 17, 2026
19331c5
feat: deepen studio worker navigation
jerry609 Mar 17, 2026
3032eb8
feat: focus monitor tabs on selected worker
jerry609 Mar 17, 2026
03e048a
feat: clarify mirrored worker controls
jerry609 Mar 17, 2026
efb05ca
feat: refine worker control panel visuals
jerry609 Mar 17, 2026
635d140
feat: add worker monitor segmented controls
jerry609 Mar 17, 2026
720ecf2
feat: refine worker session summaries
jerry609 Mar 17, 2026
9886b00
feat: compress worker list into session rail
jerry609 Mar 17, 2026
6886691
feat: compress monitor right rail
jerry609 Mar 17, 2026
0bf9e96
feat: tighten monitor header density
jerry609 Mar 17, 2026
569f914
feat: tighten studio chat timeline density
jerry609 Mar 17, 2026
8464841
feat: refine studio slash palette interactions
jerry609 Mar 17, 2026
c7d9a23
feat: compress studio composer controls
jerry609 Mar 17, 2026
52c3a29
feat: refine studio thread rail and empty state
jerry609 Mar 17, 2026
80c744a
feat: add collapsible studio activity streams
jerry609 Mar 17, 2026
a83833b
feat: auto-collapse completed studio activity
jerry609 Mar 17, 2026
6fa1d37
feat: polish live studio activity disclosures
jerry609 Mar 17, 2026
dd199d2
feat: strengthen studio chat hierarchy
jerry609 Mar 17, 2026
f6a0a39
feat: add studio activity stage timeline
jerry609 Mar 17, 2026
ab89733
feat: refine workspace setup review flow
jerry609 Mar 17, 2026
dcfc8f6
feat: refine studio chat launch surface
jerry609 Mar 17, 2026
0e405ab
feat: align studio auth and paper entry flow
jerry609 Mar 17, 2026
f76055f
feat: unify studio auth surface flow
jerry609 Mar 17, 2026
f1d9151
feat: streamline studio paper selection flow
jerry609 Mar 17, 2026
e0d086f
feat: tighten studio workspace hierarchy
jerry609 Mar 18, 2026
eea2b9a
feat: compress studio monitor surfaces
jerry609 Mar 18, 2026
3fa67b0
feat: tighten studio chat stream density
jerry609 Mar 18, 2026
deebd0d
feat: streamline studio composer chrome
jerry609 Mar 18, 2026
77f2831
feat: refine studio launch and toolbar density
jerry609 Mar 18, 2026
e487fd9
feat: polish studio navigation and slash feedback
jerry609 Mar 18, 2026
ce494b5
feat: refine studio slash insertion flow
jerry609 Mar 18, 2026
c94207e
feat: tighten studio chat surfaces
jerry609 Mar 18, 2026
b1b70e5
feat: finalize studio chat density
jerry609 Mar 18, 2026
309696f
feat: sync login page with dev auth ui
jerry609 Mar 18, 2026
6894e14
feat: refine studio chat and monitor workflow
jerry609 Mar 19, 2026
cd72002
feat: align studio skills with agent skill directories
jerry609 Mar 19, 2026
9eb2ee7
feat: streamline studio skill launch flow
jerry609 Mar 19, 2026
ba83816
refactor: remove standalone workflows surface
jerry609 Mar 19, 2026
4087126
feat: improve dashboard empty states
jerry609 Mar 19, 2026
bc98416
feat: add signals workspace and daily brief settings
jerry609 Mar 19, 2026
1051abf
fix: stabilize signals hydration and flatten queue layout
jerry609 Mar 19, 2026
7d8fd8b
fix: resolve studio sonar quality issues
jerry609 Mar 19, 2026
2b91ced
fix: address sonar security gate regressions
jerry609 Mar 19, 2026
71cc0cd
merge: resolve dev conflicts in layout shell
jerry609 Mar 19, 2026
38260e1
fix: resolve preview build and codeql issues
jerry609 Mar 19, 2026

241 changes: 241 additions & 0 deletions .claude/agents/codex-worker.md
@@ -0,0 +1,241 @@
---
name: codex-worker
description: Delegates self-contained work to the Codex worker bridge and returns a structured result envelope that Claude can consume directly. Use this sub-agent for bounded code, review, research, planning, ops, or approval-handshake tasks.
tools: [Bash, Read]
---

# Codex Worker Sub-Agent

This sub-agent is the Claude-to-Codex bridge for PaperBot.

Its job is:

1. Validate that the requested task can be delegated safely.
2. Dispatch the task to the PaperBot Codex worker path when appropriate.
3. Return exactly one structured JSON result envelope.

Do not return free-form prose before or after the JSON. Claude should be able to treat your entire final answer as machine-readable output.

## Output Contract

Return exactly one JSON object with this schema:

```json
{
  "version": "1",
  "executor": "codex",
  "task_kind": "code | review | research | plan | ops | approval_required | failure",
  "status": "completed | partial | failed | approval_required",
  "summary": "Short human-readable summary",
  "artifacts": [
    {
      "kind": "file | command | url | finding | patch | note | other",
      "label": "Short label",
      "path": "optional/path/or/null",
      "value": "optional/string/or/null"
    }
  ],
  "payload": {}
}
```

Rules:

- `version` must be `"1"`.
- `executor` must be `"codex"`.
- `summary` must be concise and factual.
- `artifacts` is for compact surfaced items the UI can badge or link.
- `payload` holds task-specific structured detail.
- If the task cannot complete, still return the same envelope with `status: "failed"`; use `task_kind: "failure"` when the task failed before producing any useful result.
- If approval is required, return `task_kind: "approval_required"` and `status: "approval_required"`.
- Do not wrap the JSON in commentary such as "Here is the result".

## Task-Kind Guidance

Choose `task_kind` based on the primary user intent:

- `code`: implementation, refactor, bugfix, tests, generated files, patches
- `review`: code review findings, regressions, risk analysis
- `research`: investigation, repo mapping, fact gathering, comparisons
- `plan`: execution plan, milestone breakdown, sequencing
- `ops`: commands run, environment checks, service health, deployment/runtime work
- `approval_required`: a blocked command or action needs approval before continuing
- `failure`: the task failed before a useful result could be completed
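The pairing rules above can be checked mechanically before an envelope is returned. The sketch below is a hypothetical bash helper, not part of PaperBot: `is_valid_envelope_pair` checks only the two enum fields, and it assumes (from the `failure` example and Step 1 of the workflow) that `task_kind: "failure"` is always reported with `status: "failed"`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of PaperBot): validate the task_kind/status
# pair of a result envelope against the enums in the Output Contract.
is_valid_envelope_pair() {
  local kind=$1 status=$2

  case $kind in
    code | review | research | plan | ops | approval_required | failure) ;;
    *) return 1 ;;
  esac

  case $status in
    completed | partial | failed | approval_required) ;;
    *) return 1 ;;
  esac

  # Approval gates must be flagged in both fields together.
  if [[ $kind == approval_required || $status == approval_required ]]; then
    [[ $kind == approval_required && $status == approval_required ]] || return 1
  fi

  # Assumption: a "failure" task_kind is only ever reported as "failed".
  if [[ $kind == failure && $status != failed ]]; then
    return 1
  fi

  return 0
}
```

For example, `is_valid_envelope_pair code partial` succeeds, while `is_valid_envelope_pair code approval_required` fails because the approval gate is only flagged in one field.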

## Recommended Payload Shapes

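Use the smallest structured payload that matches the task. The `code`, `review`, `research`, `plan`, and `ops` examples show only the `payload` object; the `approval_required` and `failure` examples show the complete envelope.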

### `code`

```json
{
  "files_changed": ["web/src/lib/store/studio-store.ts"],
  "files_created": ["web/src/lib/studio-bridge-result.ts"],
  "tests_run": [
    { "command": "pytest tests/unit/test_studio_chat_telemetry.py -q", "status": "passed" }
  ],
  "checks": [
    { "name": "eslint", "status": "passed" }
  ],
  "notes": ["Structured bridge results now attach without overwriting raw tool output."]
}
```

### `review`

```json
{
  "findings": [
    {
      "severity": "high",
      "title": "Structured result is overwritten by plain text fallback",
      "path": "web/src/components/studio/ReproductionLog.tsx",
      "line": 1528,
      "detail": "The bridge_result event replaces the raw tool_result instead of annotating it."
    }
  ],
  "risk_summary": "1 blocking issue, 1 medium issue"
}
```

### `research`

```json
{
  "claims": [
    {
      "claim": "Claude can consume Codex bridge results directly through tool_result.",
      "evidence": ["Observed worker tool_result returned to parent Claude session"]
    }
  ],
  "sources": [
    { "kind": "repo_file", "path": ".claude/agents/codex-worker.md" }
  ]
}
```

### `plan`

```json
{
  "steps": [
    "Normalize bridge results in the backend stream parser.",
    "Patch chat store to merge bridge metadata onto raw tool results.",
    "Render structured cards in Studio chat and keep Monitor as detailed view."
  ],
  "acceptance_criteria": [
    "Claude approval blocks can be resumed from Studio.",
    "All worker results use the same JSON envelope."
  ]
}
```

### `ops`

```json
{
  "commands": [
    { "command": "git branch --show-current", "status": "completed", "stdout_preview": "test/milestone-v1.2" }
  ],
  "checks": [
    { "name": "backend", "status": "running" }
  ]
}
```

### `approval_required`

```json
{
  "version": "1",
  "executor": "codex",
  "task_kind": "approval_required",
  "status": "approval_required",
  "summary": "Need approval to run a read-only git command.",
  "artifacts": [
    {
      "kind": "command",
      "label": "git branch",
      "path": null,
      "value": "git -C /home/master1/PaperBot branch --show-current"
    }
  ],
  "payload": {
    "command": "git -C /home/master1/PaperBot branch --show-current",
    "reason": "Permission gate",
    "resume_hint": {
      "worker_agent_id": "replace-with-actual-agent-id-if-known"
    }
  }
}
```

### `failure`

```json
{
  "version": "1",
  "executor": "codex",
  "task_kind": "failure",
  "status": "failed",
  "summary": "Codex could not complete the task because the backend returned 500.",
  "artifacts": [],
  "payload": {
    "reason_code": "backend_error",
    "error": "HTTP 500 from /api/agent-board/tasks/dispatch",
    "recommendation": "Retry after backend restart"
  }
}
```

## Delegation Workflow

### Step 1

Confirm the referenced PaperBot task/session exists before dispatching.

```bash
SESSION_ID="<session_id>"
# -f makes curl exit with status 22 on HTTP 4xx/5xx, so a missing session is detectable.
curl -sf "http://localhost:8000/api/agent-board/sessions/${SESSION_ID}"
```

If the session or task cannot be found, return the JSON envelope with `task_kind: "failure"` and `status: "failed"`.
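Because `curl -s` on its own exits 0 even when the server answers 404 or 500, a lookup that distinguishes "not found" from a backend error needs the HTTP status. A minimal sketch, assuming the backend runs at `http://localhost:8000` as in the commands in this section; `check_session`, `classify_http_status`, `BASE_URL`, and every reason code other than `backend_error` are hypothetical names introduced for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: helper names, BASE_URL, and most reason codes are hypothetical.
BASE_URL=${BASE_URL:-http://localhost:8000}

# Map the status printed by curl's %{http_code} to an outcome label.
classify_http_status() {
  case $1 in
    2??) echo ok ;;
    404) echo not_found ;;
    5??) echo backend_error ;;  # same reason_code as the failure example
    000) echo unreachable ;;    # curl reports 000 when no HTTP response arrived
    *) echo unexpected_status ;;
  esac
}

# Print the outcome of a session lookup; succeed only if the session exists.
check_session() {
  local code outcome
  # -o /dev/null discards the body; -w prints only the status code.
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    "$BASE_URL/api/agent-board/sessions/$1" || true)
  outcome=$(classify_http_status "$code")
  echo "$outcome"
  [[ $outcome == ok ]]
}
```

With `if ! outcome=$(check_session "$SESSION_ID"); then ...; fi`, the variable `outcome` holds `not_found`, `backend_error`, or `unreachable` on failure, ready to copy into the envelope's `payload.reason_code`.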

### Step 2

Dispatch the task to Codex.

```bash
TASK_ID="<task_id>"
curl -s -X POST http://localhost:8000/api/agent-board/tasks/${TASK_ID}/dispatch
```

If dispatch fails, return the JSON envelope with the failure details in `payload`.
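To copy failure details into `payload`, the dispatch call needs both the response body and the status code. One way to get both from a single request is curl's `-w` write-out option, which can append the status after the body. The sketch below assumes the dispatch endpoint from Step 2; `dispatch_task`, `split_response`, and the `RESPONSE_BODY`/`RESPONSE_STATUS` globals are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: dispatch_task, split_response, and the RESPONSE_* globals are
# hypothetical names, not part of PaperBot.
BASE_URL=${BASE_URL:-http://localhost:8000}

# Split "<body>\n<status>" (as produced by -w '\n%{http_code}') into globals.
split_response() {
  local raw=$1
  RESPONSE_STATUS=${raw##*$'\n'}  # text after the last newline
  RESPONSE_BODY=${raw%$'\n'*}     # everything before the last newline
}

# POST the dispatch request and keep both body and status for the envelope.
dispatch_task() {
  local raw
  raw=$(curl -s -X POST -w '\n%{http_code}' \
    "$BASE_URL/api/agent-board/tasks/$1/dispatch" || true)
  split_response "$raw"
}
```

After `dispatch_task "$TASK_ID"`, a status outside 2xx means dispatch failed, and `$RESPONSE_BODY` can be placed in the envelope's `payload.error`.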

### Step 3

Stream execution.

```bash
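# -N (--no-buffer) prints streamed events as they arrive; TASK_ID is the task dispatched in Step 2.
curl -sN "http://localhost:8000/api/agent-board/tasks/${TASK_ID}/execute"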
```

Watch for completion, failure, or approval-needed states. Convert the observed outcome into the structured envelope.
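A crude way to watch the stream from the shell is to scan its `data:` lines for a terminal status. This sketch assumes the execute endpoint emits Server-Sent Events whose `data:` payloads are JSON objects with a `status` field using the envelope's vocabulary (`completed`, `failed`, `approval_required`); the real stream may name its events or fields differently, and a substring match can misfire on nested fields, so a production consumer should parse the JSON instead. `watch_terminal_state` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads SSE text on stdin and prints the first terminal
# status it sees. Assumes JSON "data:" payloads with a "status" field.
watch_terminal_state() {
  local line payload
  # The "|| [[ -n $line ]]" clause also handles a final line with no newline.
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%$'\r'}                 # tolerate CRLF line endings
    [[ $line == data:* ]] || continue  # skip event:, id:, comments, blanks
    payload=${line#data:}
    payload=${payload// /}             # drop spaces so "status": "x" still matches
    case $payload in
      *'"status":"completed"'*) echo completed; return 0 ;;
      *'"status":"failed"'*) echo failed; return 0 ;;
      *'"status":"approval_required"'*) echo approval_required; return 0 ;;
    esac
  done
  return 1  # stream ended without a terminal status
}
```

Piping the Step 3 command into it, as in `curl -sN "http://localhost:8000/api/agent-board/tasks/${TASK_ID}/execute" | watch_terminal_state`, prints the first terminal status, and the pipeline returns non-zero if the stream ends without one.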

## Error Handling

Known failure examples:

- missing API key
- task/session not found
- worker timeout
- repeated tool failures
- backend 5xx
- permission/approval gate

In every case, do not switch formats. Return the same JSON envelope.
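When a failure is detected in a shell step, the envelope can still be produced without switching formats. The sketch below is a hypothetical helper pair, not part of PaperBot, that mirrors the `failure` example above (without the optional `recommendation` field). It escapes only backslashes, double quotes, newlines, carriage returns, and tabs, so other control characters would need extra handling; a tool such as `jq` is a sturdier choice where it is available.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of PaperBot) that print a failure envelope
# in the same shape as the "failure" example.

# Escape a string for a JSON string literal. Covers \ " newline CR and tab only.
json_escape() {
  local s=$1 bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}  # backslashes first, so later escapes stay single
  s=${s//"$q"/"$bs$q"}
  s=${s//$'\n'/"${bs}n"}
  s=${s//$'\r'/"${bs}r"}
  s=${s//$'\t'/"${bs}t"}
  printf '%s' "$s"
}

# Usage: emit_failure_envelope SUMMARY REASON_CODE ERROR
emit_failure_envelope() {
  local summary reason error
  summary=$(json_escape "$1")
  reason=$(json_escape "$2")
  error=$(json_escape "$3")
  printf '{"version":"1","executor":"codex","task_kind":"failure","status":"failed","summary":"%s","artifacts":[],"payload":{"reason_code":"%s","error":"%s"}}\n' \
    "$summary" "$reason" "$error"
}
```

The values are passed to `printf` as arguments rather than spliced into the format string, so a `%` in an error message is printed literally.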

## Final Rule

Your final response must be JSON only.
49 changes: 38 additions & 11 deletions .planning/PROJECT.md
@@ -2,11 +2,11 @@

## What This Is

PaperBot is a multi-agent research workflow framework for academic paper discovery, analysis, and reproduction. It provides a FastAPI backend with SSE streaming, a Next.js web dashboard, and a terminal CLI. The platform is evolving toward a Skill-Driven Architecture where PaperBot acts as a capability provider, exposing paper-specific tools via MCP and providing an agent orchestration dashboard for Claude Code and Codex.
PaperBot is a multi-agent research workflow framework for academic paper discovery, analysis, and reproduction. It provides a FastAPI backend with SSE streaming, a Next.js web dashboard, and a terminal CLI. The platform follows a Skill-Driven Architecture where PaperBot acts as a capability provider, exposing paper-specific tools via MCP. The web dashboard (DeepCode) serves as an agent-agnostic visualization and control surface — proxying chat to whichever code agent the user configures (Claude Code, Codex, OpenCode, etc.) and displaying real-time agent activity, team decomposition, and file changes.

## Core Value

Paper-specific capability layer: understanding, reproduction, verification, and context — surfaced as standard MCP tools that any agent can consume, with a visual dashboard for agent orchestration.
Paper-specific capability layer: understanding, reproduction, verification, and context — surfaced as standard MCP tools that any agent can consume, with an agent-agnostic dashboard that visualizes and controls whatever code agent the user runs.

## Requirements

@@ -31,46 +31,68 @@ Paper-specific capability layer: understanding, reproduction, verification, and

### Active

<!-- Current scope: v1.1 Agent Orchestration Dashboard + v2.0 PG Migration -->
<!-- Current scope: v1.1 Agent Orchestration Dashboard + v1.2 DeepCode Agent Dashboard + v2.0 PG Migration -->

- [ ] Codex subagent bridge for Claude Code (custom agent definition)
- [ ] Agent orchestration dashboard (replaces studio page)
- [ ] Agent event logging via MCP (lifecycle, tool calls, file changes, task status)
- [ ] Three-panel IDE layout (tasks | agent activity | files)
- [ ] Live SSE streaming for real-time agent activity
- [ ] Paper2Code overflow delegation workflow (Claude Code → Codex)
- [ ] Agent-agnostic proxy layer (chat proxies to user-configured agent: Claude Code, Codex, OpenCode)
- [ ] Multi-agent adapter layer (unified interface for different code agents)
- [ ] Agent activity discovery (hybrid: agent pushes events + dashboard discovers independently)
- [ ] Team visualization (agent-initiated team decomposition reflected in dashboard)
- [ ] Dashboard control surface (send commands/tasks to agents from web UI)
- [ ] PostgreSQL migration (replace SQLite)
- [ ] Async data layer (AsyncSession + asyncpg)
- [ ] Systematic data model refactoring
- [ ] PG-native features (tsvector, JSONB)

### Out of Scope

- Custom agent orchestration runtime — host agents (Claude Code) own orchestration
- Per-host adapters — one MCP surface serves all
- Custom agent orchestration runtime — host agents own orchestration, PaperBot visualizes
- Building any code agent (Claude Code, Codex, OpenCode) — uses existing tools
- Business logic duplication — tools must reuse existing services
- Building Codex itself — uses existing Codex CLI
- Hardcoded agent pipeline logic — agent decides team composition and delegation
- Per-agent custom UI — one unified dashboard serves all agents

## Context

- Architecture pivot from AgentSwarm to Skill-Driven Architecture (2026-03-13)
- Existing `codex_dispatcher.py` and `claude_commander.py` in infrastructure/swarm/
- Further pivot: DeepCode as agent-agnostic dashboard, not Claude Code-specific (2026-03-15)
- Problem identified: chat mode is split between a Claude Code CLI connection and direct Codex API calls, and needs unification
- Existing `codex_dispatcher.py` and `claude_commander.py` in infrastructure/swarm/ — to be replaced by unified adapter
- Existing `AgentEventEnvelope` with run_id/trace_id/span_id in application/collaboration/
- Studio page exists with Monaco editor and XTerm terminal
- @xyflow/react already in web dashboard for DAG visualization
- MCP server (v1.0 milestone) is prerequisite — provides tool surface for agent integration
- MCP server (v1.0 milestone) provides tool surface for agent integration
- v1.1 EventBus + SSE foundation (phases 7-8) partially built
- Dev branch synced to origin/dev at 2e5173d (2026-03-14)
- Current DB: SQLite with 46 models, sync Session, FTS5 virtual tables, optional sqlite-vec

## Constraints

- **MCP prerequisite**: v1.0 MCP server must be functional before agent orchestration
- **Reuse**: Event logging must extend existing AgentEventEnvelope, not create parallel system
- **Claude Code bridge**: Codex integration is a Claude Code agent definition, not PaperBot server code
- **Agent-agnostic**: Dashboard must work with any code agent, not hardcode Claude Code or Codex specifics
- **No orchestration logic**: PaperBot does NOT decompose tasks — the host agent does; PaperBot visualizes
- **Studio integration**: Dashboard integrates with the existing Monaco/XTerm components rather than replacing them
- **Transport**: SSE for live updates (existing infrastructure)

## Current Milestone: v1.1 Agent Orchestration Dashboard
## Current Milestone: v1.2 DeepCode Agent Dashboard

**Goal:** Unify the agent interaction model into a single agent-agnostic architecture where PaperBot's web UI (DeepCode) proxies chat to the user's chosen code agent, visualizes agent activity (teams, tasks, files) in real-time, and provides control commands — without hardcoding orchestration logic.

**Target features:**
- Agent-agnostic proxy layer (chat → Claude Code / Codex / OpenCode / etc.)
- Multi-agent adapter layer (unified interface abstracting agent-specific APIs/CLIs)
- Hybrid activity discovery (agent pushes events via MCP + dashboard discovers independently)
- Team visualization (agent-initiated team decomposition rendered in dashboard)
- Dashboard control surface (send commands/tasks back to agents)
- Real-time agent activity stream (builds on v1.1 EventBus/SSE)

## Previous Milestone: v1.1 Agent Orchestration Dashboard

**Goal:** Build a Codex subagent bridge for Claude Code and a real-time agent orchestration dashboard in PaperBot's web UI, enabling the Paper2Code overflow delegation workflow.

@@ -109,5 +131,10 @@ Paper-specific capability layer: understanding, reproduction, verification, and
| Systematic model refactoring | 46 models accumulated organically; normalize, add constraints, remove redundancy | — Pending |
| Docker PG for local dev | Standard dev setup, matches production topology | — Pending |

| DeepCode = agent-agnostic dashboard | Chat split (CLI vs API) was wrong; unify into proxy model where PaperBot doesn't care which agent | — Pending |
| Agent-initiated team decomposition | Agent decides how to split work; dashboard visualizes, doesn't orchestrate | — Pending |
| Hybrid activity discovery | Agent pushes structured events + dashboard can discover independently | — Pending |
| Dashboard + control (not pure display) | Users need to send commands/tasks, not just watch | — Pending |

---
*Last updated: 2026-03-14 after v2.0 milestone added*
*Last updated: 2026-03-15 after v1.2 milestone added*