Merged
57 changes: 32 additions & 25 deletions docs/mcp/claude.md
@@ -6,26 +6,27 @@ title: Claude - MCP integration

[Claude](https://claude.ai/) is available as a desktop app and as [Claude Code](https://docs.anthropic.com/en/docs/claude-code), Anthropic's CLI tool. Both support MCP and are configured the same way.

+PlanExe turns a plain-English goal into a strategic project-plan draft (20+ sections) in ~10-20 minutes. The output is a self-contained interactive HTML report you open in a browser.

## Prerequisites

- Claude desktop app or Claude Code installed.
-- PlanExe MCP server reachable by Claude.
+- One of the following:
+  - An API key from [home.planexe.org](https://home.planexe.org/) (cloud server, no installation needed).
+  - PlanExe running locally via Docker (`docker compose up`, port 8001).

## Quick setup

-1. Configure MCP in Claude (see options below).
-2. Ask for prompt examples.
-3. Create a plan and download the report.
-
-## Sample prompt
-
-> Get example prompts for creating a plan.
+1. Configure MCP in Claude (see connection options below).
+2. Verify the connection with `/mcp` in Claude Code or Settings > MCP in the desktop app.
+3. Ask Claude to create a plan; it handles the full workflow (prompt drafting, creation, status polling, download).

## Success criteria

-- You can fetch prompt examples.
-- You can create a plan.
-- You can download the report.
+- `/mcp` (Claude Code) or Settings > MCP (desktop) shows `planexe` as connected.
+- You can fetch prompt examples (`example_prompts`).
+- You can create a plan (`plan_create`) and poll it to completion (`plan_status`).
+- You can download the report (`plan_file_info` or `plan_download`).

---

@@ -44,9 +45,9 @@ Run this command in your terminal:

```bash
 claude mcp add --transport http \
-  --header "X-API-Key: pex_YOUR_API_KEY" \
   planexe \
-  https://mcp.planexe.org/mcp
+  https://mcp.planexe.org/mcp \
+  --header "X-API-Key: pex_YOUR_API_KEY"
```

Replace `pex_YOUR_API_KEY` with your actual API key.
@@ -55,7 +56,7 @@ Replace `pex_YOUR_API_KEY` with your actual API key.

Start Claude and check that the server is connected.

-In Claude Code, type `/mcp` to see the server status. In the Claude desktop app, go to Settings and check the MCP section. You should see `planexe` listed with its tools.
+In Claude Code, type `/mcp` to see the server status. In the Claude desktop app, go to Settings and check the MCP section. You should see `planexe` listed with its tools (`example_plans`, `example_prompts`, `model_profiles`, `plan_create`, `plan_status`, `plan_stop`, `plan_retry`, `plan_file_info`, `plan_list`).

---

@@ -87,7 +88,7 @@ Authentication is disabled by default for local Docker (`PLANEXE_MCP_REQUIRE_AUT

In Claude Code, type `/mcp` to see the server status. In the Claude desktop app, check Settings > MCP.

-> **Note:** With this option, `plan_file_info` returns a download URL. Claude can fetch the URL content for you, or you can open the URL in your browser.
+> **Note:** With this option, `plan_file_info` returns a `download_url`. Ask Claude to fetch it, or open the URL in your browser. For local disk saves, use Option C instead (adds the `plan_download` tool).

---

@@ -214,20 +215,26 @@ claude mcp remove planexe

## Interaction

-My interaction with Claude for creating a plan is like this:
+A typical conversation for creating a plan looks like this:

+1. **Explore**: "Tell me about the PlanExe MCP tools you have access to."
+2. **Get examples**: "Get the prompt examples."
+3. **Describe your goal**: "I want a prompt about building a community solar farm in rural Denmark."
+   Claude drafts a detailed prompt (~300-800 words) based on the examples and your idea.
+4. **Approve and create**: "Go ahead, create this plan."
+   Claude calls `plan_create`, which returns a `plan_id`.
+5. **Wait**: Plan generation takes ~10-20 minutes. Claude polls `plan_status` automatically every few minutes. Alternatively, `plan_create` returns an `sse_url`, a GET endpoint (`text/event-stream`) that streams real-time progress events until the plan completes. Claude Code agents can run `curl -N <sse_url>` in a background shell to monitor progress instead of polling.
+6. **Download**: "Download the report." Claude fetches the HTML report via `plan_file_info` (cloud) or `plan_download` (local proxy).

-1. tell me about the planexe mcp tool you have access to
-2. get the prompt examples
-3. I want a prompt about building a community solar farm in rural Denmark
-4. go ahead create this plan
-5. *wait for 10-20 minutes, Claude polls status automatically*
-6. download the report
+If a plan fails, Claude can retry it with `plan_retry`. If a `plan_id` is lost, `plan_list` recovers recent plans.
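The wait step in the conversation above amounts to a polling loop. A minimal sketch in Python, assuming a caller-supplied `get_status` callable that wraps a `plan_status` call and returns a dict with a `state` key. The helper name `poll_until_done`, its parameters, and the 300-second default are illustrative assumptions, not part of PlanExe.

```python
import time
from typing import Callable, Dict

# Terminal states named in the docs; any other state is treated as "still running".
TERMINAL_STATES = {"completed", "failed"}


def poll_until_done(
    get_status: Callable[[], Dict],
    interval_s: float = 300.0,
    max_polls: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call get_status() until the plan reaches a terminal state.

    get_status is a hypothetical wrapper around the plan_status tool.
    Raises TimeoutError if no terminal state is seen within max_polls calls.
    """
    for attempt in range(max_polls):
        status = get_status()
        if status.get("state") in TERMINAL_STATES:
            return status
        if attempt < max_polls - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"plan still running after {max_polls} polls")
```

Injecting `sleep` keeps the loop testable without real waiting. A caller would branch on the returned `state`: fetch the report on `completed`, or consider `plan_retry` on `failed`.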

---

## Troubleshooting

-- If `/mcp` shows the server as disconnected, check that Docker is running (`docker compose ps`) or that `mcp.planexe.org` is reachable.
-- If you get authentication errors with the cloud server, verify your API key at [home.planexe.org](https://home.planexe.org/).
-- For stdio transport issues, make sure `uv` is installed and on your PATH.
+- **Server disconnected**: If `/mcp` shows the server as disconnected, check that Docker is running (`docker compose ps`) or that `mcp.planexe.org` is reachable.
+- **Authentication errors**: Verify your API key at [home.planexe.org](https://home.planexe.org/). Keys are prefixed with `pex_`.
+- **stdio transport**: Make sure `uv` is installed and on your PATH.
+- **Plan stuck in pending**: If `plan_status` stays `pending` for >5 minutes, the worker likely hasn't picked it up. Check Docker logs or report the issue.
+- **Plan failed**: Call `plan_retry` to requeue the same `plan_id` (defaults to baseline profile).
+- For more help, see the [Troubleshooting guide](mcp_troubleshooting.md) or ask on the [PlanExe Discord](https://planexe.org/discord).
54 changes: 47 additions & 7 deletions docs/proposals/70-mcp-interface-evaluation-and-roadmap.md
@@ -15,6 +15,7 @@ An honest audit of the current MCP surface (`mcp_cloud` + `mcp_local`), followed
- **2026-02-26 (rev 3):** Updated after completing 4.9: all stale `task` variable names, request classes, helper functions, and backward-compat aliases renamed/removed across `mcp_cloud` and `mcp_local`. Test files renamed from `test_task_*` to `test_plan_*`.
- **2026-02-26 (rev 4):** Updated after completing 4.2 — added separate download rate limiter with configurable limits (default 10 req/60s).
- **2026-02-26 (rev 5):** Renamed external-facing fields: `task_id` → `plan_id`, `tasks` → `plans`, error codes `TASK_NOT_FOUND` → `PLAN_NOT_FOUND`, `TASK_NOT_FAILED` → `PLAN_NOT_FAILED`. Internal function names and download URL paths unchanged.
+- **2026-03-02 (rev 6):** SSE progress streaming implemented (`mcp_cloud/sse.py`, `GET /sse/plan/{plan_id}`). Incorporated feedback from Claude Code agent evaluation of the MCP interface: improved SSE documentation across server instructions, tool descriptions, and field descriptions to be actionable for agents; added new proposed improvements for credit feedback, pipeline stage names, and files array completeness. Discovered and fixed BaseHTTPMiddleware blocking issue with SSE streams.

---

@@ -72,6 +73,12 @@ Nine tools, split across two transports:

**Comprehensive test suite.** 12 test files covering tool surface consistency, auth key parsing, CORS config, download tokens, HTTP routing, and individual tool behaviour (`test_plan_create_tool.py`, `test_plan_status_tool.py`, `test_plan_retry_tool.py`, `test_plan_file_info_tool.py`, `test_model_profiles_tool.py`).

+**SSE progress streaming.** `GET /sse/plan/{plan_id}` streams real-time progress as Server-Sent Events (`text/event-stream`). Emits `status` events on state/progress changes, `heartbeat` every ~20s, and a final `complete` event on terminal state. Deduplication avoids sending unchanged state. Connection tracking enforces per-client (5) and server-wide (200) limits. SSE is complementary to polling; both are fully supported. The SSE endpoint bypasses `BaseHTTPMiddleware` and handles auth inline to avoid a Starlette bug where long-lived streaming responses block concurrent requests through the middleware.
+
+**Actionable SSE documentation.** Server instructions, tool descriptions (`plan_create`, `plan_status`), and field descriptions (`sse_url`) explain SSE in concrete terms: it's a GET endpoint returning `text/event-stream`, usable with `curl -N -H 'X-API-Key: <key>' <sse_url>`, and auto-closes on terminal state. This was revised based on feedback from a Claude Code agent evaluation that found the original "if your client supports SSE" phrasing too vague for autonomous agents.
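For agents that read the stream themselves rather than shelling out to `curl`, the events can be split with a few lines of parsing. A minimal sketch in Python, assuming the stream follows the standard `text/event-stream` line format (`event:` and `data:` fields, with a blank line ending each event). The function name `iter_sse_events` is hypothetical and the payload shapes are not specified here; this is not the code in `mcp_cloud/sse.py`.

```python
from typing import Dict, Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per Server-Sent Event from an iterable of decoded lines.

    Only the `event` and `data` fields are handled; `id` and `retry` are ignored.
    """
    event_type, data_parts = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield {"event": event_type, "data": "\n".join(data_parts)}
            event_type, data_parts = "message", []
        elif line.startswith(":"):
            continue  # comment line, ignored by the SSE format
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # the format strips one leading space
            if field == "event":
                event_type = value
            elif field == "data":
                data_parts.append(value)
```

A consumer would stop reading once it sees an event whose type is `complete`, since the server closes the stream on terminal state.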

+**`example_plans` tool.** Returns curated example plans with download links for reports and zip bundles. Lets users preview what PlanExe output looks like before committing to a plan. No API key required.

---

## 3. What's Been Fixed (Previously Reported)
@@ -164,9 +171,13 @@ All internal naming now uses `plan` consistently. Request classes renamed (`Task

## 5. Proposed Improvements

-### 5.1 SSE progress streaming (UX)
+### ~~5.1 SSE progress streaming (UX)~~ (IMPLEMENTED)

+`GET /sse/plan/{plan_id}` now streams real-time progress via Server-Sent Events. Implementation in `mcp_cloud/sse.py`. `plan_create` and `plan_status` responses include an `sse_url` field pointing to the endpoint. Events: `status` (state/progress changes), `heartbeat` (~20s silence), `complete` (terminal state), `error` (not found / timeout). Stream auto-closes on terminal state or after 60 minutes.
+
+**Feedback note (Claude Code agent evaluation, 2026-03-02):** The original documentation said "if your client supports Server-Sent Events"; this was too vague for autonomous agents. MCP tools are request-response by design, so agents didn't know whether they "support" SSE or how to consume it. Documentation was rewritten to be actionable: SSE is a GET endpoint returning `text/event-stream`, agents can use `curl -N` via their Bash tool, and polling remains the simpler alternative. Both `plan_create` and `plan_status` tool descriptions now mention `sse_url` with usage examples.
+
-Long-running plans (10–20 minutes) give the user no feedback. A `log_lines` array in the `plan_status` response (last 50 lines of agent output) would dramatically improve perceived responsiveness.
+**Implementation note:** The SSE endpoint is intentionally excluded from Starlette's `BaseHTTPMiddleware` (`enforce_api_key`). The middleware pipes response bodies through an internal `anyio.MemoryObjectStream`; for long-lived SSE streams this keeps the middleware's task-group alive indefinitely and starves concurrent requests. Auth is handled inline in the SSE endpoint instead.
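One way to picture the bypass is a pure ASGI middleware that checks the key for ordinary requests and hands `/sse/` paths to the app untouched, so the response body is never buffered by the middleware. This is a sketch under stated assumptions, not the PR's implementation: the class name `ApiKeyMiddleware`, the `bypass_prefix` parameter, and the plain-text 401 response are all hypothetical, and the SSE endpoint would still need its own inline auth as the note describes.

```python
import asyncio  # used only to drive the middleware in a quick local check


class ApiKeyMiddleware:
    """Pure ASGI middleware: enforce X-API-Key, but pass /sse/ paths straight through."""

    def __init__(self, app, valid_keys, bypass_prefix="/sse/"):
        self.app = app
        self.valid_keys = set(valid_keys)
        self.bypass_prefix = bypass_prefix  # hypothetical prefix for streaming routes

    async def __call__(self, scope, receive, send):
        # Non-HTTP scopes and streaming routes are forwarded without wrapping `send`,
        # so long-lived responses are not held open inside this middleware.
        if scope["type"] != "http" or scope["path"].startswith(self.bypass_prefix):
            await self.app(scope, receive, send)
            return

        headers = dict(scope.get("headers", []))  # ASGI header names are lowercase bytes
        key = headers.get(b"x-api-key", b"").decode()
        if key not in self.valid_keys:
            await send({
                "type": "http.response.start",
                "status": 401,
                "headers": [(b"content-type", b"text/plain")],
            })
            await send({"type": "http.response.body", "body": b"unauthorized"})
            return

        await self.app(scope, receive, send)
```

The design choice is that the middleware never reads or re-emits the response body; it only decides whether to call the downstream app.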

### 5.2 Webhook / push notification (power users)

@@ -180,6 +191,30 @@ All tool names and schemas are currently unversioned. A future breaking change w

Add an explicit check at server startup that required secrets (`PLANEXE_API_KEY_SECRET`, `PLANEXE_DOWNLOAD_TOKEN_SECRET`) are set when auth is enabled. Fail loudly instead of falling back to dev defaults.
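A startup check of this kind could look roughly like the following. It is a minimal sketch: the function name `check_required_secrets` and the `auth_required` flag are assumptions for illustration, and only the two secret names come from the text above.

```python
from typing import Mapping, Sequence

# Secret names taken from the proposal text.
REQUIRED_SECRETS = ("PLANEXE_API_KEY_SECRET", "PLANEXE_DOWNLOAD_TOKEN_SECRET")


def check_required_secrets(
    env: Mapping[str, str],
    auth_required: bool,
    required: Sequence[str] = REQUIRED_SECRETS,
) -> None:
    """Raise RuntimeError at startup if auth is enabled and any secret is unset or blank."""
    if not auth_required:
        return  # local/dev mode: nothing to enforce
    missing = [name for name in required if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing required secrets: " + ", ".join(missing))
```

At server startup this would be called once with `os.environ`, before any routes are registered, so a misconfigured deployment stops immediately instead of running with dev defaults.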

+### 5.5 Credit/cost feedback in `plan_create` response
+
+**Source:** Claude Code agent feedback (2026-03-02).
+
+After `plan_create`, there is no indication of credits consumed or remaining. For a paid service, this transparency builds trust. Consider adding `credits_used` and `credits_remaining` fields to the `plan_create` response (or to `plan_status` on completion).
+
+### 5.6 Pipeline stage names in progress reporting
+
+**Source:** Claude Code agent feedback (2026-03-02).
+
+`progress_percentage` jumps non-linearly (e.g. 80% → 83% over 4 minutes, then straight to 100%). The metric doesn't feel informative. Consider adding a `current_stage` field to `plan_status` (e.g. "generating SWOT analysis", "running premortem") so agents and users can see *what* is happening, not just a number.
+
+### 5.7 Complete files array in `plan_status` for completed plans
+
+**Source:** Claude Code agent feedback (2026-03-02).
+
+The `files` array in `plan_status` only shows early pipeline files (001-, 002-), not the final report or zip. When `state` is `completed`, the full manifest of outputs should be visible. This helps agents verify what was produced before calling `plan_file_info`.
+
+### 5.8 Prompt approval skip for agent-provided prompts
+
+**Source:** Claude Code agent feedback (2026-03-02).
+
+The server instructions mandate that the agent drafts a prompt and gets user approval before calling `plan_create`. When the user hands the agent a polished prompt file and says "use this", the extra ceremony adds friction. Consider a note in the instructions that agent-provided or user-approved prompts can go directly to `plan_create` without the full drafting cycle.

---

## 6. Promotion and Growth Strategies
@@ -280,15 +315,20 @@ Add 10–15 high-quality example prompts (startup, research paper, home renovati
| P2 | ~~Body size validation on Streamable HTTP (4.3)~~ | — | DONE |
| P2 | ~~Return error for invalid artifact value (4.4)~~ | — | DONE |
| P2 | ~~Add tool-call audit logging (4.7)~~ | — | DONE |
-| P2 | Add `log_lines` to `plan_status` (5.1) | 4 h | |
+| P1 | ~~SSE progress streaming (5.1)~~ | — | DONE |
+| P2 | Add `log_lines` to `plan_status` | 4 h | |
| P2 | ~~Rename internal `task` variables/classes/helpers to `plan` (4.9)~~ | — | DONE |
| P2 | ~~Remove backward-compat `Task*`/`handle_task_*`/`TASK_*` aliases (4.9)~~ | — | DONE |
| P2 | ~~Rename test files from `test_task_*` to `test_plan_*` (4.9)~~ | — | DONE |
| P2 | ~~Tighten default CORS origins (4.6)~~ | — | DONE |
| P2 | ~~Align `plan_list` auth with `plan_create` (4.10)~~ | — | DONE |
-| P3 | Webhook support (5.2) | 1 day | |
-| P3 | API versioning (5.3) | 4 h | |
-| P3 | GitHub Actions integration (6.3) | 1 day | |
+| P2 | Credit/cost feedback in `plan_create` response (5.5) | 4 h | |
+| P2 | Pipeline stage names in progress reporting (5.6) | 4 h | |
+| P2 | Complete files array in `plan_status` for completed plans (5.7) | 2 h | |
+| P3 | Prompt approval skip for agent-provided prompts (5.8) | 1 h | |
+| P3 | Webhook support (5.2) | 1 day | |
+| P3 | API versioning (5.3) | 4 h | |
+| P3 | GitHub Actions integration (6.3) | 1 day | |


---
@@ -297,4 +337,4 @@ Add 10–15 high-quality example prompts (startup, research paper, home renovati

The MCP surface is functionally solid and ahead of most MCP servers in terms of schema rigour, annotation coverage, and security (signed download tokens, layered auth, auto-injected user keys). The codebase has been significantly improved since rev 1: `app.py` was refactored from a 76 KB monolith into 10+ focused modules, `plan_list` now follows the same auth-injection pattern as `plan_create`, and all P0 issues are resolved.

-All P1 code-quality issues are now resolved, including fail-hard on missing secrets in production (4.1). The remaining checklist items are promotion/growth tasks (mcp.so submission, README demo) and lower-priority enhancements (CORS tightening, SSE streaming, webhooks, API versioning).
+All P1 code-quality issues are now resolved, including fail-hard on missing secrets in production (4.1). SSE progress streaming (5.1) is now implemented, providing real-time push updates as an alternative to polling. A Claude Code agent evaluation (2026-03-02) surfaced actionable feedback: SSE documentation was too vague for autonomous agents (now fixed), and four new improvements were identified: credit/cost feedback (5.5), pipeline stage names in progress (5.6), complete files array on completion (5.7), and prompt approval flexibility (5.8). The remaining checklist items are these feedback-driven improvements, promotion/growth tasks (mcp.so submission, README demo), and lower-priority enhancements (webhooks, API versioning).
10 changes: 6 additions & 4 deletions mcp_cloud/db_setup.py
@@ -97,10 +97,12 @@ def ensure_planitem_stop_columns() -> None:
"Only after approval, call plan_create. "
"Each plan_create call creates a new plan_id; the server does not enforce a global per-client concurrency limit. "
"Then poll plan_status (about every 5 minutes); use plan_file_info when complete. "
"plan_create and plan_status responses include an sse_url field when available. "
"If your client supports Server-Sent Events, connect to sse_url with the same API key header "
"to receive real-time push updates instead of polling plan_status. "
"The SSE stream auto-closes when the plan completes or fails. SSE is optional — polling remains supported. "
"plan_create and plan_status responses include an sse_url field (a plain GET endpoint returning text/event-stream). "
"Instead of polling plan_status, you can monitor progress in real time by opening sse_url — "
"for example, run `curl -N -H 'X-API-Key: <key>' <sse_url>` in a background shell. "
"The stream emits 'status' events when progress changes, 'heartbeat' every ~20 s, and a final "
"'complete' event (state completed or failed) then closes automatically. "
"Polling plan_status and SSE are both supported — use whichever fits your runtime. "
"If a run fails, call plan_retry with the failed plan_id to requeue it (optional model_profile, defaults to baseline). "
"To stop, call plan_stop with the plan_id from plan_create; stopping is asynchronous and the plan will eventually transition to failed. "
"If model_profiles returns MODEL_PROFILES_UNAVAILABLE, inform the user that no models are currently configured and the server administrator needs to set up model profiles. "