Keep the meaning. Trim the spell.
English · 繁體中文 · Installation · Quick Start · Model Guidance
promptcrab is a CLI that rewrites prompts for downstream LLMs at lower token cost, with strict fidelity checks.
Instead of simply shortening text, it generates multiple rewrite candidates, verifies that they preserve task meaning and ordering, checks protected literals such as URLs, IDs, keys, and numbers, and then returns the safest compact version.
Requires Python 3.12 or newer.
- Rewrites a prompt into compact `zh`, `wenyan`, and `en` candidates
- Optionally verifies each candidate with a dedicated judge backend
- Checks whether important literals were dropped
- Estimates token counts
- Picks the best valid candidate, or falls back to the original prompt
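The literal check can be sketched in a few lines. This is an independent illustration of the documented behavior, not promptcrab's actual implementation; the regex and function names are assumptions:

```python
import re

# Hypothetical sketch of a protected-literal check: extract URLs, IDs,
# and numbers from the original prompt, then verify the rewrite kept
# each one verbatim.
LITERAL_RE = re.compile(r"https?://\S+|\b[A-Z0-9_-]{8,}\b|\b\d+(?:\.\d+)?\b")

def dropped_literals(original: str, rewrite: str) -> list[str]:
    """Return protected literals present in `original` but missing from `rewrite`."""
    return [lit for lit in LITERAL_RE.findall(original) if lit not in rewrite]

original = "POST https://api.example.com/v1/items with key ABC12345-XYZ and limit 250"
ok_rewrite = "POST https://api.example.com/v1/items, key ABC12345-XYZ, limit 250"
bad_rewrite = "POST the items endpoint with the key and a limit"

print(dropped_literals(original, ok_rewrite))   # []
print(dropped_literals(original, bad_rewrite))  # all three protected literals were dropped
```

A rewrite that fails this check is rejected no matter how compact it is.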
Backends and their credentials:

- `minimax`: uses `MINIMAX_API_KEY` or `OPENAI_API_KEY`
- `gemini`: uses `GEMINI_API_KEY`
- `gemini_cli`: uses the local `gemini` executable and its own login/session
- `codex_cli`: uses the local `codex` executable
## Installation

If you are installing from a local checkout:
```
uv tool install .
```

Or install into a virtual environment:

```
uv pip install .
```

To see the available options:

```
promptcrab --help
```

promptcrab reads credentials in this order:
- CLI flags such as `--minimax-api-key` and `--gemini-api-key`
- Existing shell environment variables
- `--env-file /path/to/file.env`
- A `.env` file found by searching from the current working directory upward
This makes local project .env files work even when promptcrab is installed globally.
Example:

```
MINIMAX_API_KEY=your-key
GEMINI_API_KEY=your-key
OPENAI_API_KEY=your-key
```

Only set the variables required by the backend you actually use.
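The lookup order above amounts to a first-match search across sources. The sketch below is only an illustration of the documented precedence, not promptcrab's internals; every name in it is hypothetical:

```python
# Illustrative sketch of the documented credential precedence:
# CLI flag > shell environment > --env-file > nearest upward .env file.
def resolve_key(flag_value, shell_env, env_file, project_env, name):
    for source in (
        {name: flag_value} if flag_value else {},
        shell_env,
        env_file,
        project_env,
    ):
        if source.get(name):
            return source[name]
    return None

shell = {"GEMINI_API_KEY": "from-shell"}
env_file = {"GEMINI_API_KEY": "from-env-file"}
project = {"GEMINI_API_KEY": "from-project-dotenv"}

print(resolve_key(None, shell, env_file, project, "GEMINI_API_KEY"))         # from-shell
print(resolve_key("from-flag", shell, env_file, project, "GEMINI_API_KEY"))  # from-flag
```

The first source that defines the variable wins; later sources are never consulted.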
If you keep provider keys outside the project root, pass an explicit file:
```
promptcrab --env-file ~/.config/promptcrab/provider.env --help
```

## Quick Start

Rewrite a prompt with MiniMax through opencode:
```
promptcrab \
  --backend opencode_cli \
  --model minimax-coding-plan/MiniMax-M2.7-highspeed \
  --prompt "Summarize this API design and keep every field name unchanged."
```

Rewrite a prompt from a file with the local Gemini CLI:
```
promptcrab \
  --backend gemini_cli \
  --model gemini-3-flash-preview \
  --prompt-file ./prompt.txt
```

Use a fixed judge backend instead of self-verification:
```
promptcrab \
  --backend opencode_cli \
  --model minimax-coding-plan/MiniMax-M2.7-highspeed \
  --judge-backend codex_cli \
  --judge-model gpt-5.4 \
  --judge-codex-reasoning-effort medium \
  --prompt-file ./prompt.txt
```

Pipe a prompt through stdin:
```
cat ./prompt.txt | promptcrab --backend codex_cli --model gpt-5.4
```

Show every candidate and its checks:
```
promptcrab \
  --backend opencode_cli \
  --model minimax-coding-plan/MiniMax-M2.7-highspeed \
  --prompt-file ./prompt.txt \
  --show-all
```

Return machine-readable JSON:
```
promptcrab \
  --backend gemini_cli \
  --model gemini-3-flash-preview \
  --prompt-file ./prompt.txt \
  --json-output
```

Write the best prompt to a file:
```
promptcrab \
  --backend opencode_cli \
  --model minimax-coding-plan/MiniMax-M2.7-highspeed \
  --prompt-file ./prompt.txt \
  --write-best-to ./optimized.txt
```

Optionally cap generation output if a specific provider/model needs it:
```
promptcrab \
  --backend gemini \
  --model gemini-3-flash-preview \
  --prompt-file ./prompt.txt \
  --max-output-tokens 4096
```

Use a non-default Codex executable path:
```
promptcrab \
  --backend codex_cli \
  --model gpt-5.4 \
  --codex-executable /path/to/codex \
  --prompt-file ./prompt.txt
```

Instead of checking in a small, stale benchmark table, promptcrab now ships a reproducible `promptcrab-benchmark` runner. It runs a built-in literal/format hard-case suite, pulls public web datasets, re-counts every prompt with one shared tokenizer, and evaluates rewrites with a multi-judge panel.
This single-judge snapshot was run on 2026-04-15 for a README-sized comparison that finishes quickly. It samples 4 MT-Bench cases and 4 IFEval cases, uses o200k_base as the shared tokenizer, keeps literal checks enabled, and evaluates every row with codex_cli + gpt-5.4 (medium) as the judge. Treat it as directional, not a final ranking; the GPT row is self-judged.
Avg accepted token reduction is computed only over cases where at least one candidate passed the fidelity gates.
| Rewrite backend | Judge | Sample | Pass rate (95% CI) | Avg accepted token reduction (95% CI) | Dataset pass split | Notes |
|---|---|---|---|---|---|---|
| codex_cli + gpt-5.4 (medium) | codex_cli + gpt-5.4 (medium) | 8 | 6/8 = 75.0% (40.9-92.9%) | 4.8% (-5.5-12.3%) | MT-Bench 4/4, IFEval 2/4 | Self-judged; most conservative compression. IFEval failures came from strict literal/verbatim constraints. |
| opencode_cli + MiniMax-M2.7-highspeed | codex_cli + gpt-5.4 (medium) | 8 | 2/8 = 25.0% (7.1-59.1%) | 20.1% (19.2-20.9%) | MT-Bench 2/4, IFEval 0/4 | Highest accepted compression, but many IFEval cases failed on literal or format drift. |
| gemini_cli + gemini-3-flash-preview | codex_cli + gpt-5.4 (medium) | 8 | 4/8 = 50.0% (21.5-78.5%) | 7.8% (-16.7-26.3%) | MT-Bench 3/4, IFEval 1/4 | Middle fidelity; failures mostly came from translated or dropped literal constraints. |
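The headline compression metric can be sketched as follows. The exact formula `promptcrab-benchmark` uses is an assumption here; what the docs guarantee is that "avg accepted token reduction" averages only over cases where some candidate passed the fidelity gates:

```python
# Sketch of "avg accepted token reduction": average 1 - accepted/original
# token counts, skipping cases where no candidate passed the gates.
# Field names are illustrative, not the benchmark's actual schema.
def avg_accepted_reduction(cases):
    accepted = [c for c in cases if c["accepted_tokens"] is not None]
    if not accepted:
        return None  # nothing passed; the metric is undefined
    return sum(
        1 - c["accepted_tokens"] / c["original_tokens"] for c in accepted
    ) / len(accepted)

cases = [
    {"original_tokens": 200, "accepted_tokens": 150},   # 25% smaller
    {"original_tokens": 100, "accepted_tokens": 95},    # 5% smaller
    {"original_tokens": 120, "accepted_tokens": None},  # no candidate passed
]
print(round(avg_accepted_reduction(cases), 3))  # 0.15
```

This is why a backend can show both a low pass rate and a high reduction: the average is taken only over its accepted cases.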
Built-in prompt sources:

- `hard_cases`: built-in literal and format preservation prompts covering verbatim repeat, bullet templates, exact markers, section separators, case/count constraints, symbols, JSON keys, and URLs
- MT-Bench
- IFEval
The benchmark reports:
- per-judge pass rate with 95% Wilson confidence intervals
- panel consensus pass rate
- before-gate token reduction, showing how much the raw shortest candidate compressed before fidelity checks
- after-gate token reduction, showing accepted compression after literal and judge gates
- 95% bootstrap confidence intervals for mean token reduction
- pairwise judge agreement and Cohen's kappa
- per-dataset breakdowns
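The pass-rate intervals are standard 95% Wilson score intervals, which you can reproduce independently; for example, the 6/8 row in the snapshot table above:

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Reproduces the pass-rate CI for the 6/8 = 75.0% row:
lo, hi = wilson_interval(6, 8)
print(f"{lo:.1%}-{hi:.1%}")  # 40.9%-92.9%
```

The same function reproduces the 2/8 (7.1-59.1%) and 4/8 (21.5-78.5%) rows.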
Example: rerun the benchmark on hard cases and public real-world cases
```
promptcrab-benchmark \
  --backend codex_cli \
  --model gpt-5.4 \
  --codex-reasoning-effort medium \
  --judge gemini_cli:gemini-3-flash-preview \
  --judge opencode_cli:minimax-coding-plan/MiniMax-M2.7-highspeed \
  --dataset hard_cases \
  --dataset mt_bench \
  --dataset ifeval \
  --cases-per-dataset 24 \
  --trials 2 \
  --tokenizer o200k_base
```

If you want to run the full datasets instead of a stratified sample:
```
promptcrab-benchmark \
  --backend codex_cli \
  --model gpt-5.4 \
  --codex-reasoning-effort medium \
  --judge gemini_cli:gemini-3-flash-preview \
  --judge opencode_cli:minimax-coding-plan/MiniMax-M2.7-highspeed \
  --dataset hard_cases \
  --dataset mt_bench \
  --dataset ifeval \
  --cases-per-dataset 0 \
  --tokenizer o200k_base
```

The built-in `hard_cases` suite is always evaluated in full when selected; `--cases-per-dataset` only limits sampled external datasets.
## Model Guidance

Recommended starting points:

- For highest fidelity and stability, use `codex_cli --model gpt-5.4`, optionally pin `--codex-reasoning-effort medium|high|xhigh`, and pick a different judge backend such as `gemini_cli` or `opencode_cli`.
- For strongest prompt compression, compare `opencode_cli --model minimax-coding-plan/MiniMax-M2.7-highspeed` with `codex_cli --model gpt-5.4` as judge.
- Use `gemini_cli --model gemini-3-flash-preview` as a rewrite backend only if you want to compare it explicitly; current literal-fidelity performance is weaker than `gpt-5.4` in the directional snapshot above.
If you omit `--judge-backend`, promptcrab skips judge-based verification and only applies literal checks. This is faster, but less safe.
Example: safer default rewrite
```
promptcrab \
  --backend codex_cli \
  --model gpt-5.4 \
  --codex-reasoning-effort medium \
  --judge-backend gemini_cli \
  --judge-model gemini-3-flash-preview \
  --prompt-file ./prompt.txt
```

Example: stronger compression with an external judge
```
promptcrab \
  --backend opencode_cli \
  --model minimax-coding-plan/MiniMax-M2.7-highspeed \
  --judge-backend codex_cli \
  --judge-model gpt-5.4 \
  --judge-codex-reasoning-effort medium \
  --prompt-file ./prompt.txt
```

For `codex_cli`, promptcrab can override reasoning effort with `--codex-reasoning-effort` and `--judge-codex-reasoning-effort`. If you omit those flags, Codex falls back to your local CLI configuration such as `~/.codex/config.toml`.
- Default output: prints the selected best prompt
- `--show-all`: prints all candidates, checks, and verifier results
- `--json-output`: prints a JSON object for automation
- `--write-best-to`: saves the selected prompt to a file
- If no candidate passes the fidelity gates, promptcrab returns the original prompt unchanged.
- If you set `--judge-backend`, promptcrab runs an extra verification pass before accepting a candidate.
- If you omit `--judge-backend`, promptcrab skips semantic verification and only uses literal checks.
- If you want a truly independent judge, set `--judge-backend` to a different backend than `--backend`.
- promptcrab does not set a generation output cap by default; if you need one for a specific backend or model, pass `--max-output-tokens`.
- `--max-output-tokens` is currently forwarded to `minimax` and `gemini`; `codex_cli` and `gemini_cli` do not expose a matching flag in this wrapper yet.
- Token counting depends on backend support and available credentials.
- The selected best candidate is language-agnostic; whichever valid rewrite is smallest wins.
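The selection rule in the last two notes can be sketched as follows (field names are illustrative, not promptcrab's actual data model):

```python
# Sketch of the documented selection rule: among candidates that passed
# every fidelity gate, the one with the fewest tokens wins; if none
# passed, the original prompt is returned unchanged.
def select_best(original: str, candidates: list[dict]) -> str:
    valid = [c for c in candidates if c["passed_checks"]]
    if not valid:
        return original  # fall back to the original prompt
    return min(valid, key=lambda c: c["token_count"])["text"]

candidates = [
    {"text": "compact zh rewrite", "token_count": 40, "passed_checks": True},
    {"text": "compact en rewrite", "token_count": 55, "passed_checks": True},
    {"text": "too-aggressive rewrite", "token_count": 30, "passed_checks": False},
]
print(select_best("original prompt", candidates))  # compact zh rewrite
print(select_best("original prompt", []))          # original prompt
```

Note that the over-compressed candidate never wins, no matter how small it is, because it failed its checks.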
See CHANGELOG.md.
