promptcrab

Keep the meaning. Trim the spell.


promptcrab is a CLI for rewriting prompts for downstream LLMs with lower token cost and strict fidelity checks.

Rather than simply shortening text, it generates several rewrite candidates, verifies that each one preserves the task's meaning and ordering, checks that protected literals such as URLs, IDs, keys, and numbers survive, and returns the smallest candidate that passes every check.

Requires Python 3.12 or newer.

What It Does

Rewrites a prompt into compact zh, wenyan, and en candidates
Optionally verifies each candidate with a dedicated judge backend
Checks whether important literals were dropped
Estimates token counts
Picks the best valid candidate, or falls back to the original prompt
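
The literal check in the list above can be pictured roughly as follows. This is a minimal sketch and not promptcrab's actual implementation: the regex patterns and the helper names extract_literals and missing_literals are assumptions made for illustration, and the real tool may protect more kinds of literals (IDs, keys) with stricter matching.

```python
import re

# Hypothetical patterns for "protected literals"; the real rules may differ.
LITERAL_PATTERNS = [
    r"https?://[^\s)>\"']+",  # URLs
    r"\b\d+(?:\.\d+)?\b",     # standalone integers and decimals
]


def extract_literals(text: str) -> set[str]:
    """Collect every substring of `text` that matches a protected pattern."""
    found: set[str] = set()
    for pattern in LITERAL_PATTERNS:
        found.update(re.findall(pattern, text))
    return found


def missing_literals(original: str, candidate: str) -> set[str]:
    """Return literals from `original` that do not appear verbatim in `candidate`.

    This uses a plain substring test, so "25" would also be found inside "250";
    a stricter check would compare the extracted sets instead.
    """
    return {lit for lit in extract_literals(original) if lit not in candidate}


original = "Fetch https://example.com/v2/items and return at most 25 rows"
print(missing_literals(original, "Fetch https://example.com/v2/items, max 25 rows"))
print(missing_literals(original, "Fetch the items endpoint, max 25 rows"))
```

A candidate with a non-empty result would be rejected under this sketch, regardless of how much shorter it is.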

Supported Backends

minimax: uses MINIMAX_API_KEY or OPENAI_API_KEY
gemini: uses GEMINI_API_KEY
gemini_cli: uses the local gemini executable and its own login/session
codex_cli: uses the local codex executable
opencode_cli: runs models through opencode, as in the Quick Start examples

Installation

If you are installing from a local checkout:

uv tool install .

Or install into a virtual environment:

uv pip install .

To see the available options:

promptcrab --help

Configuration

promptcrab reads credentials in this order:

CLI flags such as --minimax-api-key and --gemini-api-key
Existing shell environment variables
--env-file /path/to/file.env
A .env file found by searching from the current working directory upward

This makes local project .env files work even when promptcrab is installed globally.

Example:

MINIMAX_API_KEY=your-key
GEMINI_API_KEY=your-key
OPENAI_API_KEY=your-key

Only set the variables required by the backend you actually use.

If you keep provider keys outside the project root, pass an explicit file:

promptcrab --env-file ~/.config/promptcrab/provider.env --help

Quick Start

Rewrite a prompt with MiniMax through opencode:

promptcrab \
  --backend opencode_cli \
  --model minimax-coding-plan/MiniMax-M2.7-highspeed \
  --prompt "Summarize this API design and keep every field name unchanged."

Rewrite a prompt from a file with the local Gemini CLI:

promptcrab \
  --backend gemini_cli \
  --model gemini-3-flash-preview \
  --prompt-file ./prompt.txt

Verify candidates with a dedicated judge backend (without --judge-backend, only literal checks run):

promptcrab \
  --backend opencode_cli \
  --model minimax-coding-plan/MiniMax-M2.7-highspeed \
  --judge-backend codex_cli \
  --judge-model gpt-5.4 \
  --judge-codex-reasoning-effort medium \
  --prompt-file ./prompt.txt

Pipe a prompt through stdin:

cat ./prompt.txt | promptcrab --backend codex_cli --model gpt-5.4

Common Usage

Show every candidate and its checks:

promptcrab \
  --backend opencode_cli \
  --model minimax-coding-plan/MiniMax-M2.7-highspeed \
  --prompt-file ./prompt.txt \
  --show-all

Return machine-readable JSON:

promptcrab \
  --backend gemini_cli \
  --model gemini-3-flash-preview \
  --prompt-file ./prompt.txt \
  --json-output

Write the best prompt to a file:

promptcrab \
  --backend opencode_cli \
  --model minimax-coding-plan/MiniMax-M2.7-highspeed \
  --prompt-file ./prompt.txt \
  --write-best-to ./optimized.txt

Optionally cap generation output if a specific provider/model needs it:

promptcrab \
  --backend gemini \
  --model gemini-3-flash-preview \
  --prompt-file ./prompt.txt \
  --max-output-tokens 4096

Use a non-default Codex executable path:

promptcrab \
  --backend codex_cli \
  --model gpt-5.4 \
  --codex-executable /path/to/codex \
  --prompt-file ./prompt.txt

Current Model Guidance

Instead of checking in a small, stale benchmark table, promptcrab now ships a reproducible promptcrab-benchmark runner. It runs a built-in literal/format hard-case suite, pulls public web datasets, re-counts every prompt with one shared tokenizer, and evaluates rewrites with a multi-judge panel.

Directional Snapshot

This single-judge snapshot was run on 2026-04-15 for a README-sized comparison that finishes quickly. It samples 4 MT-Bench cases and 4 IFEval cases, uses o200k_base as the shared tokenizer, keeps literal checks enabled, and evaluates every row with codex_cli + gpt-5.4 (medium) as the judge. Treat it as directional, not a final ranking; the GPT row is self-judged.

Avg accepted token reduction is computed only over cases where at least one candidate passed the fidelity gates.

| Rewrite backend | Judge | Sample | Pass rate (95% CI) | Avg accepted token reduction (95% CI) | Dataset pass split | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| `codex_cli + gpt-5.4 (medium)` | `codex_cli + gpt-5.4 (medium)` | `8` | `6/8 = 75.0%` (`40.9% to 92.9%`) | `4.8%` (`-5.5% to 12.3%`) | MT-Bench `4/4`, IFEval `2/4` | Self-judged; most conservative compression. IFEval failures came from strict literal/verbatim constraints. |
| `opencode_cli + MiniMax-M2.7-highspeed` | `codex_cli + gpt-5.4 (medium)` | `8` | `2/8 = 25.0%` (`7.1% to 59.1%`) | `20.1%` (`19.2% to 20.9%`) | MT-Bench `2/4`, IFEval `0/4` | Highest accepted compression, but many IFEval cases failed on literal or format drift. |
| `gemini_cli + gemini-3-flash-preview` | `codex_cli + gpt-5.4 (medium)` | `8` | `4/8 = 50.0%` (`21.5% to 78.5%`) | `7.8%` (`-16.7% to 26.3%`) | MT-Bench `3/4`, IFEval `1/4` | Middle fidelity; failures mostly came from translated or dropped literal constraints. |

Built-in prompt sources:

hard_cases: built-in literal and format preservation prompts covering verbatim repeat, bullet templates, exact markers, section separators, case/count constraints, symbols, JSON keys, and URLs
MT-Bench
IFEval

The benchmark reports:

per-judge pass rate with 95% Wilson confidence intervals
panel consensus pass rate
before-gate token reduction, showing how much the raw shortest candidate compressed before fidelity checks
after-gate token reduction, showing accepted compression after literal and judge gates
95% bootstrap confidence intervals for mean token reduction
pairwise judge agreement and Cohen's kappa
per-dataset breakdowns
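
For readers unfamiliar with two of the statistics listed above, here is a minimal sketch of the standard Wilson score interval and Cohen's kappa for two judges with pass/fail verdicts. These are the textbook formulas, not code taken from promptcrab-benchmark, and the function names are illustrative. With 6 passes out of 8, the Wilson interval comes out at roughly 40.9% to 92.9%.

```python
import math


def wilson_interval(passes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z = 1.96 gives about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = passes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


def cohens_kappa(a: list[bool], b: list[bool]) -> float:
    """Agreement between two judges, corrected for chance agreement."""
    if len(a) != len(b) or not a:
        raise ValueError("verdict lists must be non-empty and equally long")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa = sum(a) / n  # share of passes from judge A
    pb = sum(b) / n  # share of passes from judge B
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:
        # Both judges gave the same constant verdict; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


low, high = wilson_interval(6, 8)
print(f"{low:.1%} to {high:.1%}")
print(cohens_kappa([True, True, False, False], [True, False, False, False]))
```

The Wilson interval is a common choice for samples this small because it stays within 0 to 100%, unlike the plain normal approximation.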

Example: rerun the benchmark on hard cases and public real-world cases

promptcrab-benchmark \
  --backend codex_cli \
  --model gpt-5.4 \
  --codex-reasoning-effort medium \
  --judge gemini_cli:gemini-3-flash-preview \
  --judge opencode_cli:minimax-coding-plan/MiniMax-M2.7-highspeed \
  --dataset hard_cases \
  --dataset mt_bench \
  --dataset ifeval \
  --cases-per-dataset 24 \
  --trials 2 \
  --tokenizer o200k_base

If you want to run the full datasets instead of a stratified sample:

promptcrab-benchmark \
  --backend codex_cli \
  --model gpt-5.4 \
  --codex-reasoning-effort medium \
  --judge gemini_cli:gemini-3-flash-preview \
  --judge opencode_cli:minimax-coding-plan/MiniMax-M2.7-highspeed \
  --dataset hard_cases \
  --dataset mt_bench \
  --dataset ifeval \
  --cases-per-dataset 0 \
  --tokenizer o200k_base

The built-in hard_cases suite is always evaluated in full when selected; --cases-per-dataset only limits sampled external datasets.

Recommended starting points:

For highest fidelity and stability, use codex_cli --model gpt-5.4, optionally pin --codex-reasoning-effort medium|high|xhigh, and pick a different judge backend such as gemini_cli or opencode_cli.
For strongest prompt compression, compare opencode_cli --model minimax-coding-plan/MiniMax-M2.7-highspeed with codex_cli --model gpt-5.4 as judge.
Use gemini_cli --model gemini-3-flash-preview as a rewrite backend only if you want to compare it explicitly; current literal-fidelity performance is weaker than gpt-5.4 in the directional snapshot above.

If you omit --judge-backend, promptcrab skips judge-based verification and only applies literal checks. This is faster, but less safe.

Example: safer default rewrite

promptcrab \
  --backend codex_cli \
  --model gpt-5.4 \
  --codex-reasoning-effort medium \
  --judge-backend gemini_cli \
  --judge-model gemini-3-flash-preview \
  --prompt-file ./prompt.txt

Example: stronger compression with an external judge

promptcrab \
  --backend opencode_cli \
  --model minimax-coding-plan/MiniMax-M2.7-highspeed \
  --judge-backend codex_cli \
  --judge-model gpt-5.4 \
  --judge-codex-reasoning-effort medium \
  --prompt-file ./prompt.txt

For codex_cli, promptcrab can override reasoning effort with --codex-reasoning-effort and --judge-codex-reasoning-effort. If you omit those flags, Codex falls back to your local CLI configuration such as ~/.codex/config.toml.

Output Modes

Default output: prints the selected best prompt
--show-all: prints all candidates, checks, and verifier results
--json-output: prints a JSON object for automation
--write-best-to: saves the selected prompt to a file

Notes

If no candidate passes the fidelity gates, promptcrab returns the original prompt unchanged.
If you set --judge-backend, promptcrab runs an extra verification pass before accepting a candidate.
If you omit --judge-backend, promptcrab skips semantic verification and only uses literal checks.
If you want a truly independent judge, set --judge-backend to a different backend than --backend.
promptcrab does not set a generation output cap by default; if you need one for a specific backend or model, pass --max-output-tokens.
--max-output-tokens is currently forwarded to minimax and gemini; codex_cli and gemini_cli do not expose a matching flag in this wrapper yet.
Token counting depends on backend support and available credentials.
The selected best candidate is language-agnostic; whichever valid rewrite is smallest wins.

Changelog

See CHANGELOG.md.
