A cache-first Codex skill for reducing repeated prompt cost without changing how Codex thinks or works.
The idea is simple:
Keep the prefix stable. Keep the session sticky. Let official prompt caching do the discounting.
This repository is inspired by:
- keep-codex-fast: safe local Codex state maintenance
- XAI Router's Codex cache guidance: session-aware routing, stable prefixes, and low-rewrite Responses forwarding
It does not promise magic savings. It helps you inspect whether your Codex setup is likely to preserve prompt-cache hits across long coding sessions.
Codex is expensive in long sessions because every turn can resend a large repeated prefix:
- system and developer instructions
- tool definitions and JSON schemas
- repo rules such as AGENTS.md
- stable project context
- previous session and conversation identifiers
Prompt caching can make that repeated prefix cheaper, but only when the provider sees the same beginning of the request again. The cache is strict: similar text is not enough, and a change early in the request breaks the match for everything after it.
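To make the strictness concrete, here is a minimal sketch that measures how much of two requests is identical from the first byte onward. It is an illustration only: the function name `shared_prefix_len` and the sample strings are hypothetical, and real providers match on tokenized prefixes under their own rules rather than on raw bytes.

```rust
/// Returns the number of leading bytes the two request bodies have in common.
/// Byte comparison is a simplification of how a provider matches a cached prefix.
fn shared_prefix_len(a: &str, b: &str) -> usize {
    a.bytes().zip(b.bytes()).take_while(|(x, y)| x == y).count()
}

fn main() {
    // Hypothetical stable head: instructions and tool list that repeat every turn.
    let stable = "SYSTEM: You are a coding agent.\nTOOLS: shell, apply_patch\n";
    let turn_1 = format!("{}USER: fix the failing test", stable);
    let turn_2 = format!("{}USER: now add a regression test", stable);
    // A timestamp placed at the very front changes the first bytes of the request.
    let drifting = format!("TIME: 12:00:01\n{}USER: now add a regression test", stable);

    println!("stable layout shares {} bytes", shared_prefix_len(&turn_1, &turn_2));
    println!("drifting layout shares {} bytes", shared_prefix_len(&turn_1, &drifting));
}
```

In the stable layout the whole repeated head is shared. In the drifting layout nothing is shared, even though almost all of the text is the same, because the difference sits at the front.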
keep-codex-cheap helps with the parts a user can control:
- Stable provider: do not bounce the same task between providers or upstream keys.
- Stable transport: prefer one Codex path for the task, especially the Responses API.
- Stable session: WebSocket mode and session-aware routing make it easier for later turns to land near existing cache.
- Stable model settings: model and reasoning effort changes can create different request buckets.
- Stable static context: keep repeated rules and tool context at the front; put changing task details later when possible.
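The last point, stable context first and changing details last, can be sketched as a small prompt assembler. This is a hypothetical illustration for cases where you control prompt layout yourself; `PromptParts` and `assemble` are made-up names, not Codex types or functions.

```rust
/// Hypothetical container for the parts of a request; not a Codex type.
struct PromptParts<'a> {
    instructions: &'a str,
    tool_schemas: &'a str,
    repo_rules: &'a str,
    task: &'a str,
}

/// Joins the stable parts first and the changing task last,
/// so that consecutive requests share the longest possible head.
fn assemble(parts: &PromptParts) -> String {
    [parts.instructions, parts.tool_schemas, parts.repo_rules, parts.task].join("\n\n")
}

fn main() {
    let first = assemble(&PromptParts {
        instructions: "You are a coding agent.",
        tool_schemas: "{\"tools\": [\"shell\"]}",
        repo_rules: "Follow AGENTS.md.",
        task: "Fix the failing test.",
    });
    let second = assemble(&PromptParts {
        instructions: "You are a coding agent.",
        tool_schemas: "{\"tools\": [\"shell\"]}",
        repo_rules: "Follow AGENTS.md.",
        task: "Add a regression test.",
    });
    // Only the tail differs between the two requests.
    println!("{}\n---\n{}", first, second);
}
```

Because the task text is appended last, changing it leaves every earlier byte of the request untouched.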
The savings come from the provider charging or processing cached input more cheaply than uncached input. This repository does not hide context from Codex, truncate important instructions, or rewrite the model's task. It makes the official cache path easier to hit.
The rough mental model is:
same long prefix + same session route + Responses-compatible transport
-> higher prompt-cache hit probability
-> less repeated prefill work
-> lower repeated-input cost and latency
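A rough worked example of that chain, using made-up numbers. The rates below are hypothetical placeholders, not any provider's actual pricing, and `input_cost` is an illustrative helper rather than part of this repository.

```rust
/// Input cost of one turn in dollars.
/// `rate` and `cached_rate` are prices per million input tokens.
fn input_cost(cached_tokens: u64, uncached_tokens: u64, rate: f64, cached_rate: f64) -> f64 {
    (cached_tokens as f64 * cached_rate + uncached_tokens as f64 * rate) / 1_000_000.0
}

fn main() {
    // Hypothetical rates: $2.00 per million uncached tokens, $0.20 per million cached tokens.
    let (rate, cached_rate) = (2.0, 0.2);
    // Hypothetical turn: a 40,000-token repeated prefix plus 2,000 new tokens.
    let cold = input_cost(0, 42_000, rate, cached_rate);
    let warm = input_cost(40_000, 2_000, rate, cached_rate);
    println!("cache miss: ${:.3} per turn", cold);
    println!("cache hit:  ${:.3} per turn", warm);
}
```

Under these assumed numbers a cache miss costs about $0.084 of input per turn and a cache hit about $0.012. The exact ratio depends entirely on the provider's real rates and on how much of the request actually matches.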
The bundled Rust CLI is read-only by default. It inspects a Codex config.toml and reports:
- whether the configured provider points at https://api.xairouter.com
- whether wire_api = "responses" is set
- whether WebSocket mode is enabled when you expect long sessions
- whether env_key is configured and present in the current shell
- whether model and reasoning settings are stable enough for repeat sessions
- whether the config looks likely to drift between providers or transport modes
It also prints copy-ready HTTP and WebSocket configuration templates.
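For readers curious what a read-only check of this kind might look like, here is a minimal standard-library sketch. It is not the bundled CLI's implementation: `find_value` and `check` are hypothetical names, and the line-based lookup ignores TOML tables, so it is only a rough approximation of real TOML parsing.

```rust
/// Finds the first `key = value` line and returns the value without quotes.
/// Simplification: ignores TOML tables, so it cannot tell which section a key belongs to.
fn find_value(config: &str, key: &str) -> Option<String> {
    config.lines().find_map(|line| {
        let (k, v) = line.trim().split_once('=')?;
        if k.trim() == key {
            Some(v.trim().trim_matches('"').to_string())
        } else {
            None
        }
    })
}

/// Returns human-readable findings. Takes the config as text and never writes anything.
fn check(config: &str) -> Vec<String> {
    let mut findings = Vec::new();
    match find_value(config, "wire_api").as_deref() {
        Some("responses") => {}
        Some(other) => findings.push(format!("wire_api is \"{}\", expected \"responses\"", other)),
        None => findings.push("wire_api is not set".to_string()),
    }
    if find_value(config, "supports_websockets").as_deref() != Some("true") {
        findings.push("WebSocket mode is not enabled".to_string());
    }
    if find_value(config, "env_key").is_none() {
        findings.push("env_key is not configured".to_string());
    }
    findings
}

fn main() {
    // Sample input; a real run would read the file with std::fs::read_to_string.
    let sample = "wire_api = \"responses\"\nenv_key = \"XAI_API_KEY\"\n";
    for finding in check(sample) {
        println!("note: {}", finding);
    }
}
```

Keeping `check` as a pure function over text makes the read-only property easy to see: there is no file handle it could write through.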
Ask Codex:
Use $keep-codex-cheap to inspect my Codex config and tell me whether it is prompt-cache friendly.
Or run the CLI directly:
cargo run --quiet
Print the recommended WebSocket template:
cargo run --quiet -- --print-ws-config
Print the simpler HTTP template:
cargo run --quiet -- --print-http-config
Inspect a custom config path:
cargo run --quiet -- --config /path/to/config.toml
Use this when you want stronger long-session continuity:
model_provider = "xai"
model = "gpt-5.4"
model_reasoning_effort = "xhigh"
plan_mode_reasoning_effort = "xhigh"
model_reasoning_summary = "none"
model_verbosity = "medium"
approval_policy = "never"
sandbox_mode = "danger-full-access"
suppress_unstable_features_warning = true
[model_providers.xai]
name = "OpenAI"
base_url = "https://api.xairouter.com"
wire_api = "responses"
requires_openai_auth = false
env_key = "XAI_API_KEY"
supports_websockets = true
[features]
responses_websockets_v2 = true
Use this when you prefer a simpler, broadly compatible setup:
model_provider = "xai"
model = "gpt-5.4"
model_reasoning_effort = "xhigh"
plan_mode_reasoning_effort = "xhigh"
model_reasoning_summary = "none"
model_verbosity = "medium"
approval_policy = "never"
sandbox_mode = "danger-full-access"
[model_providers.xai]
name = "OpenAI"
base_url = "https://api.xairouter.com"
wire_api = "responses"
requires_openai_auth = false
env_key = "XAI_API_KEY"
- Keep static instructions, tool schemas, and repo rules stable.
- Avoid switching providers, models, or transport modes mid-task.
- Prefer the Responses API for Codex-style workflows.
- Use WebSocket mode for long interactive sessions when available.
- Keep session and conversation continuity intact.
- Put dynamic task details after stable context when you control prompt layout.
- Do not chase artificial cache metrics by rewriting request semantics.
- It does not make every token cheap.
- It does not cache model outputs or replay old answers.
- It does not share cache across organizations.
- It does not mutate ~/.codex/config.toml unless a future command explicitly implements that and you ask for it.
- It does not print API keys.
Ask Codex:
Install the keep-codex-cheap skill from https://github.com/3873225350/keep-codex-cheap
Or clone/copy this folder into your Codex skills directory as keep-codex-cheap.
cargo build --release
The binary will be available at:
target/release/keep-codex-cheap
Run validation:
cargo test
Report mode does not write files. It prints only configuration health and hides environment variable values. It never prints API keys.
If you share reports publicly, review local paths and provider names first.
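One way to report on a key without exposing it is to describe only whether the variable is present. The sketch below assumes nothing about the actual CLI; `env_key_status` is a hypothetical helper shown only to illustrate the idea.

```rust
use std::env;

/// Describes whether an environment variable is usable, without revealing its value.
fn env_key_status(name: &str) -> String {
    match env::var(name) {
        Ok(value) if !value.is_empty() => format!("{}: present (value hidden)", name),
        Ok(_) => format!("{}: set but empty", name),
        // Covers both an unset variable and one that is not valid Unicode.
        Err(_) => format!("{}: missing", name),
    }
}

fn main() {
    // XAI_API_KEY is the variable name used in the templates in this README.
    println!("{}", env_key_status("XAI_API_KEY"));
}
```

The value is read only to test whether it is empty and is never formatted into the output, so the report line is safe to paste elsewhere as far as the key itself is concerned.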