Just-Agent/keep-codex-cheap

Keep Codex Cheap

A cache-first Codex skill for reducing repeated prompt cost without changing how Codex thinks or works.

The idea is simple:

Keep the prefix stable. Keep the session sticky. Let official prompt caching do the discounting.

This repository is inspired by:

  • keep-codex-fast: safe local Codex state maintenance
  • XAI Router's Codex cache guidance: session-aware routing, stable prefixes, and low-rewrite Responses forwarding

It does not promise magic savings. It helps you inspect whether your Codex setup is likely to preserve prompt-cache hits across long coding sessions.

How It Becomes Cheap

Codex is expensive in long sessions because every turn can resend a large repeated prefix:

  • system and developer instructions
  • tool definitions and JSON schemas
  • repo rules such as AGENTS.md
  • stable project context
  • previous session and conversation identifiers

Prompt caching can make that repeated prefix cheaper, but only when the provider sees the same beginning of the request again. The cache is strict: similar text is not enough. Reuse stops where the new request first differs from the cached one, so a small change near the top of the request invalidates everything after it.
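The following is a toy illustration of that rule. It assumes prompts are compared byte by byte, whereas real providers match on tokens, usually in fixed-size blocks. The layout helpers `static_first` and `dynamic_first` are hypothetical. They only show why stable context belongs at the front.

```rust
/// Number of leading bytes two requests have in common.
/// Toy model: real prompt caches match tokens, usually in fixed-size blocks,
/// but the rule is the same: reuse ends at the first difference.
fn shared_prefix_len(a: &str, b: &str) -> usize {
    a.bytes()
        .zip(b.bytes())
        .take_while(|(x, y)| x == y)
        .count()
}

/// Hypothetical layout: stable rules first, per-turn task last.
fn static_first(rules: &str, task: &str) -> String {
    format!("{rules}\n{task}")
}

/// Hypothetical layout: per-turn task first, stable rules last.
fn dynamic_first(rules: &str, task: &str) -> String {
    format!("{task}\n{rules}")
}
```

Take two turns whose tasks are "task: fix login" and "task: add tests":

- **Rules first:** the turns share the whole rules block, the newline and `task: `.
- **Task first:** they share only the six bytes of `task: `, so almost nothing would be reusable.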

keep-codex-cheap helps with the parts a user can control:

  • Stable provider: do not bounce the same task between providers or upstream keys.
  • Stable transport: use one Codex path for the whole task, preferably the Responses API.
  • Stable session: WebSocket mode and session-aware routing make it more likely that later turns reach the upstream that already holds the cached prefix.
  • Stable model settings: model and reasoning effort changes can create different request buckets.
  • Stable static context: keep repeated rules and tool context at the front; put changing task details later when possible.
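The stable-settings bullets can be read as a drift check between consecutive turns. This sketch assumes a hypothetical `TurnSettings` struct; its field names do not come from Codex or the bundled CLI. A reported change means the turn may land in a different cache bucket, not that it certainly misses.

```rust
/// Hypothetical per-turn settings that should stay fixed for a whole task.
#[derive(Debug, Clone, PartialEq)]
struct TurnSettings {
    provider: String,
    wire_api: String,
    websocket: bool,
    model: String,
    reasoning_effort: String,
}

/// Names of the settings that changed between two turns.
/// A non-empty result means the later turn may start in a different cache bucket.
fn drifted_fields(prev: &TurnSettings, next: &TurnSettings) -> Vec<&'static str> {
    let mut changed = Vec::new();
    if prev.provider != next.provider {
        changed.push("provider");
    }
    if prev.wire_api != next.wire_api {
        changed.push("wire_api");
    }
    if prev.websocket != next.websocket {
        changed.push("websocket");
    }
    if prev.model != next.model {
        changed.push("model");
    }
    if prev.reasoning_effort != next.reasoning_effort {
        changed.push("reasoning_effort");
    }
    changed
}
```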

The savings come from the provider charging or processing cached input more cheaply than uncached input. This repository does not hide context from Codex, truncate important instructions, or rewrite the model's task. It makes the official cache path easier to hit.

The rough mental model is:

same long prefix + same session route + Responses-compatible transport
  -> higher prompt-cache hit probability
  -> less repeated prefill work
  -> lower repeated-input cost and latency
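The same mental model can be written as arithmetic. The inputs `price_per_mtok` and `cached_multiplier` are placeholders, not published rates. The cached multiplier is the fraction of the normal input price that a cached token costs on your provider.

```rust
/// Input-side cost of one request.
/// `price_per_mtok`: placeholder price per million uncached input tokens.
/// `cached_multiplier`: placeholder fraction of that price charged per cached
/// token (0.1 means cached tokens cost 10% of the normal rate).
fn input_cost(prompt_tokens: u64, cached_tokens: u64, price_per_mtok: f64, cached_multiplier: f64) -> f64 {
    // A request can never have more cached tokens than prompt tokens.
    let cached = cached_tokens.min(prompt_tokens);
    let uncached = prompt_tokens - cached;
    let per_token = price_per_mtok / 1_000_000.0;
    uncached as f64 * per_token + cached as f64 * per_token * cached_multiplier
}
```

Consider a 100,000-token prompt with 90,000 cached tokens, at a hypothetical price of 1.0 per million and a multiplier of 0.1. Its input side costs 0.019 instead of 0.1. Output tokens are not affected.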

What It Checks

The bundled Rust CLI is read-only by default. It inspects a Codex config.toml and reports:

  • whether the configured provider points at https://api.xairouter.com
  • whether wire_api = "responses" is set
  • whether WebSocket mode is enabled when you expect long sessions
  • whether env_key is configured and the variable it names is set in the current shell
  • whether model and reasoning settings are stable enough for repeat sessions
  • whether the config looks likely to drift between providers or transport modes

It also prints copy-ready HTTP and WebSocket configuration templates.
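The sketch below shows how a few of these checks could be run on a config string. It is not the bundled CLI's implementation: `flatten`, `provider_field` and `check` are hypothetical helpers. The line-based reader only handles flat `key = value` lines and `[section]` headers like the templates in this README. A real tool should use a proper TOML parser.

```rust
use std::collections::HashMap;

/// Flattens simple TOML into "section.key" -> value pairs.
/// Sketch only: handles `[section]` headers, `key = value` lines and full-line
/// `#` comments; no trailing comments, arrays, inline tables or escapes.
fn flatten(config: &str) -> HashMap<String, String> {
    let mut out = HashMap::new();
    let mut section = String::new();
    for raw in config.lines() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if line.starts_with('[') && line.ends_with(']') {
            section = line[1..line.len() - 1].trim().to_string();
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            let key = key.trim();
            let full = if section.is_empty() { key.to_string() } else { format!("{section}.{key}") };
            // Strip surrounding quotes from string values; `true` stays `true`.
            out.insert(full, value.trim().trim_matches('"').to_string());
        }
    }
    out
}

/// Looks up a field of the active `[model_providers.<name>]` table.
fn provider_field<'a>(kv: &'a HashMap<String, String>, provider: &str, field: &str) -> Option<&'a str> {
    kv.get(&format!("model_providers.{provider}.{field}")).map(String::as_str)
}

/// Hypothetical subset of the report: (check name, passed).
fn check(config: &str) -> Vec<(&'static str, bool)> {
    let kv = flatten(config);
    let provider = kv.get("model_provider").cloned().unwrap_or_default();
    let p = provider.as_str();
    let ws_feature = kv.get("features.responses_websockets_v2").map(String::as_str);
    vec![
        ("base_url is https://api.xairouter.com",
         provider_field(&kv, p, "base_url") == Some("https://api.xairouter.com")),
        ("wire_api = \"responses\"", provider_field(&kv, p, "wire_api") == Some("responses")),
        ("WebSocket mode enabled",
         provider_field(&kv, p, "supports_websockets") == Some("true") && ws_feature == Some("true")),
        ("env_key configured", provider_field(&kv, p, "env_key").is_some()),
    ]
}
```

Run against the WebSocket template below, all four checks pass. Against the HTTP template, only the WebSocket check fails, which is expected for that setup.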

Quick Start

Ask Codex:

Use $keep-codex-cheap to inspect my Codex config and tell me whether it is prompt-cache friendly.

Or run the CLI directly:

cargo run --quiet

Print the recommended WebSocket template:

cargo run --quiet -- --print-ws-config

Print the simpler HTTP template:

cargo run --quiet -- --print-http-config

Inspect a custom config path:

cargo run --quiet -- --config /path/to/config.toml
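The Quick Start commands imply three modes: a read-only report (optionally with a custom path) and two template printers. This sketch shows one way to map the flags using only the standard library. `Mode` and `parse_args` are hypothetical names, and the real CLI may parse arguments differently.

```rust
/// Hypothetical modes matching the documented commands.
#[derive(Debug, PartialEq)]
enum Mode {
    /// Default, read-only inspection; `None` means the default config path.
    Report { config: Option<String> },
    PrintWsConfig,
    PrintHttpConfig,
}

fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Mode, String> {
    let mut config = None;
    let mut mode = None;
    let mut it = args.into_iter();
    while let Some(arg) = it.next() {
        match arg.as_str() {
            "--print-ws-config" => mode = Some(Mode::PrintWsConfig),
            "--print-http-config" => mode = Some(Mode::PrintHttpConfig),
            // `--config` consumes the next argument as the path.
            "--config" => config = Some(it.next().ok_or("--config needs a path")?),
            other => return Err(format!("unknown argument: {other}")),
        }
    }
    // With no print flag, fall back to the read-only report.
    Ok(mode.unwrap_or(Mode::Report { config }))
}
```

In a binary this would be called as `parse_args(std::env::args().skip(1))`. `cargo run --quiet -- ...` passes everything after `--` to the program.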

Recommended WebSocket Config

Use this when you want stronger long-session continuity:

model_provider = "xai"
model = "gpt-5.4"
model_reasoning_effort = "xhigh"
plan_mode_reasoning_effort = "xhigh"
model_reasoning_summary = "none"
model_verbosity = "medium"
approval_policy = "never"
sandbox_mode = "danger-full-access"
suppress_unstable_features_warning = true

[model_providers.xai]
name = "OpenAI"
base_url = "https://api.xairouter.com"
wire_api = "responses"
requires_openai_auth = false
env_key = "XAI_API_KEY"
supports_websockets = true

[features]
responses_websockets_v2 = true

Recommended HTTP Config

Use this when you prefer a simpler, broadly compatible setup:

model_provider = "xai"
model = "gpt-5.4"
model_reasoning_effort = "xhigh"
plan_mode_reasoning_effort = "xhigh"
model_reasoning_summary = "none"
model_verbosity = "medium"
approval_policy = "never"
sandbox_mode = "danger-full-access"

[model_providers.xai]
name = "OpenAI"
base_url = "https://api.xairouter.com"
wire_api = "responses"
requires_openai_auth = false
env_key = "XAI_API_KEY"

Cheapness Checklist

  • Keep static instructions, tool schemas, and repo rules stable.
  • Avoid switching providers, models, or transport modes mid-task.
  • Prefer the Responses API for Codex-style workflows.
  • Use WebSocket mode for long interactive sessions when available.
  • Keep session and conversation continuity intact.
  • Put dynamic task details after stable context when you control prompt layout.
  • Do not chase artificial cache metrics by rewriting request semantics.

What It Does Not Do

  • It does not make every token cheap.
  • It does not cache model outputs or replay old answers.
  • It does not share cache across organizations.
  • It does not mutate ~/.codex/config.toml unless a future command explicitly implements that and you ask for it.
  • It does not print API keys.

Install

Ask Codex:

Install the keep-codex-cheap skill from https://github.com/3873225350/keep-codex-cheap

Or clone/copy this folder into your Codex skills directory as keep-codex-cheap.

Build

cargo build --release

The binary will be available at:

target/release/keep-codex-cheap

Run validation:

cargo test

Privacy And Safety

Report mode does not write files. It prints only configuration health and hides environment variable values. It never prints API keys.

If you share reports publicly, review local paths and provider names first.
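Here is one way to honor the rule about hiding environment variable values, as a sketch. It checks only whether the variable named by `env_key` exists and never copies its value into the report. `env_key_status` is a hypothetical helper built on `std::env::var_os`.

```rust
use std::env;

/// Reports whether the variable named by `env_key` is set, without putting
/// its value in the output. `var_os` is used so a non-UTF-8 value still
/// counts as present.
fn env_key_status(env_key: &str) -> String {
    match env::var_os(env_key) {
        Some(v) if !v.is_empty() => format!("{env_key}: set (value hidden)"),
        Some(_) => format!("{env_key}: set but empty"),
        None => format!("{env_key}: not set"),
    }
}
```

For the templates above, `env_key_status("XAI_API_KEY")` yields one of these three lines and never the key itself.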

About

Rust CLI and Codex skill for prompt-cache friendly Codex configuration
