From 78971b8b0afcc04ee86142389a03dd65d041dda2 Mon Sep 17 00:00:00 2001 From: 0xDevNinja Date: Tue, 19 May 2026 11:31:39 +0530 Subject: [PATCH] docs(codex): add Hermes-aware recovery branch for refresh_token_reused - Codex `refresh_token_reused` is split-brain by default: Hermes `openai-codex` can still serve valid tokens while `~/.codex/auth.json` goes stale, so running `codex login` first wipes a working session - Add a sub-bullet under Error Handling > Auth that branches on a Hermes smoke test, repairs `~/.codex/auth.json` from Hermes tokens when the provider still works, falls back to `codex login` only when both routes are dead - Doc-only; no Codex or Hermes auth behavior changes Fixes #1542. --- codex/SKILL.md | 7 +++++++ codex/SKILL.md.tmpl | 7 +++++++ 2 files changed, 14 insertions(+) diff --git a/codex/SKILL.md b/codex/SKILL.md index edf4075f2d..598c530b9f 100644 --- a/codex/SKILL.md +++ b/codex/SKILL.md @@ -1497,6 +1497,13 @@ If token count is not available, display: `Tokens: unknown` - **Binary not found:** Detected in Step 0. Stop with install instructions. - **Auth error:** Codex prints an auth error to stderr. Surface the error: "Codex authentication failed. Run `codex login` in your terminal to authenticate via ChatGPT." +- **Auth error — `refresh_token_reused` specifically (Hermes-aware recovery):** + When stderr contains `refresh_token_reused`, do **not** go straight to `codex login`. Hermes' `openai-codex` provider can still be serving valid tokens while the standalone `~/.codex/auth.json` credential store has gone stale (split-brain auth). Sending the user to `codex login` in that state wipes a working session that downstream skills relied on. + Diagnose, then repair, in this order: + 1. **Smoke-test the Hermes provider first** (if Hermes is installed). Run a trivial call through Hermes' `openai-codex` provider — e.g. `hermes ask --provider openai-codex 'ping'` or the equivalent profile-routed invocation. If it succeeds, the user's OpenAI session is still good; the failure is local to standalone `codex exec`'s credential store. + 2. **If Hermes works:** repair `~/.codex/auth.json` from the Hermes-side tokens instead of forcing a re-login. The exact helper varies by Hermes version (look for a `codex-auth-sync` / `hermes export codex-auth` style script in the local Hermes install). After copying the refreshed tokens back into `~/.codex/auth.json`, `chmod 600 ~/.codex/auth.json` and rerun the original `codex exec` to verify. + 3. **If Hermes is not installed or the Hermes provider also fails:** that confirms the OpenAI session itself is gone; `codex login` is the right next step. Run it, then re-invoke the skill. + Tell the user which branch you're on before suggesting any action — "Hermes provider still works, repairing `~/.codex/auth.json` from Hermes tokens" vs. "Hermes provider is also down, running `codex login`" — so they know whether they're about to lose a working Hermes session or not. - **Timeout (Bash outer gate):** If the Bash call times out (5 min for Review/Challenge, 10 min for Consult), tell the user: "Codex timed out. The prompt may be too large or the API may be slow. Try again or use a smaller scope." - **Timeout (inner `timeout` wrapper, exit 124):** If the shell `timeout 600` wrapper fires first, the skill's hang-detection block auto-logs a telemetry event + operational learning and prints: "Codex stalled past 10 minutes. Common causes: model API stall, long prompt, network issue. Try re-running. If persistent, split the prompt or check `~/.codex/logs/`." No extra action needed. diff --git a/codex/SKILL.md.tmpl b/codex/SKILL.md.tmpl index 329e93c4f5..5e33b36c76 100644 --- a/codex/SKILL.md.tmpl +++ b/codex/SKILL.md.tmpl @@ -618,6 +618,13 @@ If token count is not available, display: `Tokens: unknown` - **Binary not found:** Detected in Step 0. Stop with install instructions. - **Auth error:** Codex prints an auth error to stderr. Surface the error: "Codex authentication failed. Run `codex login` in your terminal to authenticate via ChatGPT." +- **Auth error — `refresh_token_reused` specifically (Hermes-aware recovery):** + When stderr contains `refresh_token_reused`, do **not** go straight to `codex login`. Hermes' `openai-codex` provider can still be serving valid tokens while the standalone `~/.codex/auth.json` credential store has gone stale (split-brain auth). Sending the user to `codex login` in that state wipes a working session that downstream skills relied on. + Diagnose, then repair, in this order: + 1. **Smoke-test the Hermes provider first** (if Hermes is installed). Run a trivial call through Hermes' `openai-codex` provider — e.g. `hermes ask --provider openai-codex 'ping'` or the equivalent profile-routed invocation. If it succeeds, the user's OpenAI session is still good; the failure is local to standalone `codex exec`'s credential store. + 2. **If Hermes works:** repair `~/.codex/auth.json` from the Hermes-side tokens instead of forcing a re-login. The exact helper varies by Hermes version (look for a `codex-auth-sync` / `hermes export codex-auth` style script in the local Hermes install). After copying the refreshed tokens back into `~/.codex/auth.json`, `chmod 600 ~/.codex/auth.json` and rerun the original `codex exec` to verify. + 3. **If Hermes is not installed or the Hermes provider also fails:** that confirms the OpenAI session itself is gone; `codex login` is the right next step. Run it, then re-invoke the skill. + Tell the user which branch you're on before suggesting any action — "Hermes provider still works, repairing `~/.codex/auth.json` from Hermes tokens" vs. "Hermes provider is also down, running `codex login`" — so they know whether they're about to lose a working Hermes session or not. - **Timeout (Bash outer gate):** If the Bash call times out (5 min for Review/Challenge, 10 min for Consult), tell the user: "Codex timed out. The prompt may be too large or the API may be slow. Try again or use a smaller scope." - **Timeout (inner `timeout` wrapper, exit 124):** If the shell `timeout 600` wrapper fires first, the skill's hang-detection block auto-logs a telemetry event + operational learning and prints: "Codex stalled past 10 minutes. Common causes: model API stall, long prompt, network issue. Try re-running. If persistent, split the prompt or check `~/.codex/logs/`." No extra action needed.