From 78971b8b0afcc04ee86142389a03dd65d041dda2 Mon Sep 17 00:00:00 2001
From: 0xDevNinja <manmit0x@gmail.com>
Date: Tue, 19 May 2026 11:31:39 +0530
Subject: [PATCH] docs(codex): add Hermes-aware recovery branch for
 refresh_token_reused

- Codex `refresh_token_reused` is split-brain by default: Hermes
  `openai-codex` can still serve valid tokens while `~/.codex/auth.json`
  goes stale, so running `codex login` first wipes a working session
- Add a sub-bullet under Error Handling > Auth that branches on a
  Hermes smoke test, repairs `~/.codex/auth.json` from Hermes tokens
  when the provider still works, falls back to `codex login` only
  when both routes are dead
- Doc-only; no Codex or Hermes auth behavior changes

Fixes #1542.
---
 codex/SKILL.md      | 7 +++++++
 codex/SKILL.md.tmpl | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/codex/SKILL.md b/codex/SKILL.md
index edf4075f2d..598c530b9f 100644
--- a/codex/SKILL.md
+++ b/codex/SKILL.md
@@ -1497,6 +1497,13 @@ If token count is not available, display: `Tokens: unknown`
 - **Binary not found:** Detected in Step 0. Stop with install instructions.
 - **Auth error:** Codex prints an auth error to stderr. Surface the error:
   "Codex authentication failed. Run `codex login` in your terminal to authenticate via ChatGPT."
+- **Auth error — `refresh_token_reused` specifically (Hermes-aware recovery):**
+  When stderr contains `refresh_token_reused`, do **not** go straight to `codex login`. Hermes' `openai-codex` provider can still be serving valid tokens while the standalone `~/.codex/auth.json` credential store has gone stale (split-brain auth). Sending the user to `codex login` in that state wipes a working session that downstream skills relied on.
+  Diagnose, then repair, in this order:
+  1. **Smoke-test the Hermes provider first** (if Hermes is installed). Run a trivial call through Hermes' `openai-codex` provider — e.g. `hermes ask --provider openai-codex 'ping'` or the equivalent profile-routed invocation. If it succeeds, the user's OpenAI session is still good; the failure is local to standalone `codex exec`'s credential store.
+  2. **If Hermes works:** repair `~/.codex/auth.json` from the Hermes-side tokens instead of forcing a re-login. The exact helper varies by Hermes version (look for a `codex-auth-sync` / `hermes export codex-auth` style script in the local Hermes install). After copying the refreshed tokens back into `~/.codex/auth.json`, `chmod 600 ~/.codex/auth.json` and rerun the original `codex exec` to verify.
+  3. **If Hermes is not installed or the Hermes provider also fails:** that confirms the OpenAI session itself is gone; `codex login` is the right next step. Run it, then re-invoke the skill.
+  Tell the user which branch you're on before suggesting any action — "Hermes provider still works, repairing `~/.codex/auth.json` from Hermes tokens" vs. "Hermes provider is also down, running `codex login`" — so they know whether they're about to lose a working Hermes session or not.
 - **Timeout (Bash outer gate):** If the Bash call times out (5 min for Review/Challenge, 10 min for Consult), tell the user:
   "Codex timed out. The prompt may be too large or the API may be slow. Try again or use a smaller scope."
 - **Timeout (inner `timeout` wrapper, exit 124):** If the shell `timeout 600` wrapper fires first, the skill's hang-detection block auto-logs a telemetry event + operational learning and prints: "Codex stalled past 10 minutes. Common causes: model API stall, long prompt, network issue. Try re-running. If persistent, split the prompt or check `~/.codex/logs/`." No extra action needed.
diff --git a/codex/SKILL.md.tmpl b/codex/SKILL.md.tmpl
index 329e93c4f5..5e33b36c76 100644
--- a/codex/SKILL.md.tmpl
+++ b/codex/SKILL.md.tmpl
@@ -618,6 +618,13 @@ If token count is not available, display: `Tokens: unknown`
 - **Binary not found:** Detected in Step 0. Stop with install instructions.
 - **Auth error:** Codex prints an auth error to stderr. Surface the error:
   "Codex authentication failed. Run `codex login` in your terminal to authenticate via ChatGPT."
+- **Auth error — `refresh_token_reused` specifically (Hermes-aware recovery):**
+  When stderr contains `refresh_token_reused`, do **not** go straight to `codex login`. Hermes' `openai-codex` provider can still be serving valid tokens while the standalone `~/.codex/auth.json` credential store has gone stale (split-brain auth). Sending the user to `codex login` in that state wipes a working session that downstream skills relied on.
+  Diagnose, then repair, in this order:
+  1. **Smoke-test the Hermes provider first** (if Hermes is installed). Run a trivial call through Hermes' `openai-codex` provider — e.g. `hermes ask --provider openai-codex 'ping'` or the equivalent profile-routed invocation. If it succeeds, the user's OpenAI session is still good; the failure is local to standalone `codex exec`'s credential store.
+  2. **If Hermes works:** repair `~/.codex/auth.json` from the Hermes-side tokens instead of forcing a re-login. The exact helper varies by Hermes version (look for a `codex-auth-sync` / `hermes export codex-auth` style script in the local Hermes install). After copying the refreshed tokens back into `~/.codex/auth.json`, `chmod 600 ~/.codex/auth.json` and rerun the original `codex exec` to verify.
+  3. **If Hermes is not installed or the Hermes provider also fails:** that confirms the OpenAI session itself is gone; `codex login` is the right next step. Run it, then re-invoke the skill.
+  Tell the user which branch you're on before suggesting any action — "Hermes provider still works, repairing `~/.codex/auth.json` from Hermes tokens" vs. "Hermes provider is also down, running `codex login`" — so they know whether they're about to lose a working Hermes session or not.
 - **Timeout (Bash outer gate):** If the Bash call times out (5 min for Review/Challenge, 10 min for Consult), tell the user:
   "Codex timed out. The prompt may be too large or the API may be slow. Try again or use a smaller scope."
 - **Timeout (inner `timeout` wrapper, exit 124):** If the shell `timeout 600` wrapper fires first, the skill's hang-detection block auto-logs a telemetry event + operational learning and prints: "Codex stalled past 10 minutes. Common causes: model API stall, long prompt, network issue. Try re-running. If persistent, split the prompt or check `~/.codex/logs/`." No extra action needed.