
feat(tests): cross-harness portability lint #1603

Open
mvanhorn wants to merge 1 commit into obra:main from mvanhorn:feat/cross-harness-portability-lint

@mvanhorn
Contributor

Motivation

PR #1486 made cross-harness purity a maintained invariant of the skills library. This PR adds the static mechanism that keeps it from regressing. The annotations added to existing skill files let main pass the lint from the start, so CI can be strict from day one.

What problem are you trying to solve?

PR #1486 (Cross-platform skill compatibility: agent-neutral prose, source-verified per-runtime tool refs, merged 2026-05-14) established cross-harness purity as a maintained invariant of the skills library. From your own PR body:

The motivating concrete failure was openai/plugins#217 - OpenAI's vendored fork of Superpowers attempted a wholesale Claude->Codex find-and-replace and got most of it wrong (rewrote historical attribution paths, replaced model names, broke install instructions). Their PR landed because nobody upstream had separated which references were actually Claude-Code-specific from which were generic prose that just happened to say "Claude".

PR #1486 cleaned up the existing state, but the invariant is currently maintained by review attention alone. Nothing structural prevents the same class of regression from re-entering on the next PR or in a downstream vendor fork. This work started from reading PR #1486 and then grepping current main for latent runtime-specific tokens that the invariant does not actively prevent. skills/dispatching-parallel-agents/SKILL.md:69 contained // In Claude Code / AI environment in a generic TypeScript example block (not inside a per-runtime section), which is the kind of regression a static lint catches in seconds.

What does this PR change?

Adds tests/lint-cross-harness.sh, a 164-line POSIX shell + awk static lint that scans skills/**/*.md for four classes of runtime-specific tokens. CI workflow at .github/workflows/lint.yml runs the lint on pull_request and push to main/dev and blocks merge on any violation. Operator documentation at docs/cross-harness-lint.md.

To make main pass the new lint from day one, the PR also adds 34 inline allowlist annotations across 15 existing skill files (using-superpowers, subagent-driven-development, writing-skills, executing-plans, requesting-code-review, brainstorming, writing-plans, dispatching-parallel-agents). The annotations are HTML comments of the form <!-- lint-cross-harness: allow "TOKEN" reason="..." -->. They do not change rendered Markdown content; they document existing legitimate per-runtime references so the lint can distinguish those from future regressions.
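As a rough illustration of how an annotation of this shape could be read, the sketch below pulls the allowed token out of one line with sed. The function name `allowed_token` and the sample reason text are hypothetical, and this is not taken from tests/lint-cross-harness.sh; it only assumes the comment format quoted above.

```sh
#!/bin/sh
# Hypothetical helper: print the TOKEN from an inline allowlist comment,
# or print nothing when the line carries no annotation.
allowed_token() {
    printf '%s\n' "$1" |
        sed -n 's/.*<!-- lint-cross-harness: allow "\([^"]*\)" reason="[^"]*" -->.*/\1/p'
}

# Example call with a made-up annotation.
allowed_token '<!-- lint-cross-harness: allow "TodoWrite" reason="quoted docs" -->'
```

A lint built this way would compare the printed token against the token it was about to flag and skip the violation on a match.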

Is this change appropriate for the core library?

Yes. The lint maintains the cross-harness purity invariant the core library committed to in PR #1486. It is infrastructure tooling (tests + CI + docs), not a new skill, not a third-party integration, not a domain-specific workflow. The annotations to existing skill files do not modify behavior-shaping content (Red Flags tables, rationalization lists, "human partner" language). They are HTML metadata read by the lint and invisible in the rendered Markdown.

This is the static-analysis counterpart to the cross-agent scenario testing harness being built internally. The two layers catch different failure modes at different costs: static lint catches token-level violations in seconds at PR time without spinning up any harness, while runtime testing catches behavioral divergence across agents after merge. They are complementary.

What alternatives did you consider?

  1. Runtime check inside lib/: rejected. Invasive, requires per-harness wiring, duplicates work the runtime scenario harness will already do.
  2. Python or Node-based linter: rejected. Adds a dependency to a Shell + Markdown project. POSIX shell + awk runs on any GitHub Actions worker with no setup.
  3. Off-the-shelf tools (markdown-link-check, vale, etc.): rejected. None understand cross-harness portability semantics; would need custom rules anyway.
  4. Make CI advisory (continue-on-error: true) on first introduction: considered. Rejected in favor of annotating the 34 existing references in this PR so CI can be strict from day one. Half-baked CI is worse than no CI.
  5. Add allowlist annotations as a follow-up PR after this one lands: rejected. The annotations make sense only in context of the lint existing. Splitting them creates a window where main is broken under the new strict workflow.

Does this PR contain multiple unrelated changes?

No. All changes serve one purpose: introduce the cross-harness portability lint with strict CI from day one. The lint script is the mechanism, the CI workflow is the enforcement, the docs explain it, and the 34 inline annotations are needed to make existing main pass the lint (without which strict CI would block the next PR you merge). The annotations are HTML metadata consumed by the lint - they do not alter rendered Markdown, do not modify behavior-shaping skill prose, and each carries a reason= field that documents why the surrounding paragraph legitimately mentions a runtime-specific token.

Existing PRs

I did not find a prior PR proposing a static cross-harness portability lint.

Environment tested

| Harness (e.g. Claude Code, Cursor) | Harness version | Model | Model version/ID |
|---|---|---|---|
| GitHub Actions (ubuntu-latest runner) | n/a | n/a | n/a |

The lint is a static sh + awk script with no agent runtime dependency. The Environment table reflects where the workflow executes; tools required at runtime are find, awk, and bash, all standard on ubuntu-latest. I verified the lint locally on macOS 15 with /bin/sh (POSIX) and confirmed sh -n is clean.

New harness support

N/A. This PR adds no new harness.

Evaluation

  • Initial prompt: reading PR #1486 (Cross-platform skill compatibility: agent-neutral prose, source-verified per-runtime tool refs) and openai/plugins#217 (Update references in Plugins), then running grep -rn 'In Claude Code' skills/ to find what the invariant didn't structurally prevent. Found skills/dispatching-parallel-agents/SKILL.md:69 as the canary case.
  • Iterations on the lint script: two passes. First pass exited 1 with 51 violations on main. Review surfaced that the workflow would block CI immediately, so the lint was narrowed to recognize the existing per-runtime conventions in this codebase. Second pass added bold-prose inline section markers (**In <harness>:**) and extended the references/<runtime>-tools.md file allowance to bare-harness names. Result: 34 violations remaining, all genuine cross-harness leakage in skill prose.
  • Annotation pass: added 34 inline allowlist comments with reason= fields documenting each legitimate reference (frontmatter discovery text, runtime mapping prose, dispatch prompt headers, quoted Anthropic docs, graphviz workflow labels). After the annotation pass, the lint reports Lint complete: 0 violations. on this branch.
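The per-runtime section markers mentioned in the second pass can be pictured with a small awk check. This is a sketch under assumptions: the function name `is_runtime_marker` is hypothetical, the harness list is shortened, and the patterns are simplified guesses, not the ones in the actual script.

```sh
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when a line looks like a per-runtime
# section marker, either a heading ("## In Codex") or a bold-prose marker
# at line start ("**In Claude Code:**").
is_runtime_marker() {
    printf '%s\n' "$1" | awk -v h="Claude Code|Codex|Cursor|OpenCode|Gemini CLI|Copilot CLI" '
        BEGIN { status = 1 }
        $0 ~ ("^#+ +(In|For) (" h ")")           { status = 0 }
        $0 ~ ("^\\*\\*(In|For) (" h "):\\*\\*") { status = 0 }
        END { exit status }
    '
}

if is_runtime_marker '**In Claude Code:** use the Skill tool'; then
    echo "per-runtime marker"
fi
```

A comment such as // In Claude Code / AI environment does not start with a heading or bold marker, so it would still be flagged under this check.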

Rigor

  • If this is a skills change: this PR adds HTML allowlist comments to skill files but does not modify behavior-shaping content. The comments render invisibly in GitHub. No superpowers:writing-skills adversarial testing was performed because no behavior-shaping content was changed.
  • This change was tested adversarially: lint output progressively narrowed from 51 -> 34 -> 0 violations through iterative tightening of the false-positive surface (bold-prose inline markers, references/<runtime>-tools.md allowance) and per-paragraph annotation of genuine cross-harness leakage.
  • I did not modify carefully-tuned content (Red Flags table, rationalization lists, "human partner" language).

Human review

  • A human has reviewed the COMPLETE proposed diff before submission.

Lint output on this branch

$ bash tests/lint-cross-harness.sh
Lint complete: 0 violations.
$ echo $?
0

Lint output on the canary case (before annotation)

$ bash tests/lint-cross-harness.sh
VIOLATION [bare-harness-name] skills/dispatching-parallel-agents/SKILL.md:69
    // In Claude Code / AI environment
... (33 more violations across 14 other skill files)
Lint complete: 34 violations in 15 files.
$ echo $?
1
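The two transcripts imply a simple contract: a summary line, plus exit status 0 on a clean run and 1 otherwise. A minimal sketch of that contract follows, assuming violations arrive on stdin as path:line records; the function name `summarize` and the input format are assumptions, not the script's real internals.

```sh
#!/bin/sh
# Hypothetical helper: read "path:line" violation records on stdin, print a
# summary in the style shown above, and exit non-zero when any were seen.
summarize() {
    awk -F: '
        NF >= 2 {
            total++
            if (!($1 in seen)) { seen[$1] = 1; files++ }
        }
        END {
            if (total == 0) { print "Lint complete: 0 violations."; exit 0 }
            printf "Lint complete: %d violations in %d files.\n", total, files
            exit 1
        }
    '
}

# A clean run: no records on stdin.
printf '' | summarize
```

The non-zero exit status is what lets a CI job block the merge without parsing the output.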

AI was used for assistance.

Fixes the cross-harness portability invariant maintenance gap left by PR #1486.

@obra
Owner

obra commented May 22, 2026

@mvanhorn - The linter comments aren't gonna work. I wonder if the right thing is to teach the linter script to maintain its own list of exceptions internally. They pollute everybody's context windows. The thing that we want to avoid is casual or unintentional places where we are assuming a single agent's worldview. Anytime we mention at least two agents, it's almost certainly okay.

Adds tests/lint-cross-harness.sh, a POSIX shell + awk static lint that scans
skills/**/*.md for four classes of runtime-specific tokens that would weaken
the cross-harness purity invariant established by PR obra#1486:

1. bare-harness-name: Claude Code, Cursor, OpenCode, Codex CLI/App, Gemini CLI,
   GitHub Copilot CLI, Copilot CLI, Factory Droid in generic prose
2. model-id: claude-(opus|sonnet|haiku)-N-M, gpt-N.M, gemini-N.M-tier, oN-mini
3. runtime-tool: ExitPlanMode, TodoWrite, WebFetch, Task tool, Skill tool, mcp__server__tool
4. hardcoded-path: /Users/<name>/ macOS personal paths
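
To make the four classes concrete, here is a minimal sketch that labels a single line. The function name `classify_line` is hypothetical and the regexes are deliberately simplified approximations of the patterns listed above, so this should not be read as the script's actual matching logic.

```sh
#!/bin/sh
# Hypothetical helper: print the first token class a line matches, or "clean".
# Patterns are simplified assumptions covering only a subset of each class.
classify_line() {
    printf '%s\n' "$1" | awk '
        /claude-(opus|sonnet|haiku)-[0-9]+/ || /gpt-[0-9]+\.[0-9]+/  { print "model-id"; next }
        /ExitPlanMode|TodoWrite|WebFetch|mcp__[A-Za-z0-9_]+/         { print "runtime-tool"; next }
        $0 ~ "/Users/[^/ ]+/"                                        { print "hardcoded-path"; next }
        /Claude Code|Codex CLI|Gemini CLI|Copilot CLI|Factory Droid/ { print "bare-harness-name"; next }
        { print "clean" }
    '
}

classify_line '// In Claude Code / AI environment'
```

The allow rules that follow would run before a label like this is reported as a violation.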

A line is allowed under any of these rules (first match wins):

- Two or more distinct harness families named on the same line (Claude, Codex,
  Cursor, OpenCode, Gemini, Copilot, Factory Droid, Aider, Cline, Windsurf,
  Hermes, Hyperagent, Antigravity, Kiro, Qwen, Kimi). Casual cross-runtime
  prose like '~/.claude/skills for Claude Code, ~/.agents/skills/ for Codex'
  passes without annotation.
- A section heading whose text matches In <harness> / For <harness> /
  <harness>:.
- An inline bold-prose marker at line start: **In <harness>:**, etc.
- A skills/*/references/<runtime>-tools.md file (bare-harness + runtime-tool
  only; model IDs and hardcoded paths always flag).
- The internal exception list at tests/lint-cross-harness.exceptions
  (path:line:reason). Used for intentional single-agent references that the
  two-agents rule does not naturally cover (graphviz workflow diagrams,
  subagent dispatch prompt headers, source-attributed Anthropic excerpts).
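
The two-families rule can be sketched as a count of distinct family names on a line. The function names `count_families` and `is_multi_harness` are hypothetical, and plain case-sensitive substring matching is an assumption made for brevity; the real script may match more carefully.

```sh
#!/bin/sh
# Hypothetical helper: print how many distinct harness families a line names.
count_families() {
    printf '%s\n' "$1" | awk '
        BEGIN {
            n = split("Claude Codex Cursor OpenCode Gemini Copilot Aider Cline Windsurf Hermes Hyperagent Antigravity Kiro Qwen Kimi", fam, " ")
        }
        {
            hits = 0
            for (i = 1; i <= n; i++)
                if (index($0, fam[i]) > 0) hits++
            # "Factory Droid" contains a space, so it is checked separately.
            if (index($0, "Factory Droid") > 0) hits++
            print hits
        }
    '
}

# Succeeds when the line names two or more families and is therefore allowed.
is_multi_harness() {
    [ "$(count_families "$1")" -ge 2 ]
}

count_families '~/.claude/skills for Claude Code, ~/.agents/skills/ for Codex'
```

Because the match is case-sensitive, the lowercase .claude in the path does not count on its own; the line passes because it names both Claude Code and Codex.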

No skill content is modified. The two-agents rule and sidecar exception list
together let the lint maintain its own allowlist instead of polluting skill
files with inline annotations, which would add tokens to every loaded skill
context at runtime.
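
A lookup against a sidecar file in path:line:reason form might look like the sketch below. The function name `is_excepted`, the comment-line convention, and the sample entry are assumptions for illustration; only the path:line:reason layout comes from the description above.

```sh
#!/bin/sh
# Hypothetical helper: is_excepted EXCEPTIONS_FILE PATH LINE
# Succeeds when the file has an entry for that path and line number.
# Only the first two colon-separated fields are compared, so the reason
# text may itself contain colons.
is_excepted() {
    awk -F: -v p="$2" -v l="$3" '
        /^[ \t]*#/ || /^[ \t]*$/ { next }   # skip comment and blank lines (assumed convention)
        $1 == p && $2 == l { found = 1; exit }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Demo with a throwaway file and a made-up entry.
demo=$(mktemp)
printf '%s\n' 'skills/example/SKILL.md:12:graphviz workflow label' > "$demo"
is_excepted "$demo" skills/example/SKILL.md 12 && echo "allowed by sidecar"
rm -f "$demo"
```

One trade-off of keying on line numbers is that entries go stale whenever lines are added or removed above them in the skill file.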

CI workflow .github/workflows/lint.yml runs on PR and push to main/dev, blocks
merge on any violation. docs/cross-harness-lint.md documents the rules,
allowlist mechanism, and sidecar format.
mvanhorn force-pushed the feat/cross-harness-portability-lint branch from a09e9b4 to b038a2b on May 22, 2026 02:48
@mvanhorn
Contributor Author

Done - pushed the rewrite. The lint now auto-allows any line that names 2+ harness families (Claude / Codex / Cursor / OpenCode / Gemini / Copilot / Aider / Cline / Windsurf / Hermes / Hyperagent / Antigravity / Kiro / Qwen / Kimi / Factory Droid), which picks up cross-runtime prose like ~/.claude/skills for Claude Code, ~/.agents/skills/ for Codex without annotation. For the remaining single-agent references (graphviz workflow nodes, subagent dispatch prompt headers, source-attributed Anthropic excerpts, the TodoWrite examples in persuasion-principles), there is a sidecar tests/lint-cross-harness.exceptions file with path:line:reason entries. Zero skill files modified.

obra added the enhancement (New feature or request) and needs-rebase-to-dev-branch (PR targets main but should target dev) labels on May 23, 2026