feat(fact-check): Phase 1 — AST-based post-polish fact-check #28

Merged
silversurfer562 merged 3 commits into main from feat/polish-fact-check-phase1 on May 15, 2026
@silversurfer562
Member

Summary

Phase 1 of the polish-fact-check spec (umbrella PR #27). Adds an AST-based post-polish verification layer that catches LLM-fabricated technical detail without calling an LLM.

Four checks, zero LLM cost, ~600 LOC including tests + regression fixture:

| Check | What it catches | Fixture coverage |
| --- | --- | --- |
| check_python_refs | Hallucinated imports + dotted paths (attune.ops._readers) | 2 of 6 |
| check_cli_refs | Invented CLI flags (--allow-run) | 1 of 6 |
| check_md_links | Missing relative link targets | 1 of 6 |
| check_numeric_refs | Wrong counts (498 templates) | 1 of 6 |
| Phase 1 total | | 5 of 6 (the 6th, a missing security callout, is Phase 3) |
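
As an illustration of the simplest of the four checks, here is a minimal sketch of a relative-link check. It is not the code shipped in this PR: the function name `missing_relative_links` and the regular expression are assumptions made for the example. It only shows the general idea of skipping external URLs and pure anchors and testing the rest against the filesystem.

```python
import re
from pathlib import Path
from typing import List

# Hypothetical pattern for inline links of the form [label](target).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Anything that starts with a URL scheme (https:, mailto:, ...) is external.
SCHEME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def missing_relative_links(markdown: str, base_dir: Path) -> List[str]:
    """Return relative link targets that do not exist under base_dir."""
    missing = []
    for target in LINK_RE.findall(markdown):
        # Skip external URLs and pure in-page anchors.
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue
        # Drop a trailing "#section" before touching the filesystem.
        path_part = target.split("#", 1)[0]
        if not (base_dir / path_part).exists():
            missing.append(target)
    return missing
```

Resolving targets relative to the directory of the polished file (passed here as `base_dir`) is the assumption that makes the check meaningful for relative links.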

Behavior

Wired into generator.apply_polish_results so every polished template runs through the check immediately after being written.

  • Soft-fail (default): findings are appended as an ## Unresolved references table at the bottom of the polished file.
  • Strict: FactCheckError is raised after the write; the bad file is left on disk for inspection.
  • Off: short-circuits before reading the file.
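
The three modes can be sketched roughly as follows. This is an illustrative outline, not the actual `apply_polish_results` hook: `fact_check_polished`, the `run_checks` callable, the local `FactCheckError` stand-in, and the table columns are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable, List, Tuple

Finding = Tuple[int, str, str]  # (line number, check name, detail); assumed shape


class FactCheckError(Exception):
    """Local stand-in for the error raised in strict mode."""


def fact_check_polished(
    path: Path,
    run_checks: Callable[[str], List[Finding]],
    mode: str = "soft",
) -> List[Finding]:
    """Run checks on an already-written polished file and act on the mode."""
    if mode == "off":
        return []  # short-circuit before reading the file
    text = path.read_text(encoding="utf-8")
    findings = run_checks(text)
    if not findings:
        return findings
    if mode == "strict":
        # The file was written before this point, so it stays on disk.
        raise FactCheckError(f"{len(findings)} unresolved reference(s) in {path}")
    # Soft-fail: append a findings table to the bottom of the file.
    table = ["", "## Unresolved references", "",
             "| Line | Check | Detail |", "| --- | --- | --- |"]
    table += [f"| {line} | {check} | {detail} |" for line, check, detail in findings]
    path.write_text(text.rstrip("\n") + "\n" + "\n".join(table) + "\n", encoding="utf-8")
    return findings
```

Raising only after the write is what lets an operator open the offending file and see exactly what the polish step produced.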

Controlled via ATTUNE_AUTHOR_FACT_CHECK env var (off | soft | strict, default soft) and [tool.attune-author.fact-check] in pyproject.toml:

[tool.attune-author.fact-check]
enabled = true
soft_fail = true
check_python_refs = true
check_cli_refs = true
check_md_links = true
check_numeric_refs = true

[tool.attune-author.fact-check.skip]
"docs/architecture/some-feature.md" = ["check_md_links"]

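To make the interaction between the per-check toggles and the per-file skip list concrete, here is a small sketch that works on the already-parsed `[tool.attune-author.fact-check]` table as a plain dict. The helper name `enabled_checks` and the choice to treat a missing key as `True` are assumptions, not behavior confirmed by this PR.

```python
from typing import Dict, List

ALL_CHECKS = (
    "check_python_refs",
    "check_cli_refs",
    "check_md_links",
    "check_numeric_refs",
)


def enabled_checks(config: Dict, rel_path: str) -> List[str]:
    """Return the checks that should run for rel_path under this config."""
    if not config.get("enabled", True):
        return []
    # The skip table maps a file path to the checks disabled for that file.
    skipped = set(config.get("skip", {}).get(rel_path, []))
    return [
        name
        for name in ALL_CHECKS
        if config.get(name, True) and name not in skipped
    ]
```

With the example configuration above, `docs/architecture/some-feature.md` would run every check except `check_md_links`, while any other file would run all four.
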
Version coupling

check_cli_refs resolves against whichever attune-ai is installed in the active venv. Every finding includes proactive context so an operator running against a different attune-ai version can resolve false positives without spelunking:

Line 17: `attune ops --allow-run` — flag not found in `attune ops --help`

Detected against attune 6.8.0 (installed in active venv). If you
are regenerating against a different version, verify the flag
exists in that version's `attune --help`.
To override:
  - One-off:  attune-author generate FEATURE --skip-check check_cli_refs
  - Per file: [tool.attune-author.fact-check.skip]
              "docs/how-to/foo.md" = ["check_cli_refs"]

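A rough sketch of how such a finding could be produced from cached help text follows. The helper names (`flag_in_help`, `render_finding`) are hypothetical, and the message wording only approximates the sample above; the real check resolves the subcommand chain and installed version itself.

```python
import re
from typing import Optional


def flag_in_help(flag: str, help_text: str) -> bool:
    """True when flag appears in help_text as a whole option token."""
    # Guard both sides so "--allow-run" does not match "--allow-run-all".
    pattern = rf"(?<![\w-]){re.escape(flag)}(?![\w-])"
    return re.search(pattern, help_text) is not None


def render_finding(
    line_no: int, command: str, flag: str, help_text: str, version: str
) -> Optional[str]:
    """Return a finding message with version context, or None if the flag exists."""
    if flag_in_help(flag, help_text):
        return None
    return (
        f"Line {line_no}: `{command} {flag}`: flag not found in `{command} --help`\n\n"
        f"Detected against attune {version} (installed in active venv). If you\n"
        "are regenerating against a different version, verify the flag\n"
        "exists in that version's `attune --help`."
    )
```

Matching on whole tokens rather than substrings matters here, because a substring test would silently accept an invented flag that happens to be a prefix of a real one.
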
Regression fixture

tests/fixtures/fact_check_ops_dashboard/ ships pre-fix and post-fix versions of the four attune-ai PR #351 docs. The fixture-based test suite asserts each check fires on the pre-fix files and is silent on the post-fix files, exercising the spec's "5/6 ops-dashboard errors caught" exit gate.

Test plan

  • 55 new tests in tests/unit/fact_check/ — all green locally on Py 3.10
  • ruff check clean on new code
  • No regressions in unrelated test suites (10 pre-existing failures verified unchanged with my changes stashed)
  • CI matrix (Ubuntu / macOS / Windows × Py 3.10–3.13)
  • Live regen of a feature to validate the soft-fail block format in real output

Spec tasks

Completed in this PR: 1.1–1.8, 1.10, 1.11, 1.11.1, 1.12, 1.13, 1.14, 1.15, 1.16.

Deferred: 1.9 (CLI flags --fact-check=strict / --no-fact-check) — the env var ATTUNE_AUTHOR_FACT_CHECK ships in this PR; the named CLI flags can land as a small follow-up that wraps the existing env hook.

Phase 2 (ground-truth context injection), Phase 3 (faithfulness judge), and Phase 4 (tutorial static check) remain to ship as separate PRs per the spec.

🤖 Generated with Claude Code

silversurfer562 and others added 3 commits May 15, 2026 07:23
Phase 1 of the polish-fact-check umbrella spec
(docs/specs/polish-fact-check/), shipped as its own PR per the
"four phases, four PRs" plan in the spec.

Adds `src/attune_author/fact_check/`, a stdlib-only post-polish
verification layer that runs against every polished template
emitted by `apply_polish_results`. Four checks, zero LLM cost:

- `check_python_refs` — parses Python code fences with `ast`,
  resolves each import + prose dotted path via
  `importlib.import_module` in the active venv. Catches the
  `attune.ops._readers` class of hallucination (the path
  parses fine but doesn't exist) — the most damaging failure
  mode in the attune-ai #351 regression fixture.
- `check_cli_refs` — parses references of the form
  `attune <subcommand> --flag` and verifies the flag appears
  in the cached `--help` output for that subcommand chain.
  Every finding carries a version-coupling messaging block
  (installed attune-ai version + per-file override snippet)
  so the operator can resolve false positives across version
  drift without spelunking.
- `check_md_links` — verifies relative `[label](target.md)`
  link targets exist. External URLs and pure anchors are
  skipped.
- `check_numeric_refs` — verifies counts like `N templates`,
  `N features`, `N kinds` against the project filesystem and
  manifest. Unverifiable nouns (workflows, skills, agents)
  surface as warnings asking for human review.
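
The core idea behind `check_python_refs` can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the shipped implementation: the function name `unresolved_imports` is hypothetical, and it covers only import statements inside a code sample, not prose dotted paths.

```python
import ast
import importlib
from typing import List


def unresolved_imports(code: str) -> List[str]:
    """Return module names imported in `code` that fail to import here."""
    missing: List[str] = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # relative imports are ignored in this sketch
        else:
            continue
        for name in names:
            try:
                # Note: this executes the module's top-level code.
                importlib.import_module(name)
            except ImportError:
                if name not in missing:  # de-duplicate repeated references
                    missing.append(name)
    return missing
```

Parsing succeeds for any syntactically valid import, which is exactly why a hallucinated path slips past a syntax check and only fails at resolution time.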

Wired into the polish pipeline at
`generator.apply_polish_results`. Default mode is soft-fail:
findings append an `## Unresolved references` table to the
polished file. Strict mode raises `FactCheckError`. Control
via `ATTUNE_AUTHOR_FACT_CHECK` env var
(`off | soft | strict`, default `soft`) and the
`[tool.attune-author.fact-check]` table in `pyproject.toml`
(per-check toggles + per-file skip list).

Regression fixture: `tests/fixtures/fact_check_ops_dashboard/`
ships pre-fix and post-fix versions of the four
attune-ai #351 docs. The fixture-based test suite asserts
each check fires on the pre-fix files and is silent on the
post-fix files, exercising the spec's "5/6 ops-dashboard
errors caught" exit gate.

Coverage: 55 new tests (`tests/unit/fact_check/`). One
integration test verifies multi-check aggregation; per-check
tests cover the happy path, the regression-fixture cases,
de-duplication, and the version-coupling block.

Spec tasks completed: 1.1–1.8, 1.10, 1.11, 1.11.1, 1.12,
1.13, 1.14, 1.15, 1.16. Deferred: 1.9 (CLI flags — env var
ships in this PR; the named flags can land as a small
follow-up).

Phase 2 (ground-truth context injection), Phase 3
(faithfulness judge integration), and Phase 4 (tutorial
static check) remain in spec; each ships as its own PR.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The cli_refs check at src/attune_author/fact_check/cli_refs.py
early-returns ``[]`` when ``shutil.which(cli)`` returns None.
In CI, ``attune`` isn't installed (attune-ai is not in
attune-author's dev deps), so the check silently produces no
findings — and the four CLI-ref tests fail with "expected
--turbo to surface" / "assert []".

Locally the tests passed because the dev venv resolves
``attune`` via the uv workspace setup. CI is a clean checkout
without that — hence the divergence across 8 platforms (ubuntu
× 3.10–3.13 and windows × 3.10–3.13 in PR #28's matrix).

Fix is one-line per affected test: monkey-patch
``cli_refs.shutil.which`` to return a non-None path so the
guard passes and the rest of the test's monkey-patches (over
_resolve_cli_name, _help_text, _installed_version) actually
take effect.
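
The failure mode and the fix can be reproduced with a toy check. This sketch uses `unittest.mock` so it runs without pytest; the real tests use pytest's monkeypatch against `cli_refs.shutil.which`, and `check_flags` below is a hypothetical stand-in for the real check.

```python
import shutil
from typing import List
from unittest import mock


def check_flags(cli: str, flags: List[str], help_text: str) -> List[str]:
    """Toy stand-in: report flags that are absent from help_text."""
    if shutil.which(cli) is None:
        return []  # the guard: no binary on PATH means no findings
    return [flag for flag in flags if flag not in help_text]


def test_unknown_flag_surfaces_without_binary() -> None:
    # Pretend the binary exists so the guard passes on a clean CI checkout.
    with mock.patch.object(shutil, "which", return_value="/fake/bin/attune"):
        assert check_flags("attune", ["--turbo"], "--help --verbose") == ["--turbo"]
```

Without the patch, the same assertion would see `[]` on any machine where the binary is not installed, which matches the "assert []" failures described above.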

Verified:
- 65/65 fact_check tests pass locally (with attune on PATH).
- Same 11 fixture-based tests pass with PATH stripped of
  attune (the CI scenario): ``PATH=/usr/bin:/bin pytest …``.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes task 1.9 in docs/specs/polish-fact-check/tasks.md.

The fact-check pass is controlled by ATTUNE_AUTHOR_FACT_CHECK
(``off | soft | strict``, default ``soft``). Phase 1 of the
spec landed this env-var path in e11feb5. This commit adds a
matching CLI surface on the two commands that invoke the
polish pipeline:

  attune-author generate <feat> --fact-check strict
  attune-author regenerate --no-fact-check

Argparse adds the flags as a mutually exclusive group on both
``generate`` and ``regenerate`` subparsers. ``--no-fact-check``
is shorthand for ``--fact-check off``. ``_apply_fact_check_args``
translates either flag into the env var before the dispatch
function imports the generator.
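
A minimal sketch of that argparse shape, assuming a positional `feature` argument on `generate` and a shared `fact_check` destination (both are guesses made for illustration, not details confirmed by the commit):

```python
import argparse

MODES = ("off", "soft", "strict")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the fact-check flags on generate and regenerate."""
    parser = argparse.ArgumentParser(prog="attune-author")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("generate", "regenerate"):
        cmd = sub.add_parser(name)
        if name == "generate":
            cmd.add_argument("feature")  # assumed positional
        group = cmd.add_mutually_exclusive_group()
        group.add_argument("--fact-check", choices=MODES, default=None)
        # Shorthand for "--fact-check off"; writes to the same destination.
        group.add_argument(
            "--no-fact-check", dest="fact_check", action="store_const", const="off"
        )
    return parser
```

Leaving the default at `None` lets later code tell "no flag given" apart from an explicit choice, which is what the precedence rules rely on.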

Precedence (matches existing --rag pattern):

  1. ATTUNE_AUTHOR_FACT_CHECK env var if set — shell-level
     intent wins over per-invocation flags so the operator can
     enforce a policy across an entire session.
  2. ``--fact-check`` / ``--no-fact-check`` CLI flags —
     per-invocation override of the project default.
  3. ``[tool.attune-author.fact-check]`` in pyproject.toml —
     project-level defaults loaded by load_config().
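
The precedence order above can be expressed as a small pure function. This is a sketch only: `resolve_mode` is a hypothetical name, and it assumes the project-level settings have already been reduced to a single mode string (or `None` when the table is absent).

```python
from typing import Mapping, Optional


def resolve_mode(
    env: Mapping[str, str],
    cli_value: Optional[str],
    project_default: Optional[str],
) -> str:
    """Pick the fact-check mode: env var, then CLI flag, then pyproject."""
    env_value = env.get("ATTUNE_AUTHOR_FACT_CHECK")
    for candidate in (env_value, cli_value, project_default):
        if candidate:
            return candidate
    return "soft"  # documented default when nothing is configured
```

For example, `resolve_mode({"ATTUNE_AUTHOR_FACT_CHECK": "strict"}, "off", "soft")` yields `"strict"`, reflecting the rule that shell-level intent wins over a per-invocation flag.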

Tests added at tests/unit/fact_check/test_cli_flags.py (10
cases): each precedence rule, mutual-exclusivity enforcement,
argparse choice validation, and the four-pass argparse shape
across generate + regenerate. All 65 fact_check tests still
pass.

CHANGELOG/README updated to describe the three-layer control
surface (the existing entries only documented the env var).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
silversurfer562 merged commit 3de9323 into main on May 15, 2026
12 checks passed