feat(fact-check): Phase 1 — AST-based post-polish fact-check by silversurfer562 · Pull Request #28 · Smart-AI-Memory/attune-author

silversurfer562 · 2026-05-15T11:23:54Z

Summary

Phase 1 of the polish-fact-check spec (umbrella PR #27). Adds an AST-based post-polish verification layer that catches LLM-fabricated technical detail without calling an LLM.

Four checks, zero LLM cost, ~600 LOC including tests + regression fixture:

| Check | What it catches | Fixture coverage |
| --- | --- | --- |
| `check_python_refs` | Hallucinated imports + dotted paths (`attune.ops._readers`) | 2 of 6 |
| `check_cli_refs` | Invented CLI flags (`--allow-run`) | 1 of 6 |
| `check_md_links` | Missing relative link targets | 1 of 6 |
| `check_numeric_refs` | Wrong counts (`498 templates`) | 1 of 6 |
| Phase 1 total | | 5 of 6 (the 6th, a missing security callout, is Phase 3) |
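
To make the first row concrete, here is a minimal sketch of the import-resolution idea: parse a snippet with `ast`, then try each imported module with `importlib.import_module`. The function name `unresolved_imports` is hypothetical and this is not the code shipped in `src/attune_author/fact_check/`. It only illustrates why a path such as `attune.ops._readers` can parse cleanly and still be flagged.

```python
import ast
import importlib


def unresolved_imports(source: str) -> list[str]:
    """Return absolute module names imported in `source` that fail to import."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # relative imports are ignored in this sketch
        else:
            continue
        for name in names:
            try:
                importlib.import_module(name)
            except ImportError:  # ModuleNotFoundError is a subclass
                missing.append(name)
    return missing


# `json` exists in the stdlib; `json._no_such_reader` parses fine but does not exist.
snippet = "import json\nfrom json import decoder\nimport json._no_such_reader\n"
print(unresolved_imports(snippet))
```

Note that `import_module` executes the module it resolves, so a real check would presumably only run it against packages already trusted in the active venv.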

Behavior

Wired into generator.apply_polish_results so every polished template runs through the check immediately after being written.

- Soft-fail (default): findings are appended as an `## Unresolved references` table at the bottom of the polished file.
- Strict: `FactCheckError` is raised after the write; the bad file is left on disk for inspection.
- Off: short-circuits before reading the file.
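
The three modes can be sketched as a small dispatcher. This is an illustration under assumptions, not the shipped code: `fact_check_file`, the `run_checks` callable, and the two-column table layout are all hypothetical; only the `FactCheckError` name and the `## Unresolved references` heading come from this PR.

```python
import tempfile
from pathlib import Path


class FactCheckError(Exception):
    """Strict-mode failure; the class name is from the PR, the body is assumed."""


def fact_check_file(path, mode, run_checks):
    """Apply off/soft/strict handling to an already-written polished file."""
    if mode == "off":
        return []  # short-circuit before the file is read
    findings = run_checks(path.read_text(encoding="utf-8"))
    if not findings:
        return []
    if mode == "strict":
        # The polished file was written earlier and stays on disk for inspection.
        raise FactCheckError(f"{len(findings)} unresolved reference(s) in {path}")
    # Soft-fail: append the findings as a table at the bottom of the file.
    rows = "\n".join(f"| {line} | {message} |" for line, message in findings)
    table = "\n\n## Unresolved references\n\n| Line | Finding |\n| --- | --- |\n" + rows + "\n"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(table)
    return findings


def demo():
    """Run the soft-fail path against a throwaway file and return its new text."""
    fake_checks = lambda text: [(17, "flag not found: --allow-run")]
    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp) / "polished.md"
        doc.write_text("# Ops dashboard\n", encoding="utf-8")
        fact_check_file(doc, "soft", fake_checks)
        return doc.read_text(encoding="utf-8")
```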

Controlled via ATTUNE_AUTHOR_FACT_CHECK env var (off | soft | strict, default soft) and [tool.attune-author.fact-check] in pyproject.toml:

[tool.attune-author.fact-check]
enabled = true
soft_fail = true
check_python_refs = true
check_cli_refs = true
check_md_links = true
check_numeric_refs = true

[tool.attune-author.fact-check.skip]
"docs/architecture/some-feature.md" = ["check_md_links"]

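One plausible reading of the per-check toggles and the per-file skip table is sketched below. The `enabled_checks` helper is hypothetical, the config is passed in as an already-parsed dict mirroring the TOML above, and `soft_fail` is deliberately left out because it affects reporting, not which checks run.

```python
CHECK_NAMES = [
    "check_python_refs",
    "check_cli_refs",
    "check_md_links",
    "check_numeric_refs",
]


def enabled_checks(config: dict, rel_path: str) -> list[str]:
    """Return the checks to run for `rel_path` (assumed semantics, not the real loader)."""
    if not config.get("enabled", True):
        return []
    skipped = set(config.get("skip", {}).get(rel_path, []))
    return [
        name
        for name in CHECK_NAMES
        if config.get(name, True) and name not in skipped
    ]


# Dict equivalent of the [tool.attune-author.fact-check] table shown above.
config = {
    "enabled": True,
    "soft_fail": True,
    "check_python_refs": True,
    "check_cli_refs": True,
    "check_md_links": True,
    "check_numeric_refs": True,
    "skip": {"docs/architecture/some-feature.md": ["check_md_links"]},
}
print(enabled_checks(config, "docs/architecture/some-feature.md"))
```
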
Version coupling

check_cli_refs resolves against whichever attune-ai is installed in the active venv. Every finding includes proactive context so an operator running against a different attune-ai version can resolve false positives without spelunking:

Line 17: `attune ops --allow-run` — flag not found in `attune ops --help`

Detected against attune 6.8.0 (installed in active venv). If you
are regenerating against a different version, verify the flag
exists in that version's `attune --help`.
To override:
  - One-off:  attune-author generate FEATURE --skip-check check_cli_refs
  - Per file: [tool.attune-author.fact-check.skip]
              "docs/how-to/foo.md" = ["check_cli_refs"]

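A rough sketch of how a finding like the one above could be produced: scan each line for `attune <subcommand> --flag` and look the flag up in help text for that subcommand chain. Here the help text is supplied as a plain dict, a hypothetical stand-in for the cached `--help` output; the regex, the function name, and the message wording are assumptions, not the shipped implementation.

```python
import re

# `attune`, one or more lowercase subcommand words, then a single long flag.
FLAG_REF = re.compile(
    r"\battune\s+(?P<sub>[a-z][\w-]*(?:\s+[a-z][\w-]*)*)\s+(?P<flag>--[\w-]+)"
)


def find_unknown_flags(text: str, help_cache: dict, version: str) -> list[str]:
    """Report flags that do not appear in the help text for their subcommand."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in FLAG_REF.finditer(line):
            sub, flag = match["sub"], match["flag"]
            help_text = help_cache.get(sub)
            if help_text is None:
                continue  # unknown subcommand chain: out of scope for this sketch
            # Whole-flag match, so `--json` does not match `--json-lines`.
            if re.search(rf"(?<![\w-]){re.escape(flag)}(?![\w-])", help_text):
                continue
            findings.append(
                f"Line {lineno}: `attune {sub} {flag}`: flag not found in "
                f"`attune {sub} --help` (detected against attune {version})"
            )
    return findings


doc = "Intro\nRun `attune ops --allow-run` first.\nThen `attune ops --json`.\n"
help_cache = {"ops": "usage: attune ops [--json] [--port PORT]"}
for finding in find_unknown_flags(doc, help_cache, "6.8.0"):
    print(finding)
```
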
Regression fixture

tests/fixtures/fact_check_ops_dashboard/ ships pre-fix and post-fix versions of the four attune-ai PR #351 docs. The fixture-based test suite asserts each check fires on the pre-fix files and is silent on the post-fix files, exercising the spec's "5/6 ops-dashboard errors caught" exit gate.

Test plan

- 55 new tests in tests/unit/fact_check/, all green locally on Py 3.10
- ruff check clean on new code
- No regressions in unrelated test suites (10 pre-existing failures verified unchanged with my changes stashed)
- CI matrix (Ubuntu / macOS / Windows × Py 3.10–3.13)
- Live regen of a feature to validate the soft-fail block format in real output

Spec tasks

Completed in this PR: 1.1–1.8, 1.10, 1.11, 1.11.1, 1.12, 1.13, 1.14, 1.15, 1.16.

Deferred: 1.9 (CLI flags --fact-check=strict / --no-fact-check) — the env var ATTUNE_AUTHOR_FACT_CHECK ships in this PR; the named CLI flags can land as a small follow-up that wraps the existing env hook.

Phase 2 (ground-truth context injection), Phase 3 (faithfulness judge), and Phase 4 (tutorial static check) remain to ship as separate PRs per the spec.

🤖 Generated with Claude Code

Phase 1 of the polish-fact-check umbrella spec (docs/specs/polish-fact-check/), shipped as its own PR per the "four phases, four PRs" plan in the spec.

Adds `src/attune_author/fact_check/`, a stdlib-only post-polish verification layer that runs against every polished template emitted by `apply_polish_results`. Four checks, zero LLM cost:

- `check_python_refs`: parses Python code fences with `ast`, resolves each import + prose dotted path via `importlib.import_module` in the active venv. Catches the `attune.ops._readers` class of hallucination (the path parses fine but doesn't exist), the most damaging failure mode in the attune-ai #351 regression fixture.
- `check_cli_refs`: parses references of the form `attune <subcommand> --flag` and verifies the flag appears in the cached `--help` output for that subcommand chain. Every finding carries a version-coupling messaging block (installed attune-ai version + per-file override snippet) so the operator can resolve false positives across version drift without spelunking.
- `check_md_links`: verifies relative `[label](target.md)` link targets exist. External URLs and pure anchors are skipped.
- `check_numeric_refs`: verifies counts like `N templates`, `N features`, `N kinds` against the project filesystem and manifest. Unverifiable nouns (workflows, skills, agents) surface as warnings asking for human review.

Wired into the polish pipeline at `generator.apply_polish_results`. Default mode is soft-fail: findings append an `## Unresolved references` table to the polished file. Strict mode raises `FactCheckError`. Control via `ATTUNE_AUTHOR_FACT_CHECK` env var (`off | soft | strict`, default `soft`) and the `[tool.attune-author.fact-check]` table in `pyproject.toml` (per-check toggles + per-file skip list).

Regression fixture: `tests/fixtures/fact_check_ops_dashboard/` ships pre-fix and post-fix versions of the four attune-ai #351 docs. The fixture-based test suite asserts each check fires on the pre-fix files and is silent on the post-fix files, exercising the spec's "5/6 ops-dashboard errors caught" exit gate.

Coverage: 55 new tests (`tests/unit/fact_check/`). One integration test verifies multi-check aggregation; per-check tests cover the happy path, the regression-fixture cases, de-duplication, and the version-coupling block.

Spec tasks completed: 1.1–1.8, 1.10, 1.11, 1.11.1, 1.12, 1.13, 1.14, 1.15, 1.16. Deferred: 1.9 (CLI flags: the env var ships in this PR; the named flags can land as a small follow-up).

Phase 2 (ground-truth context injection), Phase 3 (faithfulness judge integration), and Phase 4 (tutorial static check) remain in spec; each ships as its own PR.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
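
As an illustration of the `check_md_links` behaviour described in this commit message (relative targets must exist, external URLs and pure anchors are skipped), a minimal sketch might look like the following. The regex and the `missing_link_targets` name are assumptions; the shipped check may handle more link syntax than this.

```python
import re
import tempfile
from pathlib import Path

# Inline markdown links: [label](target). Titles and reference links are ignored here.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
SCHEME = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def missing_link_targets(doc_path: Path) -> list[str]:
    """Return relative link targets in `doc_path` that do not exist on disk."""
    missing = []
    for target in LINK.findall(doc_path.read_text(encoding="utf-8")):
        if SCHEME.match(target) or target.startswith("#"):
            continue  # external URL (any scheme) or pure in-page anchor
        relative = target.split("#", 1)[0]  # drop a trailing #fragment
        if not (doc_path.parent / relative).exists():
            missing.append(target)
    return missing


def demo() -> list[str]:
    """Build a tiny docs tree and report its broken relative links."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "b.md").write_text("# B\n", encoding="utf-8")
        (root / "a.md").write_text(
            "[ok](b.md) [frag](b.md#sec) [gone](gone.md) "
            "[ext](https://example.com) [top](#top)\n",
            encoding="utf-8",
        )
        return missing_link_targets(root / "a.md")


print(demo())
```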

The cli_refs check at src/attune_author/fact_check/cli_refs.py early-returns ``[]`` when ``shutil.which(cli)`` returns None. In CI, ``attune`` isn't installed (attune-ai is not in attune-author's dev deps), so the check silently produces no findings, and the four CLI-ref tests fail with "expected --turbo to surface" / "assert []".

Locally the tests passed because the dev venv resolves ``attune`` via the uv workspace setup. CI is a clean checkout without that, hence the divergence across 8 platforms (ubuntu × 3.10–3.13 and windows × 3.10–3.13 in PR #28's matrix).

Fix is one-line per affected test: monkey-patch ``cli_refs.shutil.which`` to return a non-None path so the guard passes and the rest of the test's monkey-patches (over _resolve_cli_name, _help_text, _installed_version) actually take effect.

Verified:
- 65/65 fact_check tests pass locally (with attune on PATH).
- Same 11 fixture-based tests pass with PATH stripped of attune (the CI scenario): ``PATH=/usr/bin:/bin pytest …``.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
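
The failure mode and the fix can be reproduced in miniature. In the sketch below, `check_flags` is a hypothetical stand-in for the guarded check (not the real `cli_refs` code), and `unittest.mock` is used in place of pytest's `monkeypatch` so the example stays stdlib-only; patching `shutil.which` on the module object has the same effect as patching `cli_refs.shutil.which`.

```python
import shutil
from unittest import mock


def check_flags(cli: str, flags: list[str], help_text: str) -> list[str]:
    """Hypothetical guarded check: returns flags missing from `help_text`."""
    if shutil.which(cli) is None:
        return []  # CLI not installed: the check silently reports nothing
    return [flag for flag in flags if flag not in help_text]


def run(which_result):
    """Run the check with `shutil.which` forced to return `which_result`."""
    with mock.patch.object(shutil, "which", return_value=which_result):
        return check_flags("attune", ["--turbo"], "usage: attune ops [--json]")


# CI scenario: `attune` is not on PATH, so the guard hides the finding.
print(run(None))
# Fixed test: a non-None path lets the guard pass and the finding surfaces.
print(run("/fake/bin/attune"))
```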

Closes task 1.9 in docs/specs/polish-fact-check/tasks.md.

The fact-check pass is controlled by ATTUNE_AUTHOR_FACT_CHECK (``off | soft | strict``, default ``soft``). Phase 1 of the spec landed this env-var path in e11feb5. This commit adds a matching CLI surface on the two commands that invoke the polish pipeline:

    attune-author generate <feat> --fact-check strict
    attune-author regenerate --no-fact-check

Argparse adds the flags as a mutually exclusive group on both ``generate`` and ``regenerate`` subparsers. ``--no-fact-check`` is shorthand for ``--fact-check off``. ``_apply_fact_check_args`` translates either flag into the env var before the dispatch function imports the generator.

Precedence (matches existing --rag pattern):

1. ATTUNE_AUTHOR_FACT_CHECK env var if set: shell-level intent wins over per-invocation flags so the operator can enforce a policy across an entire session.
2. ``--fact-check`` / ``--no-fact-check`` CLI flags: per-invocation override of the project default.
3. ``[tool.attune-author.fact-check]`` in pyproject.toml: project-level defaults loaded by load_config().

Tests added at tests/unit/fact_check/test_cli_flags.py (10 cases): each precedence rule, mutual-exclusivity enforcement, argparse choice validation, and the four-pass argparse shape across generate + regenerate. All 65 fact_check tests still pass.

CHANGELOG/README updated to describe the three-layer control surface (the existing entries only documented the env var).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
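
The flag shape and precedence described in this commit message can be sketched with plain `argparse`. Everything below is an approximation under assumptions: `build_parser` and `apply_fact_check_args` are illustrative names (the real helper is `_apply_fact_check_args`, whose body is not shown here), and the environment is passed in as a dict so the example has no side effects.

```python
import argparse

ENV_VAR = "ATTUNE_AUTHOR_FACT_CHECK"


def build_parser() -> argparse.ArgumentParser:
    """Parser with the two fact-check flags on `generate` and `regenerate`."""
    parser = argparse.ArgumentParser(prog="attune-author")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in ("generate", "regenerate"):
        command = subparsers.add_parser(name)
        if name == "generate":
            command.add_argument("feature")
        group = command.add_mutually_exclusive_group()
        group.add_argument("--fact-check", choices=["off", "soft", "strict"])
        group.add_argument("--no-fact-check", action="store_true")
    return parser


def apply_fact_check_args(args: argparse.Namespace, environ: dict):
    """Copy the CLI choice into `environ` unless the env var is already set."""
    requested = "off" if args.no_fact_check else args.fact_check
    if requested is not None and ENV_VAR not in environ:
        environ[ENV_VAR] = requested  # layer 2: per-invocation flag
    # None means neither layer 1 nor layer 2 applied: use pyproject.toml defaults.
    return environ.get(ENV_VAR)


parser = build_parser()
strict_args = parser.parse_args(["generate", "feat", "--fact-check", "strict"])
print(apply_fact_check_args(strict_args, {}))                # flag applies
print(apply_fact_check_args(strict_args, {ENV_VAR: "off"}))  # env var wins
```

Passing both flags at once makes `argparse` exit with a usage error, which is the mutual-exclusivity behaviour the commit message describes.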

silversurfer562 and others added 3 commits May 15, 2026 07:23

silversurfer562 merged commit 3de9323 into main May 15, 2026
12 checks passed

silversurfer562 merged 3 commits into main from feat/polish-fact-check-phase1
