spec(docs): polish-fact-check — umbrella spec to reduce LLM polish-pass hallucinations #27
Open
silversurfer562 wants to merge 1 commit into
…ucinations

Adds docs/specs/polish-fact-check/, an umbrella spec for a four-phase intervention ladder that shifts polish-pass verification work from human editorial review to automated checks.

Motivated by a regression fixture from attune-ai PR #351 (Smart-AI-Memory/attune-ai#351), where the ops-dashboard regen produced six factual errors in a single feature's docs:

- 1 hallucinated CLI flag (--allow-run, real flag is --read-only)
- 2 hallucinated private module paths (attune.ops._readers)
- 4 hallucinated cross-references
- 1 hallucinated count (498 templates vs real 259)
- 2 wrong route paths
- 1 insecure example (0.0.0.0 without auth callout)

Three of the six (CLI flag, private imports, wrong routes) would actively break readers who follow the docs literally.

Four phases, each shipping as its own PR:

- Phase 1: AST-based post-generation fact-check (Python refs, CLI flags, Markdown links, numeric claims). Catches 5 of 6 fixture errors. Cheapest, no LLM cost.
- Phase 2: Ground-truth context injection into polish prompt (CLI --help output, __all__, dataclass fields).
- Phase 3: Adapt attune-rag faithfulness judge as a polish post-step. Catches the 6th fixture error (missing-security-callout).
- Phase 4: Static analysis of tutorial code samples (mypy + ast.parse). Execution tiers explicitly deferred to Phase 4.2 for security reasons.

Files:

- requirements.md: problem statement, scope, acceptance
- design.md: architecture, per-phase API shapes, open design questions
- tasks.md: numbered tasks per phase, exit checklists
- decisions.md: pre-committed decision matrix (introduces a spec-file convention)

Status: draft. Awaiting review/approval before Phase 1 implementation begins.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Draft umbrella spec at `docs/specs/polish-fact-check/`. Proposes a four-phase intervention ladder to shift polish-pass verification work from manual editorial review to automated checks.
Motivation
Regression fixture from attune-ai#351: a single attune-author regen of one feature (`ops-dashboard`, 11 templates + 4 published-site docs) produced six distinct factual errors that required a manual editorial pass to fix, including:
- `--allow-run` (real: `--read-only`, inverted semantics)
- `from attune.ops._readers import …` (`ModuleNotFoundError`)
Three of the six (CLI flag, private imports, wrong routes) actively break readers who follow the docs literally. The current mitigation (manual editorial review) doesn't scale to the 9 stale features queued or to the weekly regen cadence the living-help system requires.
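To make the hallucinated-import failure concrete, here is a minimal sketch of how a post-generation check could flag import statements in a documented code sample that do not resolve. It is an illustration only, not the spec's design: the function name `unresolved_imports` is hypothetical, and it assumes the doc sample has already been extracted as a string of Python source.

```python
import ast
import importlib.util


def unresolved_imports(snippet: str) -> list[str]:
    """Return module names imported by `snippet` that cannot be located.

    Hypothetical helper for illustration; not part of the spec.
    """
    missing = []
    for node in ast.walk(ast.parse(snippet)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from X import Y": only X is checked here.
            names = [node.module]
        else:
            continue
        for name in names:
            try:
                found = importlib.util.find_spec(name) is not None
            except (ImportError, ValueError):
                # Raised when a parent package is missing or has no spec.
                found = False
            if not found:
                missing.append(name)
    return missing


if __name__ == "__main__":
    print(unresolved_imports("from no_such_pkg_xyz._readers import load"))
```

Note that `importlib.util.find_spec` imports parent packages in order to resolve a dotted name, so a real check would likely need to run in an environment where importing the documented package is safe.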
Four phases, each shipping as its own PR
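As a rough illustration of the CLI-flag part of the first phase, the sketch below compares the long flags mentioned in a doc against those in a command's `--help` text. Everything here is an assumption made for the example: the `unknown_flags` helper, the regex, and the stand-in `argparse` parser are hypothetical, and the real CLI may not be built on `argparse`.

```python
import argparse
import re

# Long options such as --read-only; the lookbehind avoids matching inside words.
FLAG_RE = re.compile(r"(?<![\w-])--[a-z][a-z0-9-]*")


def unknown_flags(doc_text: str, help_text: str) -> set[str]:
    """Return flags mentioned in the doc that the help text never lists."""
    documented = set(FLAG_RE.findall(doc_text))
    real = set(FLAG_RE.findall(help_text))
    return documented - real


# Hypothetical stand-in parser; only --read-only is taken from the fixture.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--read-only", action="store_true", help="disable actions")
help_text = parser.format_help()

if __name__ == "__main__":
    print(unknown_flags("Start the server with `--allow-run`.", help_text))
```

Working from the rendered help text keeps the check independent of how the CLI is implemented, at the cost of relying on a regex rather than the parser's own option table.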
Files
Notes for review
Status: `draft`. Awaiting review/approval before Phase 1 implementation begins.
🤖 Generated with Claude Code