fix: add behavioral depth criteria to task acceptance and parity checks #225

Merged
jafreck merged 2 commits into main from fix/behavioral-acceptance-criteria
Mar 27, 2026

@jafreck commented Mar 27, 2026

Problem

The task graph builder generated only structural acceptance criteria for migration tasks:

  • "All symbols correctly migrated"
  • "Call-site signatures match"
  • "Target code compiles without type errors"

Meanwhile, the parity verifier enforced behavioral equivalence: it checked for hollow implementations, dead dispatch, pass-through codecs, and semantic effectiveness. Because of this disconnect, the code-migrator produced implementations that satisfied the stated acceptance criteria (they compiled and had the right symbols) but were algorithmically hollow. The result was expensive parity retry loops (7+ attempts per task in the zstd migration).
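
To make "algorithmically hollow" concrete, here is a toy sketch rather than code from this repository. `compressHollow`, `compressRle`, and `looksPassThrough` are hypothetical names invented for illustration, and the pass-through heuristic is an assumption about what such a check could look like, not the parity verifier's actual logic. Both codecs have the same signature and compile, so structural criteria alone cannot tell them apart.

```typescript
type Codec = (input: Uint8Array) => Uint8Array;

// Hollow: correct symbol and signature, compiles, but does no work.
const compressHollow: Codec = (input) => input;

// Real: run-length encoding emitted as (count, byte) pairs.
const compressRle: Codec = (input) => {
  const out: number[] = [];
  let current = -1; // -1 is a sentinel; it never matches a byte value
  let run = 0;
  input.forEach((byte) => {
    if (byte === current && run < 255) {
      run++;
    } else {
      if (run > 0) out.push(run, current);
      current = byte;
      run = 1;
    }
  });
  if (run > 0) out.push(run, current);
  return Uint8Array.from(out);
};

// Toy heuristic (an assumption, not the verifier's method): a codec looks
// like a pass-through if it returns its input unchanged for every sample.
function looksPassThrough(codec: Codec, samples: Uint8Array[]): boolean {
  return samples.every((sample) => {
    const out = codec(sample);
    return out.length === sample.length && out.every((b, i) => b === sample[i]);
  });
}

const samples = [Uint8Array.from([7, 7, 7, 7]), Uint8Array.from([1, 2, 3])];
console.log(looksPassThrough(compressHollow, samples)); // true
console.log(looksPassThrough(compressRle, samples)); // false
```

A single sample is not enough for a check like this: the input `[2, 2]` happens to run-length encode to `[2, 2]`, so a real codec can look like a pass-through on one unlucky input.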

Root Cause

buildAcceptanceCriteria() and buildParityChecks() in task-graph-builder.ts generated only structural checks. The code-migrator had no upfront signal that it would be held to a behavioral standard, so it optimized for the criteria it was given.

Fix

Both functions now generate behavioral criteria for every task containing functions:

Acceptance Criteria (new)

  • Every function body fully implemented — no stubs, TODOs, or placeholders
  • Behavioral equivalence: same observable outputs using idiomatic target-language patterns (different types, signatures, and error models are expected)
  • Implementation depth matches source complexity — no pass-throughs or synthetic wrappers

Parity Checks (new)

  • All source code paths and branches reachable in the target
  • No hollow implementations: functions must produce non-trivial, input-dependent output
  • Internal call chains wired end-to-end — public entry points invoke the same algorithmic stages

These criteria:

  • Apply uniformly regardless of task size (a 30-line hash gets the same bar as a 900-line compressor)
  • Use behavioral language ("same observable outputs") not structural language, so idiomatic rewrites are not penalized
  • Bridge the gap so the migrator knows upfront what the verifier will enforce
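
The gating rule described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in task-graph-builder.ts: the `MigrationTask` and `TaskSymbol` shapes and the helper names `containsFunctions`, `withBehavioralAcceptance`, and `withBehavioralParity` are hypothetical, and the criterion strings are paraphrased from this description. The only input to the decision is whether the task contains a function; task size plays no part.

```typescript
// Hypothetical shapes for illustration; the real types in
// task-graph-builder.ts are not shown in this PR description.
interface TaskSymbol {
  name: string;
  kind: "function" | "type";
  lineCount: number;
}

interface MigrationTask {
  id: string;
  symbols: TaskSymbol[];
}

// Criterion text paraphrased from the PR description.
const BEHAVIORAL_ACCEPTANCE: readonly string[] = [
  "Every function body fully implemented: no stubs, TODOs, or placeholders",
  "Behavioral equivalence: same observable outputs using idiomatic target-language patterns",
  "Implementation depth matches source complexity: no pass-throughs or synthetic wrappers",
];

const BEHAVIORAL_PARITY: readonly string[] = [
  "All source code paths and branches reachable in the target",
  "No hollow implementations: functions must produce non-trivial, input-dependent output",
  "Internal call chains wired end-to-end",
];

// Line counts are deliberately ignored here.
function containsFunctions(task: MigrationTask): boolean {
  return task.symbols.some((s) => s.kind === "function");
}

// Appends behavioral criteria to whatever structural criteria already exist.
function withBehavioralAcceptance(task: MigrationTask, structural: readonly string[]): string[] {
  return containsFunctions(task) ? [...structural, ...BEHAVIORAL_ACCEPTANCE] : [...structural];
}

function withBehavioralParity(task: MigrationTask, structural: readonly string[]): string[] {
  return containsFunctions(task) ? [...structural, ...BEHAVIORAL_PARITY] : [...structural];
}

const structural = ["Target code compiles without type errors"];
const hashTask: MigrationTask = {
  id: "hash",
  symbols: [{ name: "hash32", kind: "function", lineCount: 30 }],
};
const typeOnlyTask: MigrationTask = {
  id: "types",
  symbols: [{ name: "FrameHeader", kind: "type", lineCount: 12 }],
};

console.log(withBehavioralAcceptance(hashTask, structural).length); // 4
console.log(withBehavioralAcceptance(typeOnlyTask, structural).length); // 1
```

Passing the structural criteria in as an argument keeps the sketch from guessing at how the real builder assembles them.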

Test Coverage

Added 4 new tests:

  • should include behavioral acceptance criteria for tasks with functions
  • should include behavioral parity checks for tasks with functions
  • should not include behavioral criteria for type-only tasks
  • should apply behavioral criteria uniformly regardless of task size

All 48 task-graph-builder tests pass. Full suite: 1498 passed.

jafreck added 2 commits March 27, 2026 10:00
The task graph builder was generating only structural acceptance criteria
(symbols exist, signatures match, compiles) while the parity verifier
enforced full behavioral equivalence. This gap caused the code-migrator
to produce hollow implementations that compiled but didn't actually
perform the intended computation — leading to expensive retry loops.

Both buildAcceptanceCriteria() and buildParityChecks() now generate
behavioral criteria for every task containing functions:

Acceptance criteria:
- Full implementation required (no stubs/TODOs/placeholders)
- Behavioral equivalence using idiomatic target-language patterns
- Implementation depth must match source complexity

Parity checks:
- All source code paths reachable in target
- No hollow implementations (input-dependent output required)
- Internal call chains wired end-to-end

These criteria apply uniformly regardless of task size — a 30-line
hash function gets the same behavioral bar as a 900-line compressor.
Criteria use behavioral language (same observable outputs) rather than
structural language, so idiomatic rewrites are not penalized.
@jafreck merged commit e7f4e7e into main Mar 27, 2026
1 check passed