fix: add behavioral depth criteria to task acceptance and parity checks by jafreck · Pull Request #225 · jafreck/AAMF

jafreck · 2026-03-27T17:01:28Z

Problem

The task graph builder generated only structural acceptance criteria for migration tasks:

"All symbols correctly migrated"
"Call-site signatures match"
"Target code compiles without type errors"

Meanwhile, the parity verifier enforced behavioral equivalence, checking for hollow implementations, dead dispatch, pass-through codecs, and semantic effectiveness. Because of this disconnect, the code-migrator produced implementations that satisfied the stated acceptance criteria (they compiled and had the right symbols) but were algorithmically hollow, which led to expensive parity retry loops (7+ attempts per task in the zstd migration).

Root Cause

buildAcceptanceCriteria() and buildParityChecks() in task-graph-builder.ts only generated structural checks. The code-migrator had no upfront signal that it would be held to a behavioral standard, so it optimized for the criteria it was given.

Fix

Both functions now generate behavioral criteria for every task containing functions:

Acceptance Criteria (new)

Every function body fully implemented — no stubs, TODOs, or placeholders
Behavioral equivalence: same observable outputs using idiomatic target-language patterns (different types, signatures, and error models are expected)
Implementation depth matches source complexity — no pass-throughs or synthetic wrappers

Parity Checks (new)

All source code paths and branches reachable in the target
No hollow implementations: functions must produce non-trivial, input-dependent output
Internal call chains wired end-to-end — public entry points invoke the same algorithmic stages
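To make "non-trivial, input-dependent output" concrete, here is a minimal sketch of a runtime probe for byte-transforming functions. It is illustrative only: the PR does not show how the parity verifier detects hollow implementations, and `ByteTransform`, `looksHollow`, `hollowEncode`, and `rleEncode` are hypothetical names introduced for this example.

```typescript
// Illustrative only: how the actual parity verifier detects hollow code is not shown in this PR.
type ByteTransform = (input: Uint8Array) => Uint8Array;

function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Flags a transform as hollow when, across all samples, it either echoes its
// input (pass-through) or returns the same bytes every time (constant output).
function looksHollow(fn: ByteTransform, samples: Uint8Array[]): boolean {
  if (samples.length < 2) {
    throw new Error("need at least two distinct samples to probe input dependence");
  }
  const pairs = samples.map((input) => ({ input, output: fn(input) }));
  const passThrough = pairs.every((p) => bytesEqual(p.input, p.output));
  const constant = pairs.every((p) => pairs.every((q) => bytesEqual(p.output, q.output)));
  return passThrough || constant;
}

// A stub that compiles and has the right signature but does no work.
const hollowEncode: ByteTransform = (input) => input;

// A toy run-length encoder that emits (count, value) pairs.
const rleEncode: ByteTransform = (input) => {
  const out: number[] = [];
  let i = 0;
  while (i < input.length) {
    let run = 1;
    while (i + run < input.length && input[i + run] === input[i] && run < 255) {
      run++;
    }
    out.push(run, input[i]);
    i += run;
  }
  return Uint8Array.from(out);
};

const probeSamples = [
  Uint8Array.from([1, 1, 1, 2]),
  Uint8Array.from([7, 7, 9, 9, 9]),
];

console.log(looksHollow(hollowEncode, probeSamples)); // true: output echoes input
console.log(looksHollow(rleEncode, probeSamples)); // false: output depends on input
```

A probe like this only catches the crudest cases (pure pass-through or constant output). It says nothing about whether every source branch is reachable or whether the call chain is wired end-to-end, which the other two parity checks cover.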

These criteria:

Apply uniformly regardless of task size (a 30-line hash gets the same bar as a 900-line compressor)
Use behavioral language ("same observable outputs") not structural language, so idiomatic rewrites are not penalized
Bridge the gap so the migrator knows upfront what the verifier will enforce
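The gating described above can be sketched roughly as follows. This is not the code from task-graph-builder.ts: the task shape (`SketchTask`, `SketchSymbol`), the `sketch*` function names, and the `containsFunctions` helper are assumptions made for illustration. Only the criteria strings are taken from this description. The sketch also assumes type-only tasks keep the structural acceptance criteria, and it omits the existing structural parity checks because they are not listed here.

```typescript
// Sketch only: the real types and signatures in task-graph-builder.ts are not shown in this PR.
interface SketchSymbol {
  name: string;
  kind: "function" | "type";
  lineCount: number;
}

interface SketchTask {
  id: string;
  symbols: SketchSymbol[];
}

const STRUCTURAL_ACCEPTANCE: readonly string[] = [
  "All symbols correctly migrated",
  "Call-site signatures match",
  "Target code compiles without type errors",
];

const BEHAVIORAL_ACCEPTANCE: readonly string[] = [
  "Every function body fully implemented — no stubs, TODOs, or placeholders",
  "Behavioral equivalence: same observable outputs using idiomatic target-language patterns (different types, signatures, and error models are expected)",
  "Implementation depth matches source complexity — no pass-throughs or synthetic wrappers",
];

const BEHAVIORAL_PARITY: readonly string[] = [
  "All source code paths and branches reachable in the target",
  "No hollow implementations: functions must produce non-trivial, input-dependent output",
  "Internal call chains wired end-to-end — public entry points invoke the same algorithmic stages",
];

// Behavioral criteria are gated on the presence of functions, not on task size.
function containsFunctions(task: SketchTask): boolean {
  return task.symbols.some((s) => s.kind === "function");
}

function sketchAcceptanceCriteria(task: SketchTask): string[] {
  const criteria = [...STRUCTURAL_ACCEPTANCE];
  if (containsFunctions(task)) {
    criteria.push(...BEHAVIORAL_ACCEPTANCE);
  }
  return criteria;
}

function sketchParityChecks(task: SketchTask): string[] {
  // The existing structural parity checks are not listed in the PR, so they are left out here.
  return containsFunctions(task) ? [...BEHAVIORAL_PARITY] : [];
}

// Hypothetical tasks mirroring the sizes mentioned above.
const smallHash: SketchTask = {
  id: "hash",
  symbols: [{ name: "hash32", kind: "function", lineCount: 30 }],
};
const bigCompressor: SketchTask = {
  id: "compressor",
  symbols: [{ name: "compressBlock", kind: "function", lineCount: 900 }],
};
const typesOnly: SketchTask = {
  id: "types",
  symbols: [{ name: "FrameHeader", kind: "type", lineCount: 12 }],
};

console.log(sketchAcceptanceCriteria(smallHash).length); // 6: three structural plus three behavioral
console.log(sketchAcceptanceCriteria(typesOnly).length); // 3: structural only
console.log(sketchParityChecks(bigCompressor).length); // 3
```

Because `lineCount` is never consulted, the 30-line hash and the 900-line compressor receive identical criteria, which is the uniformity property listed above.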

Test Coverage

Added 4 new tests:

should include behavioral acceptance criteria for tasks with functions
should include behavioral parity checks for tasks with functions
should not include behavioral criteria for type-only tasks
should apply behavioral criteria uniformly regardless of task size

All 48 task-graph-builder tests pass. Full suite: 1498 passed.

The task graph builder was generating only structural acceptance criteria (symbols exist, signatures match, compiles) while the parity verifier enforced full behavioral equivalence. This gap caused the code-migrator to produce hollow implementations that compiled but didn't actually perform the intended computation, leading to expensive retry loops.

Both buildAcceptanceCriteria() and buildParityChecks() now generate behavioral criteria for every task containing functions:

Acceptance criteria:
- Full implementation required (no stubs/TODOs/placeholders)
- Behavioral equivalence using idiomatic target-language patterns
- Implementation depth must match source complexity

Parity checks:
- All source code paths reachable in target
- No hollow implementations (input-dependent output required)
- Internal call chains wired end-to-end

These criteria apply uniformly regardless of task size: a 30-line hash function gets the same behavioral bar as a 900-line compressor. Criteria use behavioral language (same observable outputs) rather than structural language, so idiomatic rewrites are not penalized.


jafreck added 2 commits March 27, 2026 10:00

fix: clarify guidance to ban unsafe and prioritize safety over performance

26c7bb8

jafreck merged commit e7f4e7e into main Mar 27, 2026
1 check passed

jafreck merged 2 commits into main from fix/behavioral-acceptance-criteria
