fix(determinism): stabilize contract_hash, widen NoWallClock, correct DET-003#211
fix(determinism): stabilize contract_hash, widen NoWallClock, correct DET-003#211yairfalse wants to merge 2 commits into
Conversation
| "name": "Occurrence shape is stable after stripping variable fields", | ||
| "fixture": "single_task", | ||
| "command": "sykli >/dev/null && jq 'del(.id,.timestamp,.context.labels[\"sykli.run_id\"],.data.tasks[].duration_ms,.data.duration_ms,.history.duration_ms,.history.steps[].duration_ms,.history.steps[].timestamp)' .sykli/occurrence.json > a.json && sykli >/dev/null && jq 'del(.id,.timestamp,.context.labels[\"sykli.run_id\"],.data.tasks[].duration_ms,.data.duration_ms,.history.duration_ms,.history.steps[].duration_ms,.history.steps[].timestamp)' .sykli/occurrence.json > b.json && cmp -s a.json b.json", | ||
| "command": "mkdir -p r1 r2 && cp sykli.exs r1/ && cp sykli.exs r2/ && (cd r1 && sykli >/dev/null) && (cd r2 && sykli >/dev/null) && jq 'del(.id,.timestamp,.context.labels[\"sykli.run_id\"],.data.tasks[].duration_ms,.data.duration_ms,.history.duration_ms,.history.steps[].duration_ms,.history.steps[].timestamp,.data.tasks[].log)' r1/.sykli/occurrence.json > a.json && jq 'del(.id,.timestamp,.context.labels[\"sykli.run_id\"],.data.tasks[].duration_ms,.data.duration_ms,.history.duration_ms,.history.steps[].duration_ms,.history.steps[].timestamp,.data.tasks[].log)' r2/.sykli/occurrence.json > b.json && cmp -s a.json b.json", |
There was a problem hiding this comment.
DET-003 now uses fresh workspaces, which is the right direction, but this normalization deletes .data.tasks[].log entirely. That makes the test blind to deterministic regressions in the log evidence path itself, even though the field is part of the occurrence shape we want downstream tools to trust. Can we normalize just the variable run-id segment instead, e.g. with walk(if type == "string" then gsub("[0-9A-HJKMNP-TV-Z]{26}"; "<run_id>") else . end) or a targeted replacement on .data.tasks[].log, and keep the log key/value under comparison?
|
CI note: |
…ield Review feedback on #211: keep the log evidence path under comparison and normalize only the variable run-id segment (logs/<run_id>/ -> logs/RUN_ID/), rather than deleting .data.tasks[].log entirely. Both runs use fresh workspaces (distinct cache keys), so neither is a cache hit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
892c369 to
890e2ea
Compare
…ield Review feedback on #211: keep the log evidence path under comparison and normalize only the variable run-id segment (logs/<run_id>/ -> logs/RUN_ID/), rather than deleting .data.tasks[].log entirely. Both runs use fresh workspaces (distinct cache keys), so neither is a cache hit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
890e2ea to
5f56690
Compare
|
Thanks for the review — both points addressed, plus a flake CI surfaced. 1. DET-003 log path (inline comment). Switched from deleting 2. CI not running on the stacked PR. Root cause: Bonus — CI immediately earned its keep. It surfaced a pre-existing, seed-dependent flake in the suite's lone property test ( |
…ield Review feedback on #211: keep the log evidence path under comparison and normalize only the variable run-id segment (logs/<run_id>/ -> logs/RUN_ID/), rather than deleting .data.tasks[].log entirely. Both runs use fresh workspaces (distinct cache keys), so neither is a cache hit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5f56690 to
c2b43f8
Compare
… DET-003 Monster Phase B (docs/audit-2026-05-22.md): - contract_hash: recursively sort object keys before hashing so semantically-identical contracts hash identically regardless of map iteration order (>32-key maps are not stable across OTP versions). - NoWallClock: widen the Credo guard to pure contract & output-shaping transforms (graph, validate, vocab modules, contract hashing, occurrence serializer/enrichment). Fixes DET-006. Time-stamping engine modules (occurrence factory, run history, cache, OIDC, coordinator) stay exempt. - DET-003: the engine was already deterministic — corrected the flawed black-box case to compare two fresh workspaces and strip the run_id-bearing log path (the prior case reused .sykli/ so run 2 was a cache hit, and under-stripped). DET-003 and DET-006 now pass without expected_failure flags (#207, #208). B1 (injectable clock/id) is deferred to Phase H/DST: with the engine already deterministic it fixes no shipping bug and belongs where it is exercised. Verification: mix format/credo (32 files)/test (1756, 0 failures), escript.build, black-box 166 passed / 0 failed / GH-4 expected-red. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ield Review feedback on #211: keep the log evidence path under comparison and normalize only the variable run-id segment (logs/<run_id>/ -> logs/RUN_ID/), rather than deleting .data.tasks[].log entirely. Both runs use fresh workspaces (distinct cache keys), so neither is a cache hit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c2b43f8 to
7a68d79
Compare
…ield Review feedback on #211: keep the log evidence path under comparison and normalize only the variable run-id segment (logs/<run_id>/ -> logs/RUN_ID/), rather than deleting .data.tasks[].log entirely. Both runs use fresh workspaces (distinct cache keys), so neither is a cache hit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
) * fix(determinism): stabilize contract_hash, widen NoWallClock, correct DET-003 Monster Phase B (docs/audit-2026-05-22.md): - contract_hash: recursively sort object keys before hashing so semantically-identical contracts hash identically regardless of map iteration order (>32-key maps are not stable across OTP versions). - NoWallClock: widen the Credo guard to pure contract & output-shaping transforms (graph, validate, vocab modules, contract hashing, occurrence serializer/enrichment). Fixes DET-006. Time-stamping engine modules (occurrence factory, run history, cache, OIDC, coordinator) stay exempt. - DET-003: the engine was already deterministic — corrected the flawed black-box case to compare two fresh workspaces and strip the run_id-bearing log path (the prior case reused .sykli/ so run 2 was a cache hit, and under-stripped). DET-003 and DET-006 now pass without expected_failure flags (#207, #208). B1 (injectable clock/id) is deferred to Phase H/DST: with the engine already deterministic it fixes no shipping bug and belongs where it is exercised. Verification: mix format/credo (32 files)/test (1756, 0 failures), escript.build, black-box 166 passed / 0 failed / GH-4 expected-red. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(det-003): normalize log run-id segment instead of deleting the field Review feedback on #211: keep the log evidence path under comparison and normalize only the variable run-id segment (logs/<run_id>/ -> logs/RUN_ID/), rather than deleting .data.tasks[].log entirely. Both runs use fresh workspaces (distinct cache keys), so neither is a cache hit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Monster Phase B of the audit-remediation program (
docs/audit-2026-05-22.md). Stacked on #210 (Phase A).The investigation produced an honest course-correction: two of the audit's three "determinism bugs" were not engine bugs. Findings and fixes:
contract_hashstability (real bug, fixed).from_json/1now recursively sorts object keys before hashing, so semantically-identical contracts hash identically regardless of map iteration order. Maps with >32 keys use a hashed representation whose iteration order is not stable across OTP versions — the latent hazard. Tests cover key-order independence including the >32-key case.NoWallClockguard widened (DET-006, fixed). The audit's "broaden to the whole engine" framing was infeasible: there are 83 legitimate wall-clock uses across ~30 files (occurrence timestamps, run history, cache mtimes, OIDC expiry). Broadening wholesale would flag all of them. Instead the guard now covers the pure contract & output-shaping transforms (graph parsing/validation, the vocabulary modules, contract hashing, occurrence serializer/enrichment) — where wall-clock genuinely is a determinism bug. Credo now lints 32 files (was 10), still green. Time-stamping engine modules stay intentionally exempt.logpath. The prior case reused.sykli/(so run 2 was a cache hit, since the cache lives globally at~/.sykli/cache) and stripped run_id from labels but not from the log path. Corrected to compare two fresh workspaces and strip the run_id-bearing log path. The engine is deterministic.DET-003 and DET-006 now pass without
expected_failureflags (closes #207, #208).Descoped (honest)
B1 (injectable clock/id seam) is deferred to Phase H/DST. Its original justification was fixing DET-003; since the engine is already deterministic it fixes no shipping bug, and the seam belongs where deterministic simulation actually exercises it. Per the program's own rule: don't invent a fix for a non-problem.
Verification
mix format --check-formattedmix credomix testmix escript.buildtest/blackbox/run.sh🤖 Generated with Claude Code