fix(determinism): stabilize contract_hash, widen NoWallClock, correct DET-003 by yairfalse · Pull Request #211 · false-systems/sykli

yairfalse · 2026-05-22T22:17:19Z

Summary

Monster Phase B of the audit-remediation program (docs/audit-2026-05-22.md). Stacked on #210 (Phase A).

The investigation produced an honest course-correction: two of the audit's three "determinism bugs" were not engine bugs. Findings and fixes:

B4 — contract_hash stability (real bug, fixed). from_json/1 now recursively sorts object keys before hashing, so semantically-identical contracts hash identically regardless of map iteration order. Maps with >32 keys use a hashed representation whose iteration order is not stable across OTP versions — the latent hazard. Tests cover key-order independence including the >32-key case.
B3 — NoWallClock guard widened (DET-006, fixed). The audit's "broaden to the whole engine" framing was infeasible: there are 83 legitimate wall-clock uses across ~30 files (occurrence timestamps, run history, cache mtimes, OIDC expiry). Broadening wholesale would flag all of them. Instead the guard now covers the pure contract & output-shaping transforms (graph parsing/validation, the vocabulary modules, contract hashing, occurrence serializer/enrichment) — where wall-clock genuinely is a determinism bug. Credo now lints 32 files (was 10), still green. Time-stamping engine modules stay intentionally exempt.
B2 — DET-003 was a flawed test, not an engine bug. Reproduced and proved: in two fresh workspaces the occurrence output is byte-identical except the run_id embedded in the log path. The prior case reused .sykli/ (so run 2 was a cache hit, since the cache lives globally at ~/.sykli/cache) and stripped run_id from labels but not from the log path. Corrected to compare two fresh workspaces and strip the run_id-bearing log path. The engine is deterministic.

DET-003 and DET-006 now pass without expected_failure flags (closes #207, #208).

Descoped (honest)

B1 (injectable clock/id seam) is deferred to Phase H/DST. Its original justification was fixing DET-003; since the engine is already deterministic it fixes no shipping bug, and the seam belongs where deterministic simulation actually exercises it. Per the program's own rule: don't invent a fix for a non-problem.

Verification

Gate	Result
`mix format --check-formatted`	clean
`mix credo`	no issues (now 32 files)
`mix test`	1756 tests, 0 failures (+2 contract_hash)
`mix escript.build`	builds
`test/blackbox/run.sh`	166 passed, 0 failed, GH-4 expected-red (Phase C)
DET-003 / DET-006	pass without flags

🤖 Generated with Claude Code

yairfalse · 2026-05-22T22:20:23Z

      "name": "Occurrence shape is stable after stripping variable fields",
      "fixture": "single_task",
-      "command": "sykli >/dev/null && jq 'del(.id,.timestamp,.context.labels[\"sykli.run_id\"],.data.tasks[].duration_ms,.data.duration_ms,.history.duration_ms,.history.steps[].duration_ms,.history.steps[].timestamp)' .sykli/occurrence.json > a.json && sykli >/dev/null && jq 'del(.id,.timestamp,.context.labels[\"sykli.run_id\"],.data.tasks[].duration_ms,.data.duration_ms,.history.duration_ms,.history.steps[].duration_ms,.history.steps[].timestamp)' .sykli/occurrence.json > b.json && cmp -s a.json b.json",
+      "command": "mkdir -p r1 r2 && cp sykli.exs r1/ && cp sykli.exs r2/ && (cd r1 && sykli >/dev/null) && (cd r2 && sykli >/dev/null) && jq 'del(.id,.timestamp,.context.labels[\"sykli.run_id\"],.data.tasks[].duration_ms,.data.duration_ms,.history.duration_ms,.history.steps[].duration_ms,.history.steps[].timestamp,.data.tasks[].log)' r1/.sykli/occurrence.json > a.json && jq 'del(.id,.timestamp,.context.labels[\"sykli.run_id\"],.data.tasks[].duration_ms,.data.duration_ms,.history.duration_ms,.history.steps[].duration_ms,.history.steps[].timestamp,.data.tasks[].log)' r2/.sykli/occurrence.json > b.json && cmp -s a.json b.json",


DET-003 now uses fresh workspaces, which is the right direction, but this normalization deletes .data.tasks[].log entirely. That makes the test blind to deterministic regressions in the log evidence path itself, even though the field is part of the occurrence shape we want downstream tools to trust. Can we normalize just the variable run-id segment instead, e.g. with walk(if type == "string" then gsub("[0-9A-HJKMNP-TV-Z]{26}"; "<run_id>") else . end) or a targeted replacement on .data.tasks[].log, and keep the log key/value under comparison?

yairfalse · 2026-05-22T22:20:33Z

CI note: gh pr checks 211 currently reports no checks for feature/monster-b-determinism. Because this PR is stacked on feature/monster-a-ci-foundation and the workflow is configured with pull_request.branches: [main], the Phase A CI pyramid is not running on this stacked PR. Before merge, either retarget temporarily to main, manually dispatch the relevant checks, or broaden the workflow branch filter so stacked phase PRs still exercise black-box/conformance/Credo. Otherwise the PR relies only on local verification.

…ield Review feedback on #211: keep the log evidence path under comparison and normalize only the variable run-id segment (logs/<run_id>/ -> logs/RUN_ID/), rather than deleting .data.tasks[].log entirely. Both runs use fresh workspaces (distinct cache keys), so neither is a cache hit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

yairfalse · 2026-05-23T22:03:15Z

Thanks for the review — both points addressed, plus a flake CI surfaced.

1. DET-003 log path (inline comment). Switched from deleting .data.tasks[].log to normalizing only the variable run-id segment (logs/<run_id>/ → logs/RUN_ID/), so the rest of the log evidence path stays under comparison. The two runs use fresh workspaces (distinct cache keys → neither is a cache hit). Verified: test/blackbox/run.sh --filter=DET → 6 passed, 0 failed.

2. CI not running on the stacked PR. Root cause: on.pull_request.branches was [main]. Broadened to [main, 'feature/**'] in #210 (the CI-foundation PR, so the whole train benefits). CI now runs on both #210 and #211.

Bonus — CI immediately earned its keep. It surfaced a pre-existing, seed-dependent flake in the suite's lone property test (event_queue_test.exs:56): StreamData.uniq_list_of(StreamData.integer()) exhausts the duplicate-retry limit at low generation sizes (TooManyDuplicatesError). Fixed in #210 by drawing unique seqs from a wide range with bounded length (uniqueness preserved — the property relies on {at_ms, seq} being a total order). Verified green across 5 seeds.

#211 is rebased on the updated #210.

…ield Review feedback on #211: keep the log evidence path under comparison and normalize only the variable run-id segment (logs/<run_id>/ -> logs/RUN_ID/), rather than deleting .data.tasks[].log entirely. Both runs use fresh workspaces (distinct cache keys), so neither is a cache hit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… DET-003 Monster Phase B (docs/audit-2026-05-22.md): - contract_hash: recursively sort object keys before hashing so semantically-identical contracts hash identically regardless of map iteration order (>32-key maps are not stable across OTP versions). - NoWallClock: widen the Credo guard to pure contract & output-shaping transforms (graph, validate, vocab modules, contract hashing, occurrence serializer/enrichment). Fixes DET-006. Time-stamping engine modules (occurrence factory, run history, cache, OIDC, coordinator) stay exempt. - DET-003: the engine was already deterministic — corrected the flawed black-box case to compare two fresh workspaces and strip the run_id-bearing log path (the prior case reused .sykli/ so run 2 was a cache hit, and under-stripped). DET-003 and DET-006 now pass without expected_failure flags (#207, #208). B1 (injectable clock/id) is deferred to Phase H/DST: with the engine already deterministic it fixes no shipping bug and belongs where it is exercised. Verification: mix format/credo (32 files)/test (1756, 0 failures), escript.build, black-box 166 passed / 0 failed / GH-4 expected-red. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ield Review feedback on #211: keep the log evidence path under comparison and normalize only the variable run-id segment (logs/<run_id>/ -> logs/RUN_ID/), rather than deleting .data.tasks[].log entirely. Both runs use fresh workspaces (distinct cache keys), so neither is a cache hit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

) * fix(determinism): stabilize contract_hash, widen NoWallClock, correct DET-003 Monster Phase B (docs/audit-2026-05-22.md): - contract_hash: recursively sort object keys before hashing so semantically-identical contracts hash identically regardless of map iteration order (>32-key maps are not stable across OTP versions). - NoWallClock: widen the Credo guard to pure contract & output-shaping transforms (graph, validate, vocab modules, contract hashing, occurrence serializer/enrichment). Fixes DET-006. Time-stamping engine modules (occurrence factory, run history, cache, OIDC, coordinator) stay exempt. - DET-003: the engine was already deterministic — corrected the flawed black-box case to compare two fresh workspaces and strip the run_id-bearing log path (the prior case reused .sykli/ so run 2 was a cache hit, and under-stripped). DET-003 and DET-006 now pass without expected_failure flags (#207, #208). B1 (injectable clock/id) is deferred to Phase H/DST: with the engine already deterministic it fixes no shipping bug and belongs where it is exercised. Verification: mix format/credo (32 files)/test (1756, 0 failures), escript.build, black-box 166 passed / 0 failed / GH-4 expected-red. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(det-003): normalize log run-id segment instead of deleting the field Review feedback on #211: keep the log evidence path under comparison and normalize only the variable run-id segment (logs/<run_id>/ -> logs/RUN_ID/), rather than deleting .data.tasks[].log entirely. Both runs use fresh workspaces (distinct cache keys), so neither is a cache hit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

yairfalse commented May 22, 2026

View reviewed changes

yairfalse force-pushed the feature/monster-b-determinism branch from 892c369 to 890e2ea Compare May 23, 2026 21:49

yairfalse force-pushed the feature/monster-b-determinism branch from 890e2ea to 5f56690 Compare May 23, 2026 22:02

yairfalse force-pushed the feature/monster-b-determinism branch from 5f56690 to c2b43f8 Compare May 23, 2026 22:31

yairfalse and others added 2 commits May 24, 2026 01:37

yairfalse force-pushed the feature/monster-b-determinism branch from c2b43f8 to 7a68d79 Compare May 23, 2026 22:37

yairfalse deleted the branch feature/monster-a-ci-foundation May 24, 2026 21:40

yairfalse closed this May 24, 2026

yairfalse mentioned this pull request May 24, 2026

fix(determinism): stabilize contract_hash, widen NoWallClock, correct DET-003 #212

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(determinism): stabilize contract_hash, widen NoWallClock, correct DET-003#211

fix(determinism): stabilize contract_hash, widen NoWallClock, correct DET-003#211
yairfalse wants to merge 2 commits into
feature/monster-a-ci-foundationfrom
feature/monster-b-determinism

yairfalse commented May 22, 2026

Uh oh!

yairfalse May 22, 2026

Uh oh!

yairfalse commented May 22, 2026

Uh oh!

yairfalse commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

yairfalse commented May 22, 2026

Summary

Descoped (honest)

Verification

Uh oh!

yairfalse May 22, 2026

Choose a reason for hiding this comment

Uh oh!

yairfalse commented May 22, 2026

Uh oh!

yairfalse commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant