Skip to content

fix(determinism): stabilize contract_hash, widen NoWallClock, correct DET-003#211

Closed
yairfalse wants to merge 2 commits into
feature/monster-a-ci-foundationfrom
feature/monster-b-determinism
Closed

fix(determinism): stabilize contract_hash, widen NoWallClock, correct DET-003#211
yairfalse wants to merge 2 commits into
feature/monster-a-ci-foundationfrom
feature/monster-b-determinism

Conversation

@yairfalse
Copy link
Copy Markdown
Collaborator

Summary

Monster Phase B of the audit-remediation program (docs/audit-2026-05-22.md). Stacked on #210 (Phase A).

The investigation produced an honest course-correction: two of the audit's three "determinism bugs" were not engine bugs. Findings and fixes:

  • B4 — contract_hash stability (real bug, fixed). from_json/1 now recursively sorts object keys before hashing, so semantically-identical contracts hash identically regardless of map iteration order. Maps with >32 keys use a hashed representation whose iteration order is not stable across OTP versions — the latent hazard. Tests cover key-order independence including the >32-key case.
  • B3 — NoWallClock guard widened (DET-006, fixed). The audit's "broaden to the whole engine" framing was infeasible: there are 83 legitimate wall-clock uses across ~30 files (occurrence timestamps, run history, cache mtimes, OIDC expiry). Broadening wholesale would flag all of them. Instead the guard now covers the pure contract & output-shaping transforms (graph parsing/validation, the vocabulary modules, contract hashing, occurrence serializer/enrichment) — where wall-clock genuinely is a determinism bug. Credo now lints 32 files (was 10), still green. Time-stamping engine modules stay intentionally exempt.
  • B2 — DET-003 was a flawed test, not an engine bug. Reproduced and proved: in two fresh workspaces the occurrence output is byte-identical except the run_id embedded in the log path. The prior case reused .sykli/ (so run 2 was a cache hit, since the cache lives globally at ~/.sykli/cache) and stripped run_id from labels but not from the log path. Corrected to compare two fresh workspaces and strip the run_id-bearing log path. The engine is deterministic.

DET-003 and DET-006 now pass without expected_failure flags (closes #207, #208).

Descoped (honest)

B1 (injectable clock/id seam) is deferred to Phase H/DST. Its original justification was fixing DET-003; since the engine is already deterministic it fixes no shipping bug, and the seam belongs where deterministic simulation actually exercises it. Per the program's own rule: don't invent a fix for a non-problem.

Verification

Gate Result
mix format --check-formatted clean
mix credo no issues (now 32 files)
mix test 1756 tests, 0 failures (+2 contract_hash)
mix escript.build builds
test/blackbox/run.sh 166 passed, 0 failed, GH-4 expected-red (Phase C)
DET-003 / DET-006 pass without flags

🤖 Generated with Claude Code

Comment thread test/blackbox/dataset.json Outdated
"name": "Occurrence shape is stable after stripping variable fields",
"fixture": "single_task",
"command": "sykli >/dev/null && jq 'del(.id,.timestamp,.context.labels[\"sykli.run_id\"],.data.tasks[].duration_ms,.data.duration_ms,.history.duration_ms,.history.steps[].duration_ms,.history.steps[].timestamp)' .sykli/occurrence.json > a.json && sykli >/dev/null && jq 'del(.id,.timestamp,.context.labels[\"sykli.run_id\"],.data.tasks[].duration_ms,.data.duration_ms,.history.duration_ms,.history.steps[].duration_ms,.history.steps[].timestamp)' .sykli/occurrence.json > b.json && cmp -s a.json b.json",
"command": "mkdir -p r1 r2 && cp sykli.exs r1/ && cp sykli.exs r2/ && (cd r1 && sykli >/dev/null) && (cd r2 && sykli >/dev/null) && jq 'del(.id,.timestamp,.context.labels[\"sykli.run_id\"],.data.tasks[].duration_ms,.data.duration_ms,.history.duration_ms,.history.steps[].duration_ms,.history.steps[].timestamp,.data.tasks[].log)' r1/.sykli/occurrence.json > a.json && jq 'del(.id,.timestamp,.context.labels[\"sykli.run_id\"],.data.tasks[].duration_ms,.data.duration_ms,.history.duration_ms,.history.steps[].duration_ms,.history.steps[].timestamp,.data.tasks[].log)' r2/.sykli/occurrence.json > b.json && cmp -s a.json b.json",
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DET-003 now uses fresh workspaces, which is the right direction, but this normalization deletes .data.tasks[].log entirely. That makes the test blind to deterministic regressions in the log evidence path itself, even though the field is part of the occurrence shape we want downstream tools to trust. Can we normalize just the variable run-id segment instead, e.g. with walk(if type == "string" then gsub("[0-9A-HJKMNP-TV-Z]{26}"; "<run_id>") else . end) or a targeted replacement on .data.tasks[].log, and keep the log key/value under comparison?

@yairfalse
Copy link
Copy Markdown
Collaborator Author

CI note: gh pr checks 211 currently reports no checks for feature/monster-b-determinism. Because this PR is stacked on feature/monster-a-ci-foundation and the workflow is configured with pull_request.branches: [main], the Phase A CI pyramid is not running on this stacked PR. Before merge, either retarget temporarily to main, manually dispatch the relevant checks, or broaden the workflow branch filter so stacked phase PRs still exercise black-box/conformance/Credo. Otherwise the PR relies only on local verification.

yairfalse added a commit that referenced this pull request May 23, 2026
…ield

Review feedback on #211: keep the log evidence path under comparison and
normalize only the variable run-id segment (logs/<run_id>/ -> logs/RUN_ID/),
rather than deleting .data.tasks[].log entirely. Both runs use fresh
workspaces (distinct cache keys), so neither is a cache hit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yairfalse yairfalse force-pushed the feature/monster-b-determinism branch from 892c369 to 890e2ea Compare May 23, 2026 21:49
yairfalse added a commit that referenced this pull request May 23, 2026
…ield

Review feedback on #211: keep the log evidence path under comparison and
normalize only the variable run-id segment (logs/<run_id>/ -> logs/RUN_ID/),
rather than deleting .data.tasks[].log entirely. Both runs use fresh
workspaces (distinct cache keys), so neither is a cache hit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yairfalse yairfalse force-pushed the feature/monster-b-determinism branch from 890e2ea to 5f56690 Compare May 23, 2026 22:02
@yairfalse
Copy link
Copy Markdown
Collaborator Author

Thanks for the review — both points addressed, plus a flake CI surfaced.

1. DET-003 log path (inline comment). Switched from deleting .data.tasks[].log to normalizing only the variable run-id segment (logs/<run_id>/logs/RUN_ID/), so the rest of the log evidence path stays under comparison. The two runs use fresh workspaces (distinct cache keys → neither is a cache hit). Verified: test/blackbox/run.sh --filter=DET → 6 passed, 0 failed.

2. CI not running on the stacked PR. Root cause: on.pull_request.branches was [main]. Broadened to [main, 'feature/**'] in #210 (the CI-foundation PR, so the whole train benefits). CI now runs on both #210 and #211.

Bonus — CI immediately earned its keep. It surfaced a pre-existing, seed-dependent flake in the suite's lone property test (event_queue_test.exs:56): StreamData.uniq_list_of(StreamData.integer()) exhausts the duplicate-retry limit at low generation sizes (TooManyDuplicatesError). Fixed in #210 by drawing unique seqs from a wide range with bounded length (uniqueness preserved — the property relies on {at_ms, seq} being a total order). Verified green across 5 seeds.

#211 is rebased on the updated #210.

yairfalse added a commit that referenced this pull request May 23, 2026
…ield

Review feedback on #211: keep the log evidence path under comparison and
normalize only the variable run-id segment (logs/<run_id>/ -> logs/RUN_ID/),
rather than deleting .data.tasks[].log entirely. Both runs use fresh
workspaces (distinct cache keys), so neither is a cache hit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yairfalse yairfalse force-pushed the feature/monster-b-determinism branch from 5f56690 to c2b43f8 Compare May 23, 2026 22:31
yairfalse and others added 2 commits May 24, 2026 01:37
… DET-003

Monster Phase B (docs/audit-2026-05-22.md):

- contract_hash: recursively sort object keys before hashing so
  semantically-identical contracts hash identically regardless of map
  iteration order (>32-key maps are not stable across OTP versions).
- NoWallClock: widen the Credo guard to pure contract & output-shaping
  transforms (graph, validate, vocab modules, contract hashing, occurrence
  serializer/enrichment). Fixes DET-006. Time-stamping engine modules
  (occurrence factory, run history, cache, OIDC, coordinator) stay exempt.
- DET-003: the engine was already deterministic — corrected the flawed
  black-box case to compare two fresh workspaces and strip the
  run_id-bearing log path (the prior case reused .sykli/ so run 2 was a
  cache hit, and under-stripped).

DET-003 and DET-006 now pass without expected_failure flags (#207, #208).
B1 (injectable clock/id) is deferred to Phase H/DST: with the engine already
deterministic it fixes no shipping bug and belongs where it is exercised.

Verification: mix format/credo (32 files)/test (1756, 0 failures),
escript.build, black-box 166 passed / 0 failed / GH-4 expected-red.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ield

Review feedback on #211: keep the log evidence path under comparison and
normalize only the variable run-id segment (logs/<run_id>/ -> logs/RUN_ID/),
rather than deleting .data.tasks[].log entirely. Both runs use fresh
workspaces (distinct cache keys), so neither is a cache hit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yairfalse yairfalse force-pushed the feature/monster-b-determinism branch from c2b43f8 to 7a68d79 Compare May 23, 2026 22:37
@yairfalse yairfalse deleted the branch feature/monster-a-ci-foundation May 24, 2026 21:40
@yairfalse yairfalse closed this May 24, 2026
yairfalse added a commit that referenced this pull request May 24, 2026
…ield

Review feedback on #211: keep the log evidence path under comparison and
normalize only the variable run-id segment (logs/<run_id>/ -> logs/RUN_ID/),
rather than deleting .data.tasks[].log entirely. Both runs use fresh
workspaces (distinct cache keys), so neither is a cache hit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
yairfalse added a commit that referenced this pull request May 24, 2026
)

* fix(determinism): stabilize contract_hash, widen NoWallClock, correct DET-003

Monster Phase B (docs/audit-2026-05-22.md):

- contract_hash: recursively sort object keys before hashing so
  semantically-identical contracts hash identically regardless of map
  iteration order (>32-key maps are not stable across OTP versions).
- NoWallClock: widen the Credo guard to pure contract & output-shaping
  transforms (graph, validate, vocab modules, contract hashing, occurrence
  serializer/enrichment). Fixes DET-006. Time-stamping engine modules
  (occurrence factory, run history, cache, OIDC, coordinator) stay exempt.
- DET-003: the engine was already deterministic — corrected the flawed
  black-box case to compare two fresh workspaces and strip the
  run_id-bearing log path (the prior case reused .sykli/ so run 2 was a
  cache hit, and under-stripped).

DET-003 and DET-006 now pass without expected_failure flags (#207, #208).
B1 (injectable clock/id) is deferred to Phase H/DST: with the engine already
deterministic it fixes no shipping bug and belongs where it is exercised.

Verification: mix format/credo (32 files)/test (1756, 0 failures),
escript.build, black-box 166 passed / 0 failed / GH-4 expected-red.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(det-003): normalize log run-id segment instead of deleting the field

Review feedback on #211: keep the log evidence path under comparison and
normalize only the variable run-id segment (logs/<run_id>/ -> logs/RUN_ID/),
rather than deleting .data.tasks[].log entirely. Both runs use fresh
workspaces (distinct cache keys), so neither is a cache hit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant