feat(training): JSONL exporter for ATIF trajectories by anderskev · Pull Request #95 · existential-birds/daydream

anderskev · 2026-05-18T04:41:27Z

Summary

Adds daydream export-jsonl — a new CLI subcommand that turns archived ATIF trajectories into a versioned, schema-validated JSONL corpus suitable for training/eval pipelines.
Introduces the daydream/training/ package: schema v1 definition, exclusion + copyleft filtering, record/span builders, deterministic stack-stratification, and an end-to-end export orchestrator.
Extends the archive manifest with a code_context block (base_sha, changed_files) so newly archived runs carry the git context the exporter needs end-to-end.

Motivation

The trajectory archive captures per-run ATIF data, but there was no supported way to turn it into a training-ready dataset. Downstream consumers need:

A stable, versioned JSONL schema (v1) they can validate against.
Deterministic, byte-identical output across reruns so corpora can be diffed and cached.
Built-in filtering for license-incompatible (copyleft) and explicitly excluded sources.
Stack-aware stratification so a single dominant stack does not swamp the corpus.

This PR delivers that pipeline in one focused milestone so the training side can consume archived runs without bespoke glue.

Changes

Added

daydream/training/ package: schema.py, exclusion.py, export.py, schema artifacts (schema/v1.json, schema/copyleft.txt, schema/exclusion.txt).
daydream export-jsonl --out <path> CLI subcommand with the full filter, stratification, opt-in, and diagnostic flag surface from the plan (validates --max-stack-share in (0, 1] and --min-grounding in [0, 1]).
ExportConfig + run_export orchestrator: schema-only short-circuit, filter+stratify pipeline, dry-run summary, atomic JSONL write via tempfile + Path.replace, and schema.json side-car emission.
GitContext + Manifest carry base_sha and changed_files; git_ops.diff_name_only populates them at archive time; manifest.to_dict() emits a code_context block.
archive.index.count_runs() for cheap unfiltered totals (no row materialization).
Test fixtures under tests/fixtures/training/ plus suites covering exclusion, query/filter, record/span builders, stratification, and the end-to-end export.
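
The range checks on --max-stack-share and --min-grounding can be illustrated with argparse type callbacks. This is a minimal sketch, not the PR's code: the helper names `max_stack_share`, `min_grounding`, and `build_parser`, and the defaults shown, are assumptions made for illustration.

```python
import argparse


def max_stack_share(value: str) -> float:
    """Parse a share that must lie in the half-open interval (0, 1]."""
    share = float(value)  # a ValueError here is reported by argparse as a usage error
    if not 0.0 < share <= 1.0:
        raise argparse.ArgumentTypeError(f"must be in (0, 1], got {value}")
    return share


def min_grounding(value: str) -> float:
    """Parse a grounding rate that must lie in the closed interval [0, 1]."""
    rate = float(value)
    if not 0.0 <= rate <= 1.0:
        raise argparse.ArgumentTypeError(f"must be in [0, 1], got {value}")
    return rate


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only the flag names come from the PR description.
    parser = argparse.ArgumentParser(prog="daydream export-jsonl")
    parser.add_argument("--out", required=True)
    parser.add_argument("--max-stack-share", type=max_stack_share, default=1.0)
    parser.add_argument("--min-grounding", type=min_grounding, default=0.0)
    return parser
```

Validating in the type callback rejects out-of-range values before any config object is built, which matches the ordering the PR describes (validate, then construct ExportConfig).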

Changed

Exporter sources head_sha / branch / base_branch from the manifest git block (was inadvertently reading from code_context); falls back to manifest_row when older archives have no code_context block.
is_copyleft() accepts a pre-loaded list so the per-row hot loop in _query_index doesn't reopen the file N times.
_build_query() accepts an exclusion kwarg (typed frozenset | set | None) so callers can inject the set without re-reading from disk.
Step IDs are cast to int to match the v1 schema integer constraint.
JSONL output uses sort_keys=True and compact JSON separators for deterministic, byte-identical reruns.
code_context field order in the manifest now matches the schema definition (head_sha then base_sha).
Skip-record warning copy clarified from "missing" to "corrupt or unreadable"; stack=None warning is now gated behind stratify_by == "stack".
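
The effect of sort_keys=True plus compact separators can be shown with a short sketch. The function name `serialize_record` and the sample record are hypothetical; only the two `json.dumps` options come from the PR.

```python
import json


def serialize_record(record: dict) -> str:
    """Return one JSONL line with a stable key order and no padding."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


# Two dicts with the same content but different insertion order.
a = {"session_id": "s1", "labels": ["merged"],
     "code_context": {"head_sha": "abc123", "base_sha": None}}
b = {"code_context": {"base_sha": None, "head_sha": "abc123"},
     "labels": ["merged"], "session_id": "s1"}

line = serialize_record(a)
# line == '{"code_context":{"base_sha":null,"head_sha":"abc123"},"labels":["merged"],"session_id":"s1"}'
```

Because keys are sorted at every nesting level and no optional whitespace is emitted, both dicts serialize to the same bytes regardless of how they were built.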

Fixed

Atomic tempfile writes are wrapped in try/except so a failed write doesn't leave an orphan .tmp file beside the output.
Multi-outcome sessions go through _single_outcome_label(), which warns instead of silently dropping labels.
Schema v1 drops the unused test_outcome field.
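
The atomic write with cleanup can be sketched as follows. This is an illustration under assumptions, not the exporter's implementation: `write_atomic` and its signature are hypothetical, and the sketch uses `tempfile.mkstemp` in the destination directory so that `Path.replace` stays on one filesystem.

```python
import os
import tempfile
from collections.abc import Iterable
from pathlib import Path


def write_atomic(out_path: Path, lines: Iterable[str]) -> None:
    """Write lines to out_path so readers never observe a partial file."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so the final rename is atomic.
    fd, tmp_name = tempfile.mkstemp(
        dir=out_path.parent, prefix=out_path.name + ".", suffix=".tmp"
    )
    tmp_path = Path(tmp_name)
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as handle:
            for line in lines:
                handle.write(line + "\n")
        tmp_path.replace(out_path)  # swap the finished file into place
    except BaseException:
        # Any failure (including interruption) removes the orphan temp file.
        tmp_path.unlink(missing_ok=True)
        raise
```

If the write fails midway, the previous output (if any) is left untouched and no .tmp file remains beside it.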

Test Plan

uv run pytest — full suite passes locally (343 existing + new training/archive/git_ops coverage).
uv run ruff check and uv run mypy daydream pass; the _build_query exclusion-param typing fix was added specifically to unblock mypy.
Exporter determinism verified via a fixture-driven byte-identical rerun test.
Schema-only short-circuit, dry-run no-op, and missing-trajectory skip paths each covered by a dedicated test.
git_ops.diff_name_only covered for happy path, multi-file output, empty-line filtering, and bad-ref soft-failure.

Checklist

Tests pass locally (uv run pytest)
Linting passes (uv run ruff check)
Type checking passes (uv run mypy daydream)
Documentation updated (if applicable) — CLI surface documented inline; no user-facing docs site to update for this milestone.

Additional Context

Diff size: ~2,100 LOC added across 22 files; the bulk lives in daydream/training/export.py and its test suite. The code_context rollout is forward-compatible: older archives that predate the block surface base_sha=None / changed_files=[], which the v1 schema explicitly allows.

Generated with Claude Code

Adds ExportConfig + run_export — the orchestration layer that ties Waves 2–5 together. Implements plan §10 step 6: schema-only short-circuit, filter+stratify pipeline, dry-run summary, atomic JSONL write via tempfile + Path.replace, and schema.json side-car emission. Records are serialized with compact JSON separators so output stays small and deterministic across runs (covers AC #1, #2, #7, #8). Skips rows whose archive directory is missing manifest.json or trajectory.json with a warning, rather than crashing the whole export. Tests use the §9 fixture matrix unchanged — JSONL validity against the schema, required-field presence, byte-identical re-runs, schema.json emission, dry-run no-op, missing-trajectory skip, and emit_schema_only short-circuit.

Adds `daydream export-jsonl --out <path>` with the full filter, stratification, opt-in, and diagnostic flag surface from plan §4. Validates --max-stack-share in (0, 1] and --min-grounding in [0, 1] before constructing ExportConfig.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…fests Extends GitContext + Manifest to carry base_sha and changed_files at archive time, adds git_ops.diff_name_only, emits a code_context block in manifest.to_dict(), and updates the JSONL exporter to source those fields from the on-disk manifest. Older archives that predate the code_context block surface base_sha=None and changed_files=[] — schema allows nullable values there. Also warns once per unknown skill encountered during query_index so stratify=None buckets are visible rather than silent.
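
A soft-failing changed-files lookup of the kind described here might look like the sketch below. It calls `git diff --name-only` through `subprocess` directly instead of the project's own git helper, whose signature is not shown on this page; `diff_name_only_sketch`, `parse_name_only`, and the two-revision form of the diff are all assumptions.

```python
import subprocess
from pathlib import Path


def parse_name_only(stdout: str) -> list[str]:
    """Split `git diff --name-only` output into paths, dropping empty lines."""
    return [line.strip() for line in stdout.splitlines() if line.strip()]


def diff_name_only_sketch(repo: Path, base: str, head: str = "HEAD") -> list[str]:
    """Return files changed between base and head, or [] on any git failure."""
    try:
        proc = subprocess.run(
            ["git", "diff", "--name-only", base, head, "--"],
            cwd=repo,
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # git missing, unreadable directory, or a hung process: soft-fail.
        return []
    if proc.returncode != 0:
        # Unknown ref or not a repository: soft-fail as well.
        return []
    return parse_name_only(proc.stdout)
```

Returning an empty list on failure lets archiving proceed with changed_files=[], the same value older archives surface.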

…anup

- export: fall back to manifest_row for head_sha/branch/base_branch when the manifest dict has no code_context block (fixes a v1-schema gap where scalar fields surfaced as None for older archives).
- export: cast step_id to int so non-int trajectory values match the v1 schema's integer type constraint.
- archive.index: add count_runs() and use it for the unfiltered summary count instead of materialising every row via query_runs.
- training.exclusion: let is_copyleft() accept a pre-loaded copyleft_list so a per-row loop in _query_index doesn't reopen the file N times.
- export: warn when records have stack=None (unmapped skill) before stratification so silent grouping doesn't surprise callers.
- schema/v1.json + export: drop the unused test_outcome field.
- stratify: document that max_stack_share is an input-corpus cap, not an output-share guarantee, with a worked example.
- test_git_ops: add coverage for diff_name_only (happy path, multi-file, empty-line filtering, bad-ref soft-failure).
- uv.lock: sync to 0.17.0.

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
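
The distinction between an input-corpus cap and an output-share guarantee can be made concrete with a small sketch. The capping rule below (at most floor(max_stack_share × input size) records per stack, taken in session_id order) is an assumption chosen for illustration; the PR's actual stratification code and its own worked example are not shown on this page, and `cap_by_stack` is a hypothetical name.

```python
from collections import Counter


def cap_by_stack(records: list[dict], max_stack_share: float) -> list[dict]:
    """Keep at most floor(max_stack_share * len(records)) records per stack."""
    cap = max(1, int(max_stack_share * len(records)))
    kept = []
    seen = Counter()
    # Sorting by session_id makes the selection independent of input order.
    for record in sorted(records, key=lambda r: r["session_id"]):
        stack = record["stack"]
        if seen[stack] < cap:
            seen[stack] += 1
            kept.append(record)
    return kept


# Worked example: 90 python records and 10 go records, cap share 0.5.
records = [{"session_id": f"py-{i:03d}", "stack": "python"} for i in range(90)]
records += [{"session_id": f"go-{i:03d}", "stack": "go"} for i in range(10)]

kept = cap_by_stack(records, max_stack_share=0.5)
shares = Counter(r["stack"] for r in kept)
# cap = 50, so 50 python + 10 go = 60 records are kept.
# python is 50/60 (about 0.83) of the output, well above 0.5.
```

Under this reading the cap is computed against the input size, so a dominant stack can still exceed max_stack_share in the output when the other stacks are small.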

- Read head_sha, base_branch, branch from manifest `git` block instead of `code_context` (fixes regression introduced when the two blocks were split)
- Reorder `code_context` serialisation so base_sha follows head_sha for readability; field order now matches the schema definition
- Extract `_single_outcome_label()` helper that warns (rather than silently drops) when a session carries multiple outcome labels
- Accept an `exclusion` kwarg on `_build_query()` so callers can inject the set without re-loading from disk (improves testability)
- Guard the stack=None warning behind the `stratify_by == "stack"` branch so it only fires when stratification is actually requested
- Add `sort_keys=True` to `json.dumps` for deterministic JSONL output
- Wrap atomic tempfile write in try/except to clean up the .tmp file on any write error
- Clarify "missing" → "corrupt or unreadable" in the skip-record warning
- Expand noqa comments in tests to explain why subprocess args are safe

Daydream-Run: 20260517230342-fb31586b
Daydream-Version: 0.17.0

load_exclusion_list() returns frozenset[str]; the injected kwarg was typed set[str] | None, which mypy rejected on assignment.

Daydream-Run: 20260517230342-fb31586b
Daydream-Version: 0.17.0

coderabbitai · 2026-05-18T04:45:33Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f8c5f62f-886e-4f86-a5d1-ff678ba276cb

📥 Commits

Reviewing files that changed from the base of the PR and between 2d1bc51 and e12fcce.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (5)

daydream/cli.py
daydream/git_ops.py
daydream/training/export.py
pyproject.toml
tests/fixtures/training/build_archive.py

🚧 Files skipped from review as they are similar to previous changes (4)

daydream/git_ops.py
daydream/cli.py
tests/fixtures/training/build_archive.py
daydream/training/export.py

Walkthrough

This PR adds a training-record JSONL exporter: archive metadata now includes merge-base SHA and changed files; training package, v1 JSON Schema, and file-backed exclusion/copyleft lists are added; trajectories are converted to schema v1 records (spans, labels, code_context); a parameterized SQL query/filter pipeline with copyleft handling and skill->stack mapping is implemented; stack-based stratification is supported; run_export orchestrates counting, querying, stratifying, and atomic JSONL+schema emission; a synchronous CLI subcommand exposes the flow; fixture-backed tests cover the end-to-end behavior.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 61.54% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat(training): JSONL exporter for ATIF trajectories' accurately summarizes the main change: introducing a JSONL export pipeline for archived ATIF trajectories.
Description check	✅ Passed	The description comprehensively details the PR's purpose, changes, and context related to the JSONL exporter, archive manifest extensions, filtering, stratification, and test coverage.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@daydream/cli.py`:
- Around line 379-383: Detect when both args.include_all_labels and args.label
are provided and fail fast instead of silently overriding: in the CLI handling
code where labels is computed (the block referencing args.include_all_labels,
args.label, and labels), add a check that if args.include_all_labels is true and
args.label is non-empty, call the argument parser's error/exit (e.g.,
parser.error or raise SystemExit with a clear message) or refactor the flags
into an argparse mutually exclusive group so the parser prevents both from being
set; ensure the error message clearly states that --include-all-labels cannot be
used with --label.

In `@daydream/git_ops.py`:
- Around line 502-505: The call to _run_git inside diff_name_only can raise
GitError and must be caught to preserve the function's contract of returning []
on subprocess failure; wrap the _run_git call in a try/except that catches
GitError (the exception type raised by _run_git) and return [] from the except
block, keeping the existing behavior that also returns [] when proc.returncode
!= 0 and leaving the timeout and arguments to _run_git unchanged.

In `@daydream/training/export.py`:
- Around line 63-69: The step_id parsing in the loop over
trajectory.get("steps", []) (inside daydream/training/export.py) can raise
ValueError/TypeError when calling int(step.get("step_id", i + 1)); guard this by
wrapping the int(...) conversion in a try/except (catching ValueError and
TypeError) and on failure fall back to a safe default (e.g., use i + 1 or None)
and optionally log/debug the malformed step; update the code around the step_id
assignment so malformed step_id values do not abort export.
- Around line 164-169: The parsing of raw_labels (outcome_labels) silently
swallows JSON errors and drops data; modify the try/except around
json.loads(raw_labels) so that on JSONDecodeError/TypeError you log the error
and the offending raw_labels (and any record identifier available) before
falling back to an empty list, or re-raise if that better fits upstream
handling; update the except block to call the module logger (e.g.,
logging.exception or logger.warning with exc_info=True) referencing raw_labels
and keep the subsequent isinstance(labels, list) check to enforce type safety.

In `@tests/fixtures/training/build_archive.py`:
- Around line 27-37: The FixtureSession dataclass docstring is missing a
Google-style "Attributes:" section and the public function build_fixture_archive
lacks a "Returns:" section; update the FixtureSession docstring to include an
Attributes: block that lists session_id (str), repo_slug (str), skill (str),
grounding_rate (float), outcome_labels (tuple[str, ...]), status (str, default
"complete"), and notes (str, default ""), and update the build_fixture_archive
docstring to include a Google-style "Returns:" section describing the return
type and what the returned value represents (and add or complete Args: and
Raises: sections if applicable), ensuring wording matches existing docstring
style and uses the exact symbol names FixtureSession and build_fixture_archive
to locate the spots to edit.

In `@tests/test_archive.py`:
- Around line 85-93: Update the inline `# noqa` comments on the subprocess.run
calls in tests/test_archive.py (the git init/config/commit invocations) to
include an explicit rationale for S607 in addition to S603; e.g., augment each
`# noqa: S603, S607` comment to state that arguments are not user-controlled and
that the `git` command is a hardcoded, trusted command (same change also for the
later subprocess.run calls around lines 113-137).

In `@tests/test_training_export.py`:
- Line 16: The test suite imports the jsonschema package (seen in
test_training_export.py and test_training_record.py) but jsonschema is not
declared as a dependency; add jsonschema to the test/dev dependencies in
pyproject.toml (e.g., under [tool.poetry.dev-dependencies] or
[project.optional-dependencies."test"]) so CI installs it before running tests
and import errors are avoided.
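
The two export.py findings above (malformed step_id and unparseable outcome_labels) could be addressed along these lines. This is a sketch of the reviewer's suggestion, not the code that landed; `safe_step_id` and `parse_outcome_labels` are hypothetical helper names.

```python
import json
import logging

logger = logging.getLogger(__name__)


def safe_step_id(step: dict, index: int) -> int:
    """Return step_id as an int, falling back to the 1-based position."""
    raw = step.get("step_id", index + 1)
    try:
        return int(raw)
    except (TypeError, ValueError):
        logger.warning(
            "malformed step_id %r at position %d; using %d", raw, index, index + 1
        )
        return index + 1


def parse_outcome_labels(raw_labels) -> list:
    """Decode a JSON-encoded label list, warning instead of failing silently."""
    try:
        labels = json.loads(raw_labels)
    except (json.JSONDecodeError, TypeError):
        logger.warning("could not parse outcome_labels %r", raw_labels, exc_info=True)
        return []
    # Anything other than a JSON array is treated as "no labels".
    return labels if isinstance(labels, list) else []
```

Both helpers keep the export running on bad input while leaving a log trail, which is the behavior the review asks for.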

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5e0e1b36-f6a8-44c7-bd46-f8ad8718001b

📥 Commits

Reviewing files that changed from the base of the PR and between f25b1ba and 2d1bc51.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (21)

daydream/archive/git_context.py
daydream/archive/index.py
daydream/archive/manifest.py
daydream/cli.py
daydream/git_ops.py
daydream/training/__init__.py
daydream/training/exclusion.py
daydream/training/export.py
daydream/training/schema.py
daydream/training/schema/copyleft.txt
daydream/training/schema/exclusion.txt
daydream/training/schema/v1.json
tests/fixtures/training/__init__.py
tests/fixtures/training/build_archive.py
tests/test_archive.py
tests/test_git_ops.py
tests/test_training_exclusion.py
tests/test_training_export.py
tests/test_training_query.py
tests/test_training_record.py
tests/test_training_stratify.py

… deps

- Fail fast on conflicting --include-all-labels + --label flags (cli.py)
- Use parse_intermixed_args for feedback subcommand so optional TARGET positional is recognized after flags (cli.py)
- Catch GitError in diff_name_only() to honor soft-failure contract (git_ops.py)
- Warn instead of silently dropping malformed step_id and outcome_labels in export pipeline (training/export.py)
- Add jsonschema to dev dependencies, used directly in tests (pyproject.toml)
- Add Attributes section to FixtureSession docstring (build_archive.py)

Co-Authored-By: Claude <noreply@anthropic.com>
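
One way to fail fast on the conflicting flags is an argparse mutually exclusive group, which is one of the two options the review suggested. Which option the commit actually took is not shown here, so treat this as a sketch; `build_label_parser` is a hypothetical name, and making --label repeatable is an assumption.

```python
import argparse


def build_label_parser() -> argparse.ArgumentParser:
    """Build a parser in which --include-all-labels and --label conflict."""
    parser = argparse.ArgumentParser(prog="daydream export-jsonl")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--include-all-labels", action="store_true")
    # Assumed repeatable: each --label adds one entry to the list.
    group.add_argument("--label", action="append", default=None)
    return parser


args = build_label_parser().parse_args(["--label", "merged", "--label", "reverted"])
labels = args.label or []  # ["merged", "reverted"]
```

Passing both flags makes argparse print a usage error and exit with status 2, so the conflict is caught before any export work starts.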

anderskev and others added 11 commits May 17, 2026 12:38

feat(training): scaffold JSONL exporter package and schema artifacts

7a2aece

feat(training): add exclusion-list and copyleft helpers

4f71ec3

feat(training): add record and span builders for JSONL export

71bc5d8

feat(training): add query and filter pipeline against archive index

d2e3b02

feat(training): add stack-stratification with deterministic ordering

568865e

fix: widen _build_query exclusion param type to frozenset | set | None

2d1bc51


anderskev added the enhancement (New feature or request) and area:training (Training pipeline and data preparation) labels May 18, 2026

anderskev self-assigned this May 18, 2026

coderabbitai Bot requested changes May 18, 2026


coderabbitai Bot approved these changes May 19, 2026


anderskev wants to merge 12 commits into main from feat/jsonl-exporter-r1
