feat(training): JSONL exporter for ATIF trajectories #95
Adds ExportConfig + run_export — the orchestration layer that ties Waves 2–5 together. Implements plan §10 step 6: schema-only short-circuit, filter+stratify pipeline, dry-run summary, atomic JSONL write via tempfile + Path.replace, and schema.json side-car emission. Records are serialized with compact JSON separators so output stays small and deterministic across runs (covers AC #1, #2, #7, #8). Skips rows whose archive directory is missing manifest.json or trajectory.json with a warning, rather than crashing the whole export. Tests use the §9 fixture matrix unchanged — JSONL validity against the schema, required-field presence, byte-identical re-runs, schema.json emission, dry-run no-op, missing-trajectory skip, and emit_schema_only short-circuit.
Adds `daydream export-jsonl --out <path>` with the full filter, stratification, opt-in, and diagnostic flag surface from plan §4. Validates --max-stack-share in (0, 1] and --min-grounding in [0, 1] before constructing ExportConfig.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
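A minimal sketch of how the two range checks described in this commit could be written as argparse `type=` callables. The helper names (`share_in_open_closed_unit`, `rate_in_closed_unit`, `build_parser`) and the parser wiring are hypothetical; the commit only states that the values are validated before `ExportConfig` is constructed, not where or how.

```python
import argparse


def _parse_float(text: str) -> float:
    """Convert CLI text to float, reporting failures the argparse way."""
    try:
        return float(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a number: {text!r}") from None


def share_in_open_closed_unit(text: str) -> float:
    """Accept values in (0, 1]: zero is rejected, one is allowed."""
    value = _parse_float(text)
    if not 0.0 < value <= 1.0:
        raise argparse.ArgumentTypeError(f"must be in (0, 1], got {value}")
    return value


def rate_in_closed_unit(text: str) -> float:
    """Accept values in [0, 1]: both endpoints are allowed."""
    value = _parse_float(text)
    if not 0.0 <= value <= 1.0:
        raise argparse.ArgumentTypeError(f"must be in [0, 1], got {value}")
    return value


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical wiring; the real subcommand exposes more flags than these.
    parser = argparse.ArgumentParser(prog="daydream export-jsonl")
    parser.add_argument("--out", required=True)
    parser.add_argument("--max-stack-share", type=share_in_open_closed_unit, default=None)
    parser.add_argument("--min-grounding", type=rate_in_closed_unit, default=None)
    return parser
```

Doing the check in a `type=` callable means an out-of-range value is reported as a normal usage error (exit status 2) before any export code runs.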
…fests Extends GitContext + Manifest to carry base_sha and changed_files at archive time, adds git_ops.diff_name_only, emits a code_context block in manifest.to_dict(), and updates the JSONL exporter to source those fields from the on-disk manifest. Older archives that predate the code_context block surface base_sha=None and changed_files=[] — schema allows nullable values there. Also warns once per unknown skill encountered during query_index so stratify=None buckets are visible rather than silent.
…anup
- export: fall back to manifest_row for head_sha/branch/base_branch when the manifest dict has no code_context block (fixes a v1-schema gap where scalar fields surfaced as None for older archives).
- export: cast step_id to int so non-int trajectory values match the v1 schema's integer type constraint.
- archive.index: add count_runs() and use it for the unfiltered summary count instead of materialising every row via query_runs.
- training.exclusion: let is_copyleft() accept a pre-loaded copyleft_list so a per-row loop in _query_index doesn't reopen the file N times.
- export: warn when records have stack=None (unmapped skill) before stratification so silent grouping doesn't surprise callers.
- schema/v1.json + export: drop the unused test_outcome field.
- stratify: document that max_stack_share is an input-corpus cap, not an output-share guarantee, with a worked example.
- test_git_ops: add coverage for diff_name_only (happy path, multi-file, empty-line filtering, bad-ref soft-failure).
- uv.lock: sync to 0.17.0.

Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
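The worked example that this commit adds to the stratify docs is not reproduced on this page. The sketch below illustrates the general point under an assumed rule: each stack keeps at most `floor(max_stack_share * N)` records, where `N` is the size of the input corpus. The function `cap_by_input_share` is hypothetical and is not the PR's stratifier.

```python
import math
from collections import Counter


def cap_by_input_share(stacks: list[str], max_stack_share: float) -> list[str]:
    """Keep at most floor(max_stack_share * len(stacks)) records per stack.

    Assumed semantics for illustration only: the cap is derived from the
    input corpus size, so it does not bound any stack's share of the output.
    """
    cap = math.floor(max_stack_share * len(stacks))
    kept: list[str] = []
    seen: Counter[str] = Counter()
    for stack in stacks:  # preserving input order keeps the result deterministic
        if seen[stack] < cap:
            seen[stack] += 1
            kept.append(stack)
    return kept


corpus = ["python"] * 80 + ["go"] * 20
kept = cap_by_input_share(corpus, max_stack_share=0.5)
counts = Counter(kept)

# The cap is 50 records per stack, so 50 python + 20 go = 70 records survive.
# python is then 50/70 of the output, which is above the 0.5 "share".
python_share = counts["python"] / len(kept)
```

Under this assumed rule the flag limits how much of the input any one stack may contribute; it does not rebalance the output to the requested proportion.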
- Read head_sha, base_branch, branch from manifest `git` block instead of `code_context` (fixes regression introduced when the two blocks were split)
- Reorder `code_context` serialisation so base_sha follows head_sha for readability; field order now matches the schema definition
- Extract `_single_outcome_label()` helper that warns (rather than silently drops) when a session carries multiple outcome labels
- Accept an `exclusion` kwarg on `_build_query()` so callers can inject the set without re-loading from disk (improves testability)
- Guard the stack=None warning behind the `stratify_by == "stack"` branch so it only fires when stratification is actually requested
- Add `sort_keys=True` to `json.dumps` for deterministic JSONL output
- Wrap atomic tempfile write in try/except to clean up the .tmp file on any write error
- Clarify "missing" → "corrupt or unreadable" in the skip-record warning
- Expand noqa comments in tests to explain why subprocess args are safe

Daydream-Run: 20260517230342-fb31586b
Daydream-Version: 0.17.0
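The write path described across these commits (tempfile in the output directory, `sort_keys=True`, compact separators, `Path.replace`, cleanup on error) can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the function name `write_jsonl_atomic` and the temp-file naming are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Iterable


def write_jsonl_atomic(records: Iterable[dict[str, Any]], out_path: Path) -> None:
    """Write records as JSONL so readers never observe a partial file."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives beside the target so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=out_path.parent, prefix=out_path.name + ".", suffix=".tmp"
    )
    tmp_path = Path(tmp_name)
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as handle:
            for record in records:
                # Sorted keys and compact separators make the bytes independent
                # of dict insertion order, so reruns are byte-identical.
                line = json.dumps(record, sort_keys=True, separators=(",", ":"))
                handle.write(line + "\n")
        tmp_path.replace(out_path)  # atomic rename over any previous output
    except Exception:
        tmp_path.unlink(missing_ok=True)  # do not leave a stray .tmp behind
        raise
```

Because the rename is the last step, a failure while serializing leaves any earlier output file untouched.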
load_exclusion_list() returns frozenset[str]; the injected kwarg was typed set[str] | None which mypy rejected on assignment. Daydream-Run: 20260517230342-fb31586b Daydream-Version: 0.17.0
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID:
⛔ Files ignored due to path filters (1)
📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (4)
Walkthrough
This PR adds a training-record JSONL exporter: archive metadata now includes merge-base SHA and changed files; training package, v1 JSON Schema, and file-backed exclusion/copyleft lists are added; trajectories are converted to schema v1 records (spans, labels, code_context); a parameterized SQL query/filter pipeline with copyleft handling and skill->stack mapping is implemented; stack-based stratification is supported; run_export orchestrates counting, querying, stratifying, and atomic JSONL+schema emission; a synchronous CLI subcommand exposes the flow; fixture-backed tests cover the end-to-end behavior.
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@daydream/cli.py`:
- Around line 379-383: Detect when both args.include_all_labels and args.label
are provided and fail fast instead of silently overriding: in the CLI handling
code where labels is computed (the block referencing args.include_all_labels,
args.label, and labels), add a check that if args.include_all_labels is true and
args.label is non-empty, call the argument parser's error/exit (e.g.,
parser.error or raise SystemExit with a clear message) or refactor the flags
into an argparse mutually exclusive group so the parser prevents both from being
set; ensure the error message clearly states that --include-all-labels cannot be
used with --label.
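The mutually exclusive group suggested above could look roughly like this. It is a sketch only: it assumes `--label` is a repeatable flag and `--include-all-labels` is a boolean switch, which the comment implies but does not show, and `build_label_parser` is a hypothetical helper.

```python
import argparse


def build_label_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the export-jsonl subparser.
    parser = argparse.ArgumentParser(prog="daydream export-jsonl")
    labels = parser.add_mutually_exclusive_group()
    labels.add_argument(
        "--include-all-labels",
        action="store_true",
        help="export every outcome label (cannot be combined with --label)",
    )
    labels.add_argument(
        "--label",
        action="append",
        default=None,
        help="restrict the export to this label; may be repeated",
    )
    return parser
```

With this layout argparse itself rejects the conflicting combination with a usage error, so the handler needs no extra check; repeating `--label` on its own is still allowed.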
In `@daydream/git_ops.py`:
- Around line 502-505: The call to _run_git inside diff_name_only can raise
GitError and must be caught to preserve the function's contract of returning []
on subprocess failure; wrap the _run_git call in a try/except that catches
GitError (the exception type raised by _run_git) and return [] from the except
block, keeping the existing behavior that also returns [] when proc.returncode
!= 0 and leaving the timeout and arguments to _run_git unchanged.
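A sketch of the soft-failure contract this comment asks for. `GitError` and `_run_git` below are simplified stand-ins written for illustration; the project's real definitions, arguments, and timeout are not shown on this page and may differ.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


class GitError(RuntimeError):
    """Stand-in for the project's GitError (hypothetical definition)."""


def _run_git(args: list[str], cwd: Path, timeout: float) -> subprocess.CompletedProcess[str]:
    """Stand-in runner: raises GitError when git cannot be executed at all."""
    try:
        return subprocess.run(
            ["git", *args],
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        raise GitError(str(exc)) from exc


def diff_name_only(base: str, head: str, cwd: Path, timeout: float = 30.0) -> list[str]:
    """Return changed paths between two refs, or [] on any git failure."""
    try:
        proc = _run_git(["diff", "--name-only", base, head], cwd=cwd, timeout=timeout)
    except GitError:
        return []  # subprocess could not run: honour the soft-failure contract
    if proc.returncode != 0:
        return []  # git ran but rejected the refs
    # Drop blank lines so callers only ever see real paths.
    return [line for line in proc.stdout.splitlines() if line.strip()]
```

Both failure modes collapse to the same empty list, so archive creation can proceed with `changed_files=[]` instead of aborting.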
In `@daydream/training/export.py`:
- Around line 63-69: The step_id parsing in the loop over
trajectory.get("steps", []) (inside daydream/training/export.py) can raise
ValueError/TypeError when calling int(step.get("step_id", i + 1)); guard this by
wrapping the int(...) conversion in a try/except (catching ValueError and
TypeError) and on failure fall back to a safe default (e.g., use i + 1 or None)
and optionally log/debug the malformed step; update the code around the step_id
assignment so malformed step_id values do not abort export.
- Around line 164-169: The parsing of raw_labels (outcome_labels) silently
swallows JSON errors and drops data; modify the try/except around
json.loads(raw_labels) so that on JSONDecodeError/TypeError you log the error
and the offending raw_labels (and any record identifier available) before
falling back to an empty list, or re-raise if that better fits upstream
handling; update the except block to call the module logger (e.g.,
logging.exception or logger.warning with exc_info=True) referencing raw_labels
and keep the subsequent isinstance(labels, list) check to enforce type safety.
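The two `export.py` findings above could be addressed along these lines. This is a sketch under assumptions: the helper names `coerce_step_id` and `parse_outcome_labels` are hypothetical, and the real code may inline this logic or log different details.

```python
from __future__ import annotations

import json
import logging
from typing import Any

logger = logging.getLogger(__name__)


def coerce_step_id(step: dict[str, Any], index: int) -> int:
    """Return the step's id as an int, falling back to its 1-based position."""
    fallback = index + 1
    raw = step.get("step_id", fallback)
    try:
        return int(raw)
    except (ValueError, TypeError):
        logger.warning(
            "malformed step_id %r at position %d; using %d", raw, index, fallback
        )
        return fallback


def parse_outcome_labels(raw_labels: Any, session_id: str) -> list[Any]:
    """Decode a JSON-encoded label list, warning instead of failing silently."""
    try:
        labels = json.loads(raw_labels)
    except (json.JSONDecodeError, TypeError):
        logger.warning(
            "unparseable outcome_labels %r for session %s",
            raw_labels,
            session_id,
            exc_info=True,
        )
        return []
    if not isinstance(labels, list):
        logger.warning(
            "outcome_labels for session %s is not a list: %r", session_id, labels
        )
        return []
    return labels
```

In both helpers a malformed value costs one warning and a safe default rather than the whole export.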
In `@tests/fixtures/training/build_archive.py`:
- Around line 27-37: The FixtureSession dataclass docstring is missing a
Google-style "Attributes:" section and the public function build_fixture_archive
lacks a "Returns:" section; update the FixtureSession docstring to include an
Attributes: block that lists session_id (str), repo_slug (str), skill (str),
grounding_rate (float), outcome_labels (tuple[str, ...]), status (str, default
"complete"), and notes (str, default ""), and update the build_fixture_archive
docstring to include a Google-style "Returns:" section describing the return
type and what the returned value represents (and add or complete Args: and
Raises: sections if applicable), ensuring wording matches existing docstring
style and uses the exact symbol names FixtureSession and build_fixture_archive
to locate the spots to edit.
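For reference, a Google-style `Attributes:` section for the fields listed in this comment might read as follows. The field names, types, and defaults come from the comment; the one-line descriptions and the summary line are illustrative guesses, not the fixture's actual wording.

```python
from dataclasses import dataclass


@dataclass
class FixtureSession:
    """One session to materialise in a fixture archive.

    Attributes:
        session_id: Identifier of the session.
        repo_slug: Slug of the repository the session ran against.
        skill: Name of the skill the session exercised.
        grounding_rate: Grounding rate recorded for the session.
        outcome_labels: Outcome labels attached to the session.
        status: Session status. Defaults to "complete".
        notes: Free-form notes. Defaults to an empty string.
    """

    session_id: str
    repo_slug: str
    skill: str
    grounding_rate: float
    outcome_labels: tuple[str, ...]
    status: str = "complete"
    notes: str = ""
```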
In `@tests/test_archive.py`:
- Around line 85-93: Update the inline `# noqa` comments on the subprocess.run
calls in tests/test_archive.py (the git init/config/commit invocations) to
include an explicit rationale for S607 in addition to S603; e.g., augment each
`# noqa: S603, S607` comment to state that arguments are not user-controlled and
that the `git` command is a hardcoded, trusted command (same change also for the
later subprocess.run calls around lines 113-137).
In `@tests/test_training_export.py`:
- Line 16: The test suite imports the jsonschema package (seen in
test_training_export.py and test_training_record.py) but jsonschema is not
declared as a dependency; add jsonschema to the test/dev dependencies in
pyproject.toml (e.g., under [tool.poetry.dev-dependencies] or
[project.optional-dependencies."test"]) so CI installs it before running tests
and import errors are avoided.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5e0e1b36-f6a8-44c7-bd46-f8ad8718001b
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (21)
- `daydream/archive/git_context.py`
- `daydream/archive/index.py`
- `daydream/archive/manifest.py`
- `daydream/cli.py`
- `daydream/git_ops.py`
- `daydream/training/__init__.py`
- `daydream/training/exclusion.py`
- `daydream/training/export.py`
- `daydream/training/schema.py`
- `daydream/training/schema/copyleft.txt`
- `daydream/training/schema/exclusion.txt`
- `daydream/training/schema/v1.json`
- `tests/fixtures/training/__init__.py`
- `tests/fixtures/training/build_archive.py`
- `tests/test_archive.py`
- `tests/test_git_ops.py`
- `tests/test_training_exclusion.py`
- `tests/test_training_export.py`
- `tests/test_training_query.py`
- `tests/test_training_record.py`
- `tests/test_training_stratify.py`
… deps
- Fail fast on conflicting --include-all-labels + --label flags (cli.py)
- Use parse_intermixed_args for feedback subcommand so optional TARGET positional is recognized after flags (cli.py)
- Catch GitError in diff_name_only() to honor soft-failure contract (git_ops.py)
- Warn instead of silently dropping malformed step_id and outcome_labels in export pipeline (training/export.py)
- Add jsonschema to dev dependencies, since it is used directly in tests (pyproject.toml)
- Add Attributes section to FixtureSession docstring (build_archive.py)

Co-Authored-By: Claude <noreply@anthropic.com>
Summary
- `daydream export-jsonl`: a new CLI subcommand that turns archived ATIF trajectories into a versioned, schema-validated JSONL corpus suitable for training/eval pipelines.
- `daydream/training/` package: schema v1 definition, exclusion + copyleft filtering, record/span builders, deterministic stack-stratification, and an end-to-end export orchestrator.
- Archive manifests gain a `code_context` block (`base_sha`, `changed_files`) so newly archived runs carry the git context the exporter needs end-to-end.

Motivation
The trajectory archive captures per-run ATIF data, but there was no supported way to turn it into a training-ready dataset. Downstream consumers need:
This PR delivers that pipeline in one focused milestone so the training side can consume archived runs without bespoke glue.
Changes
Added
- `daydream/training/` package: `schema.py`, `exclusion.py`, `export.py`, schema artifacts (`schema/v1.json`, `schema/copyleft.txt`, `schema/exclusion.txt`).
- `daydream export-jsonl --out <path>` CLI subcommand with the full filter, stratification, opt-in, and diagnostic flag surface from the plan (validates `--max-stack-share` in `(0, 1]` and `--min-grounding` in `[0, 1]`).
- `ExportConfig` + `run_export` orchestrator: schema-only short-circuit, filter+stratify pipeline, dry-run summary, atomic JSONL write via tempfile + `Path.replace`, and `schema.json` side-car emission.
- `GitContext` + `Manifest` carry `base_sha` and `changed_files`; `git_ops.diff_name_only` populates them at archive time; `manifest.to_dict()` emits a `code_context` block.
- `archive.index.count_runs()` for cheap unfiltered totals (no row materialization).
- `tests/fixtures/training/` plus suites covering exclusion, query/filter, record/span builders, stratification, and the end-to-end export.

Changed
- The exporter reads `head_sha`/`branch`/`base_branch` from the manifest `git` block (was inadvertently reading from `code_context`); falls back to `manifest_row` when older archives have no `code_context` block.
- `is_copyleft()` accepts a pre-loaded list so the per-row hot loop in `_query_index` doesn't reopen the file N times.
- `_build_query()` accepts an `exclusion` kwarg (typed `frozenset | set | None`) so callers can inject the set without re-reading from disk.
- `step_id` is cast to `int` to match the v1 schema integer constraint.
- JSONL serialization uses `sort_keys=True` and compact JSON separators for deterministic, byte-identical reruns.
- `code_context` field order in the manifest now matches the schema definition (`head_sha` then `base_sha`).
- `stack=None` warning is now gated behind `stratify_by == "stack"`.

Fixed
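- The atomic write now cleans up on any write error, so a failed export no longer leaves a stray `.tmp` file beside the output.
- Sessions carrying multiple outcome labels now go through `_single_outcome_label()`, which warns instead of silently dropping labels.
- Removed the unused `test_outcome` field.

Test Plan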
- `uv run pytest`: full suite passes locally (343 existing + new training/archive/git_ops coverage).
- `uv run ruff check` and `uv run mypy daydream` pass; the `_build_query` exclusion-param typing fix was added specifically to unblock mypy.
- `git_ops.diff_name_only` covered for happy path, multi-file output, empty-line filtering, and bad-ref soft-failure.

Checklist
- Tests pass (`uv run pytest`)
- Lint passes (`uv run ruff check`)
- Type check passes (`uv run mypy daydream`)

Additional Context
Diff size: ~2,100 LOC added across 22 files; the bulk lives in `daydream/training/export.py` and its test suite. The `code_context` rollout is forward-compatible: older archives that predate the block surface `base_sha=None` / `changed_files=[]`, which the v1 schema explicitly allows.

Generated with Claude Code