Objective
Remove the `input_segments` field from `EvalTest` by resolving file content directly into `input` (`TestMessage[]`) at parse time. Today `input` and `input_segments` carry overlapping data: `input_segments` is a flattened, role-stripped copy of `input` with the file text resolved. This duplication caused a bug where `{{ input }}` in judge templates showed file paths without their content (fixed in PR #507 with a render-time patch). Consolidating on `input` removes both the duplication and the patch.
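Once each file segment in `input` carries its resolved text, rendering `{{ input }}` for a judge is just a walk over the messages. That is why the render-time patch stops being needed. The sketch below is minimal and assumes several things. File segments are assumed to look like `{ type: "file", value: <path>, text?: <contents> }`, and text segments like `{ type: "text", value }`. `renderInputForJudge` is a hypothetical name, and the `[role]` / `<file>` output format is illustrative only.

```typescript
// Assumed shapes; the real TestMessage/JsonObject types live in packages/core.
type JsonValue = string | number | boolean | null | JsonValue[] | JsonObject;
interface JsonObject {
  [key: string]: JsonValue;
}

interface TestMessage {
  role: string;
  content: string | JsonObject[];
}

// Hypothetical helper: render role-annotated messages for a judge prompt,
// inlining file text that the parser already resolved (no file I/O here).
function renderInputForJudge(messages: TestMessage[]): string {
  return messages
    .map((message) => {
      const body =
        typeof message.content === "string"
          ? message.content
          : message.content.map(renderSegment).join("\n");
      return `[${message.role}]\n${body}`;
    })
    .join("\n\n");
}

function renderSegment(segment: JsonObject): string {
  if (segment.type === "file") {
    const filePath = typeof segment.value === "string" ? segment.value : "";
    // Fall back to the bare path only when the parser attached no text.
    return typeof segment.text === "string"
      ? `<file path="${filePath}">\n${segment.text}\n</file>`
      : `<file path="${filePath}" />`;
  }
  return typeof segment.value === "string" ? segment.value : JSON.stringify(segment);
}
```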
Architecture Boundary
core-runtime: touches the `EvalTest` type contract, the YAML/JSONL parsers, the prompt builder, and evaluator variable binding. All changes are within `packages/core/src/evaluation/` and `apps/cli/src/commands/trace/`.
Current State
| Concern | Where it reads from today |
|---|---|
| Judge template `{{ input }}` | evalCase.input + render-time resolveInputWithFileContent() patch (llm-judge.ts) |
| Multi-turn prompt building | evalCase.input for messages, evalCase.input_segments for file content lookup |
| Single-turn prompt building | evalCase.input_segments directly (prompt-builder.ts:165) |
| Trace scoring file map | evalCase.input_segments for resolvedPath (trace/score.ts) |
Design Latitude
- Must: enrich `input` message content segments with `text` (file content) and `resolvedPath` during `processMessages()` in yaml-parser/jsonl-parser
- Must: update the prompt-builder single-turn path to derive segments from `input` instead of `input_segments`
- Must: update trace scoring to extract file paths from `input`
- Must: remove `input_segments` from the `EvalTest` interface, all parsers, and test fixtures (~18 files)
- Must: remove `resolveInputWithFileContent()` from llm-judge.ts (the patch becomes unnecessary)
- Free to decide: how to structure the flattening helper for the single-turn prompt builder (inline loop vs. extracted utility)
- Free to decide: whether `resolvedPath` lives on the content segment or is derived at the point of use
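The parser, prompt-builder, and trace-scoring requirements above can be sketched together. This is a minimal sketch under several assumptions. File segments are assumed to be `{ type: "file", value: <path> }`, and paths are assumed to resolve against a base directory. File reading is injected as a `readFile` callback so the example stays free of disk I/O. `enrichMessages`, `flattenInputSegments`, and `collectResolvedPaths` are hypothetical names. The sketch stores `resolvedPath` on the segment, which is one of the two options left open above.

```typescript
import * as path from "node:path";

// Assumed shapes; the real types live in packages/core.
type JsonValue = string | number | boolean | null | JsonValue[] | JsonObject;
interface JsonObject {
  [key: string]: JsonValue;
}

interface TestMessage {
  role: string;
  content: string | JsonObject[];
}

// Parse time (e.g. inside processMessages): return new messages whose file
// segments also carry `resolvedPath` and `text`. Inputs are not mutated.
function enrichMessages(
  messages: TestMessage[],
  baseDir: string,
  readFile: (absolutePath: string) => string,
): TestMessage[] {
  return messages.map((message) => {
    if (typeof message.content === "string") return message;
    const content = message.content.map((segment): JsonObject => {
      if (segment.type !== "file" || typeof segment.value !== "string") return segment;
      const resolvedPath = path.resolve(baseDir, segment.value);
      return { ...segment, resolvedPath, text: readFile(resolvedPath) };
    });
    return { ...message, content };
  });
}

// Single-turn prompt building: derive the flat, role-stripped segment list
// from `input` on demand instead of storing it as `input_segments`.
function flattenInputSegments(messages: TestMessage[]): JsonObject[] {
  return messages.flatMap((message): JsonObject[] =>
    typeof message.content === "string"
      ? [{ type: "text", value: message.content }]
      : message.content,
  );
}

// Trace scoring: collect resolved file paths directly from `input`.
function collectResolvedPaths(messages: TestMessage[]): string[] {
  return flattenInputSegments(messages)
    .map((segment) => segment.resolvedPath)
    .filter((p): p is string => typeof p === "string");
}
```

Keeping `resolvedPath` on the segment means trace scoring needs no path logic of its own. Deriving it at the point of use instead would keep segments closer to the user-authored YAML, at the cost of repeating the base-directory lookup wherever paths are needed.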
Acceptance Signals
- `input_segments` does not appear anywhere in `packages/core/src/` or `apps/cli/src/`
- `resolveInputWithFileContent` is removed from llm-judge.ts
- `bun run build` succeeds
- `bun test packages/core/test/evaluation/`: all tests pass
- Multi-turn e2e: `agentv eval examples/features/multi-turn-conversation/evals/dataset.eval.yaml --target default` passes 2/2 with per-turn scoring and role-annotated context
- Judge templates with `{{ input }}` show resolved file content (not just paths)
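The first acceptance signal can be checked mechanically. A plain `grep -rn input_segments packages/core/src apps/cli/src` does the job. The sketch below does the same check as a script and reports `path:line` locations. It assumes it runs from the repository root under Bun or Node. `readTree`, `findOccurrences`, and `assertNoOccurrences` are hypothetical helpers, and the directory walk is kept separate from the matching so the matching can be tested without touching disk.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Yield [path, contents] for every regular file under `dir`, recursively.
function* readTree(dir: string): Generator<[string, string]> {
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) {
      yield* readTree(fullPath);
    } else if (entry.isFile()) {
      yield [fullPath, readFileSync(fullPath, "utf8")];
    }
  }
}

// Return "path:line" (1-based) for every line whose text contains `needle`.
function findOccurrences(files: Iterable<[string, string]>, needle: string): string[] {
  const hits: string[] = [];
  for (const [filePath, contents] of files) {
    contents.split("\n").forEach((line, index) => {
      if (line.includes(needle)) hits.push(`${filePath}:${index + 1}`);
    });
  }
  return hits;
}

// Throw with all locations if any file under `roots` still mentions `needle`.
function assertNoOccurrences(roots: string[], needle: string): void {
  const hits = roots.flatMap((root) => findOccurrences(readTree(root), needle));
  if (hits.length > 0) {
    throw new Error(`${needle} still referenced at:\n${hits.join("\n")}`);
  }
}
```

Calling `assertNoOccurrences(["packages/core/src", "apps/cli/src"], "input_segments")` throws with any leftover locations. The same call with `"resolveInputWithFileContent"` covers the second signal.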
Non-Goals
- Changing the YAML/JSONL user-facing schema (input format stays the same)
- Modifying the `TestMessage` type definition (content segments are already `JsonObject`)
- Changing how guidelines are extracted/filtered (that stays in `processMessages`)
Related
- PR #507, "feat: add multi-turn conversation eval example with details field" (`feature/multi-turn-conversation-eval`): introduced the render-time patch this issue removes