refactor: consolidate input_segments into input (TestMessage[]) #511

## Objective

Remove the `input_segments` field from `EvalTest` by resolving file content directly into `input` (`TestMessage[]`) at parse time. Today `input` and `input_segments` carry overlapping data: `input_segments` is a flattened, role-stripped copy of `input` with file text resolved. This duplication caused a bug in which `{{ input }}` in judge templates showed file paths without their content (fixed in PR #507 via a render-time patch). Consolidating the two fields eliminates both the duplication and the patch.
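To make the overlap concrete, here is a rough sketch of the two shapes side by side and of the consolidated target. The segment fields `type` and `value`, the sample path, and the variable names are assumptions for illustration only; `text` and `resolvedPath` are the fields named in this issue, and the real types live in `packages/core/src/evaluation/`.

```typescript
// Simplified, hypothetical shapes; the real TestMessage content segments are JsonObject.
type ContentSegment = { type: string; value?: string; text?: string; resolvedPath?: string };
type TestMessage = { role: string; content: string | ContentSegment[] };

// Today (assumed shape): `input` keeps roles, but a file segment carries only its path.
const inputToday: TestMessage[] = [
  {
    role: "user",
    content: [
      { type: "text", value: "Review this file." },
      { type: "file", value: "./src/app.ts" },
    ],
  },
];

// Today (assumed shape): `input_segments` is flattened and role-stripped,
// but it is the only place the resolved file text lives.
const inputSegments: ContentSegment[] = [
  { type: "text", value: "Review this file." },
  { type: "file", value: "./src/app.ts", text: "export const x = 1;", resolvedPath: "/repo/src/app.ts" },
];

// Proposed: `input` alone carries roles, file text, and resolved paths.
const consolidated: TestMessage[] = [
  {
    role: "user",
    content: [
      { type: "text", value: "Review this file." },
      { type: "file", value: "./src/app.ts", text: "export const x = 1;", resolvedPath: "/repo/src/app.ts" },
    ],
  },
];
```

With the consolidated shape, anything that renders or inspects `input` sees file content without a second lookup.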

## Architecture Boundary

`core-runtime` — touches the EvalTest type contract, YAML/JSONL parsers, prompt-builder, and evaluator variable binding. All changes are within `packages/core/src/evaluation/` and `apps/cli/src/commands/trace/`.

## Current State

| Concern | Where it reads from today |
|---|---|
| Judge template `{{ input }}` | `evalCase.input` + render-time `resolveInputWithFileContent()` patch (llm-judge.ts) |
| Multi-turn prompt building | `evalCase.input` for messages, `evalCase.input_segments` for file content lookup |
| Single-turn prompt building | `evalCase.input_segments` directly (prompt-builder.ts:165) |
| Trace scoring file map | `evalCase.input_segments` for `resolvedPath` (trace/score.ts) |

## Design Latitude

- **Must**: enrich `input` message content segments with `text` (file content) and `resolvedPath` during `processMessages()` in yaml-parser/jsonl-parser
- **Must**: update prompt-builder single-turn path to derive segments from `input` instead of `input_segments`
- **Must**: update trace scoring to extract file paths from `input`
- **Must**: remove `input_segments` from `EvalTest` interface, all parsers, and test fixtures (~18 files)
- **Must**: remove `resolveInputWithFileContent()` from llm-judge.ts (the patch becomes unnecessary)
- **Free to decide**: how to structure the flattening helper for single-turn prompt-builder (inline loop vs extracted utility)
- **Free to decide**: whether `resolvedPath` lives on the content segment or is derived at point of use
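One possible shape for the three "Must" code changes above, sketched under assumptions: the helper names (`enrichMessages`, `flattenSegments`, `collectResolvedPaths`), the injected `FileResolver`, and the `type`/`value` segment fields are all hypothetical. The sketch takes the "extracted utility" option for flattening and stores `resolvedPath` on the segment; both remain free choices.

```typescript
// Simplified, hypothetical shapes; the real content segments are JsonObject.
type ContentSegment = { type: string; value?: string; text?: string; resolvedPath?: string };
type TestMessage = { role: string; content: string | ContentSegment[] };

// Hypothetical resolver supplied by the parser; returns undefined for unknown files.
type FileResolver = (path: string) => { resolvedPath: string; text: string } | undefined;

// Parse-time enrichment: attach `text` and `resolvedPath` to every file segment.
function enrichMessages(messages: TestMessage[], resolveFile: FileResolver): TestMessage[] {
  return messages.map((message) => {
    if (typeof message.content === "string") return message;
    const content = message.content.map((segment): ContentSegment => {
      if (segment.type !== "file" || typeof segment.value !== "string") return segment;
      const resolved = resolveFile(segment.value);
      if (!resolved) return segment; // leave unresolved files untouched
      return { ...segment, text: resolved.text, resolvedPath: resolved.resolvedPath };
    });
    return { ...message, content };
  });
}

// Single-turn prompt building: derive the flat, role-stripped view from `input`.
function flattenSegments(messages: TestMessage[]): ContentSegment[] {
  const flat: ContentSegment[] = [];
  for (const message of messages) {
    if (typeof message.content === "string") {
      flat.push({ type: "text", value: message.content });
    } else {
      for (const segment of message.content) flat.push(segment);
    }
  }
  return flat;
}

// Trace scoring: collect resolved file paths from `input`.
function collectResolvedPaths(messages: TestMessage[]): string[] {
  const paths: string[] = [];
  for (const segment of flattenSegments(messages)) {
    if (typeof segment.resolvedPath === "string") paths.push(segment.resolvedPath);
  }
  return paths;
}
```

Injecting the resolver keeps the enrichment step free of direct file-system calls, which should make it easy to exercise in the existing parser tests with in-memory fixtures.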

## Acceptance Signals

- [ ] `input_segments` does not appear anywhere in `packages/core/src/` or `apps/cli/src/`
- [ ] `resolveInputWithFileContent` is removed from llm-judge.ts
- [ ] `bun run build` succeeds
- [ ] `bun test packages/core/test/evaluation/` — all tests pass
- [ ] Multi-turn e2e: `agentv eval examples/features/multi-turn-conversation/evals/dataset.eval.yaml --target default` — 2/2 pass with per-turn scoring and role-annotated context
- [ ] Judge templates with `{{ input }}` show resolved file content (not just paths)
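The last signal could be checked with a small regression test along these lines. This is only a sketch: `renderInput` and `renderTemplate` are hypothetical stand-ins for whatever llm-judge.ts actually does when binding `{{ input }}`, and the output format shown is an assumption, not the real judge format.

```typescript
// Simplified, hypothetical shapes; the real content segments are JsonObject.
type ContentSegment = { type: string; value?: string; text?: string; resolvedPath?: string };
type TestMessage = { role: string; content: string | ContentSegment[] };

// Hypothetical renderer: role-annotated text with file content inlined.
function renderInput(messages: TestMessage[]): string {
  const blocks: string[] = [];
  for (const message of messages) {
    const parts: string[] = [];
    if (typeof message.content === "string") {
      parts.push(message.content);
    } else {
      for (const segment of message.content) {
        if (segment.type === "file") {
          // Use a visible marker when the parser attached no text.
          parts.push(`<file path="${segment.value ?? ""}">\n${segment.text ?? "[unresolved]"}\n</file>`);
        } else if (typeof segment.value === "string") {
          parts.push(segment.value);
        }
      }
    }
    blocks.push(`[${message.role}]\n${parts.join("\n")}`);
  }
  return blocks.join("\n\n");
}

// Hypothetical template binding: substitute every `{{ input }}` placeholder.
// A replacer function is used so `$` sequences in file content are not interpreted.
function renderTemplate(template: string, input: TestMessage[]): string {
  return template.replace(/\{\{\s*input\s*\}\}/g, () => renderInput(input));
}
```

Because the file text already sits on the segment after parsing, the renderer needs no file lookup of its own, which is the property that makes `resolveInputWithFileContent()` unnecessary.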

## Non-Goals

- Changing the YAML/JSONL user-facing schema (input format stays the same)
- Modifying the `TestMessage` type definition (content segments are already `JsonObject`)
- Changing how guidelines are extracted/filtered (that stays in `processMessages`)

## Related

- PR #507 (feature/multi-turn-conversation-eval) — introduced the render-time patch this issue removes
