
refactor: consolidate input_segments into input (TestMessage[]) #511

@christso


Objective

Remove the input_segments field from EvalTest by resolving file content directly into input (TestMessage[]) at parse time. Today input and input_segments carry overlapping data — input_segments is a flattened, role-stripped version of input with resolved file text. This duplication caused a bug where {{ input }} in judge templates showed file paths without content (fixed in PR #507 via a render-time patch). Consolidating eliminates the duplication and the patch.
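To make the overlap concrete, here is a minimal sketch of how a flattened, role-stripped list could be derived from input once its file segments carry resolved text. The type shapes, the `type: "file"` / `value` / `text` field names, and the helper name `deriveSegments` are assumptions for illustration, not the project's actual definitions.

```typescript
// Hypothetical shapes: the real content segments are JsonObject values.
type ContentSegment = {
  type: string;
  value?: string;        // literal text, or the file path as written in the eval file
  text?: string;         // resolved file content (assumed field name)
  resolvedPath?: string; // absolute path of the file (assumed field name)
  [key: string]: unknown;
};

type TestMessage = {
  role: string;
  content: string | ContentSegment[];
};

// Flatten messages into a role-stripped segment list. If input already holds
// resolved file text, this list carries no information that input lacks.
function deriveSegments(input: TestMessage[]): ContentSegment[] {
  return input.flatMap((message): ContentSegment[] =>
    typeof message.content === "string"
      ? [{ type: "text", value: message.content }]
      : message.content,
  );
}

const exampleInput: TestMessage[] = [
  { role: "system", content: "You are a reviewer." },
  {
    role: "user",
    content: [
      { type: "text", value: "Review this file:" },
      { type: "file", value: "./notes.md", text: "# Notes", resolvedPath: "/repo/notes.md" },
    ],
  },
];

const exampleSegments = deriveSegments(exampleInput);
console.log(exampleSegments.length); // 3
```

Because the flattened list is a pure function of input in this sketch, storing it as a second field only creates a chance for the two to drift apart.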

Architecture Boundary

core-runtime — touches the EvalTest type contract, YAML/JSONL parsers, prompt-builder, and evaluator variable binding. All changes are within packages/core/src/evaluation/ and apps/cli/src/commands/trace/.

Current State

| Concern | Where it reads from today |
| --- | --- |
| Judge template `{{ input }}` | evalCase.input + render-time resolveInputWithFileContent() patch (llm-judge.ts) |
| Multi-turn prompt building | evalCase.input for messages, evalCase.input_segments for file content lookup |
| Single-turn prompt building | evalCase.input_segments directly (prompt-builder.ts:165) |
| Trace scoring file map | evalCase.input_segments for resolvedPath (trace/score.ts) |

Design Latitude

  • Must: enrich input message content segments with text (file content) and resolvedPath during processMessages() in yaml-parser/jsonl-parser
  • Must: update prompt-builder single-turn path to derive segments from input instead of input_segments
  • Must: update trace scoring to extract file paths from input
  • Must: remove input_segments from EvalTest interface, all parsers, and test fixtures (~18 files)
  • Must: remove resolveInputWithFileContent() from llm-judge.ts (the patch becomes unnecessary)
  • Free to decide: how to structure the flattening helper for single-turn prompt-builder (inline loop vs extracted utility)
  • Free to decide: whether resolvedPath lives on the content segment or is derived at point of use
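The first and third "Must" items can be sketched as follows. This is an illustrative outline under stated assumptions, not the actual processMessages() code: the type shapes, the `FileResolver` interface, and the helper names `enrichMessages` and `collectResolvedPaths` are hypothetical, and file access is injected so the example stays self-contained. It takes the option of storing resolvedPath on the content segment.

```typescript
// Hypothetical shapes: the real content segments are JsonObject values.
type ContentSegment = {
  type: string;
  value?: string;
  text?: string;
  resolvedPath?: string;
  [key: string]: unknown;
};

type TestMessage = {
  role: string;
  content: string | ContentSegment[];
};

// Hypothetical abstraction over path resolution and file reading.
interface FileResolver {
  resolve(relativePath: string): string; // returns the resolved path
  read(resolvedPath: string): string;    // returns the file text
}

// Parse-time enrichment: attach text and resolvedPath to every file segment.
// Returns new objects and leaves the parsed input untouched.
function enrichMessages(messages: TestMessage[], files: FileResolver): TestMessage[] {
  return messages.map((message) => {
    if (typeof message.content === "string") return message;
    const content = message.content.map((segment) => {
      if (segment.type !== "file" || typeof segment.value !== "string") return segment;
      const resolvedPath = files.resolve(segment.value);
      return { ...segment, text: files.read(resolvedPath), resolvedPath };
    });
    return { ...message, content };
  });
}

// Trace scoring side: collect file paths from input instead of input_segments.
function collectResolvedPaths(messages: TestMessage[]): string[] {
  return messages.flatMap((message): string[] =>
    typeof message.content === "string"
      ? []
      : message.content
          .map((segment) => segment.resolvedPath)
          .filter((path): path is string => typeof path === "string"),
  );
}
```

If resolvedPath were derived at point of use instead, `collectResolvedPaths` would need access to the same resolver, which is one argument for storing it on the segment.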

Acceptance Signals

  • input_segments does not appear anywhere in packages/core/src/ or apps/cli/src/
  • resolveInputWithFileContent is removed from llm-judge.ts
  • bun run build succeeds
  • bun test packages/core/test/evaluation/ — all tests pass
  • Multi-turn e2e: agentv eval examples/features/multi-turn-conversation/evals/dataset.eval.yaml --target default — 2/2 pass with per-turn scoring and role-annotated context
  • Judge templates with {{ input }} show resolved file content (not just paths)
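The last signal could be checked with a test along these lines. The sketch assumes a simple placeholder substitution and a role-annotated text layout; `formatInput`, `renderTemplate`, the `<file>` wrapper, and the type shapes are hypothetical and do not describe how llm-judge.ts actually renders templates.

```typescript
// Hypothetical shapes: the real content segments are JsonObject values.
type ContentSegment = {
  type: string;
  value?: string;
  text?: string;
  resolvedPath?: string;
  [key: string]: unknown;
};

type TestMessage = {
  role: string;
  content: string | ContentSegment[];
};

// Render messages as role-annotated text, inlining resolved file content.
function formatInput(messages: TestMessage[]): string {
  return messages
    .map((message) => {
      const body =
        typeof message.content === "string"
          ? message.content
          : message.content
              .map((segment) =>
                segment.type === "file"
                  ? `<file path="${segment.value ?? ""}">\n${segment.text ?? ""}\n</file>`
                  : String(segment.value ?? ""),
              )
              .join("\n");
      return `[${message.role}]\n${body}`;
    })
    .join("\n\n");
}

// Substitute {{ input }} in a judge template. A function replacer is used so
// that "$" sequences inside file content are not treated as replacement patterns.
function renderTemplate(template: string, input: TestMessage[]): string {
  return template.replace(/\{\{\s*input\s*\}\}/g, () => formatInput(input));
}
```

With enriched input, no render-time lookup is needed: the file text is already on the segment when the template is filled in.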

Non-Goals

  • Changing the YAML/JSONL user-facing schema (input format stays the same)
  • Modifying the TestMessage type definition (content segments are already JsonObject)
  • Changing how guidelines are extracted/filtered (that stays in processMessages)
