feat(core): add Eval() API for single-file TypeScript evaluations #548

Draft
christso wants to merge 10 commits into main from feat/eval-api

@christso
Collaborator

Summary

Adds a new Eval() API that enables writing evaluations as single TypeScript files, improving DX for programmatic evaluation authoring.

Closes #537

What's included

  • Eval() function and registry: Declare and run evaluations from a single .ts file. Supports CLI discovery (module-level registration) and programmatic use (await the returned promise for results)
  • Built-in assertion factories: Contains, IContains, ContainsAll, ContainsAny, ExactMatch, StartsWith, EndsWith, Regex, IsJson for concise inline assertions
  • Function provider: the task option wraps a plain (input) => string function as a Provider, enabling evals without an external agent target
  • Inline-assert evaluator: New evaluator type that runs AssertFn functions in-process (no subprocess or stdin/stdout)
  • CLI TypeScript support: agentv eval now accepts .ts and .js files in addition to YAML, resolving them through Bun's native TS loader
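To make the pieces in the list above concrete, here is a minimal self-contained sketch of how assertion factories, a plain task function, and in-process assertion running could fit together. It does not import from the package: the `AssertResult` shape, the one-argument `AssertFn` signature, the `runInline` helper, and the "fraction of assertions passed" score are all assumptions made for illustration, and only the factory names come from the PR description.

```typescript
// Illustrative shapes only; the PR's actual AssertFn signature may differ.
type AssertResult = { name: string; pass: boolean };
type AssertFn = (output: string) => AssertResult;

// Local stand-ins for a few of the factories named in the description.
const Contains = (needle: string): AssertFn => (output) => ({
  name: `Contains(${needle})`,
  pass: output.includes(needle),
});

const IContains = (needle: string): AssertFn => (output) => ({
  name: `IContains(${needle})`,
  pass: output.toLowerCase().includes(needle.toLowerCase()),
});

const ExactMatch = (expected: string): AssertFn => (output) => ({
  name: "ExactMatch",
  pass: output === expected,
});

const IsJson = (): AssertFn => (output) => {
  try {
    JSON.parse(output);
    return { name: "IsJson", pass: true };
  } catch {
    return { name: "IsJson", pass: false };
  }
};

// A plain (input) => string function plays the role of the target.
type Task = (input: string) => string;

// Hypothetical helper: call the task, then run every assertion in-process.
function runInline(task: Task, input: string, asserts: AssertFn[]) {
  const output = task(input);
  const results = asserts.map((check) => check(output));
  const passed = results.filter((r) => r.pass).length;
  // Assumed scoring rule: fraction of assertions that passed.
  const score = results.length === 0 ? 0 : passed / results.length;
  return { output, results, score };
}

const report = runInline(
  (name) => `Hello, ${name}!`,
  "world",
  [Contains("world"), IContains("HELLO"), ExactMatch("Hello, world!"), IsJson()],
);
console.log(report.score); // 0.75 (three of four assertions pass)
```

Because everything runs in one process, an assertion here is just a function call on the task's output, which is the property the inline-assert evaluator is described as providing.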

Files changed (13 files, +981 / -13)

| Area | Files |
| --- | --- |
| Core API | eval-api.ts, assertions.ts, function-provider.ts, inline-assert.ts |
| Registry | builtin-evaluators.ts (register inline-assert) |
| Exports | packages/core/src/index.ts |
| CLI | run-eval.ts, shared.ts (TS file resolution) |
| Example | examples/features/sdk-eval-api/evals/basic.eval.ts |
| Tests | 4 test files covering assertions, eval-api, inline-assert, function-provider |

Test plan

Detect .ts/.js/.mts/.mjs files and route them through the Eval() API
instead of the YAML pipeline. Imports the file, discovers registered
evals, awaits their promises, and collects results for summary output.
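The import, discover, and await flow described in this commit can be sketched roughly as follows. This is not the PR's code: `registry`, `registerEval`, `collectRegistered`, and the `EvalSummary` shape are hypothetical names, and the two direct `registerEval` calls stand in for what importing an eval file would trigger at module load.

```typescript
// Hypothetical summary shape, for illustration only.
type EvalSummary = { name: string; total: number; passed: number };

// Module-level registry: each declared eval leaves its pending promise here.
const registry: Promise<EvalSummary>[] = [];

// Stand-in for a declaration call: start the run and remember its promise.
function registerEval(
  name: string,
  run: () => Promise<{ total: number; passed: number }>,
): Promise<EvalSummary> {
  const pending = run().then((counts) => ({ name, ...counts }));
  registry.push(pending);
  return pending; // programmatic callers can await this directly
}

// What a CLI could do after importing the file: await everything registered.
async function collectRegistered(): Promise<EvalSummary[]> {
  return Promise.all(registry);
}

// Simulates the side effects of importing an eval file.
registerEval("greeting", async () => ({ total: 2, passed: 2 }));
registerEval("json-output", async () => ({ total: 3, passed: 1 }));

collectRegistered().then((summaries) => {
  for (const s of summaries) {
    console.log(`${s.name}: ${s.passed}/${s.total}`);
  }
});
```

Returning the promise from the declaration call is what lets one function serve both modes mentioned in the summary: the CLI reads the registry, while a script can simply await the return value.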
…tion

Extend resolveEvalPaths to recognize .ts, .js, .mts, and .mjs extensions
alongside YAML and JSONL, enabling `agentv eval path/to/eval.ts`.
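As a rough illustration of extension-based routing, the sketch below classifies a path by its suffix. The function name `classifyEvalPath`, the `EvalFileKind` labels, and the exact YAML extensions (`.yaml`, `.yml`) are assumptions. The commit itself only states that `.ts`, `.js`, `.mts`, and `.mjs` are recognized alongside YAML and JSONL.

```typescript
// Hypothetical labels for the pipeline a file would be routed to.
type EvalFileKind = "script" | "yaml" | "jsonl" | "unsupported";

// Script extensions named in the commit message.
const SCRIPT_EXTENSIONS = [".ts", ".js", ".mts", ".mjs"];
// Assumed YAML extensions; the commit only says "YAML".
const YAML_EXTENSIONS = [".yaml", ".yml"];

function classifyEvalPath(filePath: string): EvalFileKind {
  const lower = filePath.toLowerCase();
  if (SCRIPT_EXTENSIONS.some((ext) => lower.endsWith(ext))) return "script";
  if (YAML_EXTENSIONS.some((ext) => lower.endsWith(ext))) return "yaml";
  if (lower.endsWith(".jsonl")) return "jsonl";
  return "unsupported";
}

for (const p of ["path/to/eval.ts", "suite.yaml", "cases.jsonl", "notes.md"]) {
  console.log(`${p} -> ${classifyEvalPath(p)}`);
}
```

A caller could then send "script" paths through the Eval() flow and everything else through the existing pipeline.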

Address biome lint issues: import ordering, noExplicitAny suppressions,
non-null assertions replaced with optional chaining, proper JsonObject
typing, and formatting fixes.

Demonstrates single-file eval with mock target, built-in Contains()
assertion, and inline assertion function.

cloudflare-workers-and-pages bot commented Mar 13, 2026

Deploying agentv with Cloudflare Pages

Latest commit: aba42b3
Status: ✅  Deploy successful!
Preview URL: https://8a998574.agentv.pages.dev
Branch Preview URL: https://feat-eval-api.agentv.pages.dev

- Remove dead code: legacy setInlineAssertFns/getInlineAssertFns and
  the fallback path in builtin-evaluators registry
- Remove evaluate() from public exports, migrate example to Eval()
- Add InlineAssertEvaluatorConfig to type system, remove unsafe casts
- Replace fake __eval_api__.yaml path with <eval-api> virtual marker
- Deduplicate computeSummary() — export from evaluate.ts, import in eval-api.ts
- Fix mixed TS+YAML output: write TS results to outputWriter immediately
- Add ExactMatch usage to sdk-eval-api example
- Add borderline score test case (0.5 <= score < 0.8)
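The borderline band mentioned in the last item can be illustrated with a small sketch. Only the band itself (0.5 <= score < 0.8) comes from the commit message. Treating scores of 0.8 and above as passing and scores below 0.5 as failing is an assumption, and `verdictFor` and `summarizeScores` are hypothetical names, not the exported computeSummary().

```typescript
type Verdict = "pass" | "borderline" | "fail";

// Borderline band from the commit message; the pass/fail sides are assumed.
function verdictFor(score: number): Verdict {
  if (score >= 0.8) return "pass";
  if (score >= 0.5) return "borderline";
  return "fail";
}

// Hypothetical summary: verdict counts plus the mean score.
function summarizeScores(scores: number[]) {
  const counts: Record<Verdict, number> = { pass: 0, borderline: 0, fail: 0 };
  for (const score of scores) {
    counts[verdictFor(score)] += 1;
  }
  const mean =
    scores.length === 0 ? 0 : scores.reduce((a, b) => a + b, 0) / scores.length;
  return { total: scores.length, mean, ...counts };
}

console.log(verdictFor(0.5)); // "borderline" (lower bound is inclusive)
console.log(verdictFor(0.8)); // "pass" (upper bound is exclusive)
console.log(summarizeScores([1, 0.6, 0.2]));
```

Testing exactly at 0.5 and 0.8 is what catches an off-by-one in the comparison operators, which is presumably why a dedicated borderline case is worth having.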

Development

Successfully merging this pull request may close these issues.

Refactor TypeScript SDK for better DX

1 participant