feat(core): add Eval() API for single-file TypeScript evaluations #548
Draft
Detect .ts/.js/.mts/.mjs files and route them through the Eval() API instead of the YAML pipeline. Imports the file, discovers registered evals, awaits their promises, and collects results for summary output.
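The import, discover, await, collect flow described in this commit can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the `registry` array, the `collectRegistered` helper, the `EvalResult` shape, and this simplified `Eval` signature are all hypothetical.

```typescript
// Hypothetical sketch: these names and shapes are not taken from the agentv source.
type EvalResult = { name: string; score: number };

// Module-level registry: each Eval() call records its pending promise here,
// so a runner can find the evals a file declared just by importing it.
const registry: Promise<EvalResult>[] = [];

function Eval(name: string, run: () => Promise<number> | number): Promise<EvalResult> {
  const pending = Promise.resolve()
    .then(run)
    .then((score) => ({ name, score }));
  registry.push(pending);
  return pending; // programmatic callers can await this directly
}

// What a runner might do after importing an eval file: await everything
// that was registered, then clear the registry for the next file.
async function collectRegistered(): Promise<EvalResult[]> {
  const results = await Promise.all(registry);
  registry.length = 0;
  return results;
}

// Stand-in for the side effects of importing an eval file.
Eval("greets", () => 1);
Eval("parses-json", async () => 0.5);
```

Returning the promise from `Eval` is what lets the same call serve both modes: the CLI drains the registry, while a script can simply `await` the return value.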
…tion Extend resolveEvalPaths to recognize .ts, .js, .mts, and .mjs extensions alongside YAML and JSONL, enabling `agentv eval path/to/eval.ts`.
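For illustration, extension-based routing of this kind might look like the sketch below. `classifyEvalPath` and the two constant arrays are hypothetical names, not the actual `resolveEvalPaths` implementation; the code extensions come from the commit message, while the exact YAML/JSONL extension list (including `.yml`) is an assumption.

```typescript
// Hypothetical helper; not the real resolveEvalPaths.
const CODE_EXTENSIONS = [".ts", ".js", ".mts", ".mjs"]; // from the commit message
const DATA_EXTENSIONS = [".yaml", ".yml", ".jsonl"]; // assumed list

type EvalFileKind = "code" | "data" | "unknown";

// Decide which pipeline a path should be routed to, based on its extension.
function classifyEvalPath(path: string): EvalFileKind {
  const lower = path.toLowerCase();
  if (CODE_EXTENSIONS.some((ext) => lower.endsWith(ext))) return "code";
  if (DATA_EXTENSIONS.some((ext) => lower.endsWith(ext))) return "data";
  return "unknown";
}
```

Under these assumptions, `classifyEvalPath("path/to/eval.ts")` would select the code pipeline and a `.yaml` path would stay on the existing one.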
Address biome lint issues: import ordering, noExplicitAny suppressions, non-null assertions replaced with optional chaining, proper JsonObject typing, and formatting fixes.
Demonstrates single-file eval with mock target, built-in Contains() assertion, and inline assertion function.
Deploying agentv with
| Latest commit: | aba42b3 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://8a998574.agentv.pages.dev |
| Branch Preview URL: | https://feat-eval-api.agentv.pages.dev |
- Remove dead code: legacy setInlineAssertFns/getInlineAssertFns and the fallback path in builtin-evaluators registry
- Remove evaluate() from public exports, migrate example to Eval()
- Add InlineAssertEvaluatorConfig to type system, remove unsafe casts
- Replace fake __eval_api__.yaml path with <eval-api> virtual marker
- Deduplicate computeSummary(): export from evaluate.ts, import in eval-api.ts
- Fix mixed TS+YAML output: write TS results to outputWriter immediately
- Add ExactMatch usage to sdk-eval-api example
- Add borderline score test case (0.5 <= score < 0.8)
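To make the borderline band concrete, here is a small sketch of how a summary step might bucket scores. Only the `0.5 <= score < 0.8` range comes from the commit message; the `pass` and `fail` labels for the other ranges, and the `classifyScore` and `summarize` names, are assumptions and not the actual `computeSummary()` behavior.

```typescript
type Verdict = "pass" | "borderline" | "fail";

// Assumed thresholds: only the borderline band (0.5 <= score < 0.8)
// is stated in the commit message; the other two labels are guesses.
function classifyScore(score: number): Verdict {
  if (score >= 0.8) return "pass";
  if (score >= 0.5) return "borderline";
  return "fail";
}

// Hypothetical summary: counts per verdict plus the mean score.
function summarize(scores: number[]) {
  const counts: Record<Verdict, number> = { pass: 0, borderline: 0, fail: 0 };
  for (const score of scores) counts[classifyScore(score)] += 1;
  const mean = scores.length === 0 ? 0 : scores.reduce((a, b) => a + b, 0) / scores.length;
  return { counts, mean };
}
```

The boundary values are the cases worth testing: 0.5 falls inside the band and 0.8 falls just outside it.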
Summary
Adds a new `Eval()` API that enables writing evaluations as single TypeScript files, improving DX for programmatic evaluation authoring.

Closes #537
What's included
- `Eval()` function and registry: declare and run evaluations from a single `.ts` file. Supports CLI discovery (module-level registration) and programmatic use (`await` for results)
- `Contains`, `IContains`, `ContainsAll`, `ContainsAny`, `ExactMatch`, `StartsWith`, `EndsWith`, `Regex`, `IsJson` for concise inline assertions
- The `task` option wraps a plain `(input) => string` function as a Provider, enabling eval without an external agent target
- `AssertFn` functions run in-process (no subprocess or stdin/stdout)
- `agentv eval` now accepts `.ts` and `.js` files in addition to YAML, resolving them through Bun's native TS loader

Files changed (13 files, +981 / -13)
- `eval-api.ts`, `assertions.ts`, `function-provider.ts`, `inline-assert.ts`
- `builtin-evaluators.ts` (register inline-assert)
- `packages/core/src/index.ts`
- `run-eval.ts`, `shared.ts` (TS file resolution)
- `examples/features/sdk-eval-api/evals/basic.eval.ts`

Test plan
- `bun run typecheck` passes
- `bun run lint` passes
- `bun test packages/core/test/evaluation/`: 928 tests pass, 0 fail
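As a rough illustration of the assertion helpers and the `task` option described under What's included, the sketch below shows one way such pieces could fit together. The helper names `Contains`, `IContains`, and `IsJson` appear in this PR, but their signatures here, the `AssertFn` and `Provider` shapes, and the `fromTask` and `runCase` functions are all assumptions made for the example.

```typescript
// Hypothetical shapes; the real AssertFn and Provider types may differ.
type AssertResult = { pass: boolean; reason?: string };
type AssertFn = (output: string) => AssertResult;

interface Provider {
  invoke(input: string): Promise<string>;
}

// Assertion factories: each returns a function that checks one output.
const Contains = (needle: string): AssertFn => (output) => ({
  pass: output.includes(needle),
});

const IContains = (needle: string): AssertFn => (output) => ({
  pass: output.toLowerCase().includes(needle.toLowerCase()),
});

const IsJson = (): AssertFn => (output) => {
  try {
    JSON.parse(output);
    return { pass: true };
  } catch {
    return { pass: false, reason: "output is not valid JSON" };
  }
};

// Wrap a plain (input) => string function so it can stand in for a target.
function fromTask(task: (input: string) => string): Provider {
  return { invoke: async (input) => task(input) };
}

// Run every assertion in-process against the provider's output.
async function runCase(provider: Provider, input: string, asserts: AssertFn[]): Promise<boolean> {
  const output = await provider.invoke(input);
  return asserts.every((check) => check(output).pass);
}

// A mock target that echoes its input inside a JSON object.
const echo = fromTask((input) => JSON.stringify({ reply: `Hello, ${input}` }));
```

Because assertions are ordinary functions here, an inline check such as `(output) => ({ pass: output.length < 100 })` can sit in the same array as the built-in helpers.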