feat: add optional OpenTelemetry tracing support #18
Merged
…cessing, and async metrics

- Install @opentelemetry/api as optional peer dependency for zero overhead when disabled
- Implement telemetry module with dynamic import and no-op fallback pattern
- Add span instrumentation across batch evaluation, single evaluations, and async metrics
- Support custom evaluator instrumentation via public withSpan API
- Comprehensive span hierarchy: batch.evaluate -> process_row -> run_evaluators -> evaluator.evaluate
- Retry events recorded on process_row spans with attempt number and delay tracking
- Token usage tracking across evaluations (input/output/total tokens)
- All OpenTelemetry integration is optional - zero performance impact when package not installed
- Add Telemetry documentation with setup instructions and production examples

Refactored for code health:
- Extracted span lifecycle management from evaluate() method into helper methods
- Simplified retry logic in batch processing with context object pattern
- All files maintain 10/10 code health scores with no regressions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
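The dynamic-import and no-op fallback pattern named in the commit message above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code merged in this PR: the `SpanLike` and `TracerLike` types, `loadTracer`, the tracer name `"eval-kit"`, and the `withSpan(name, fn)` signature are hypothetical. Only `trace.getTracer` and the span methods mirror `@opentelemetry/api`.

```typescript
// Minimal structural types so the sketch compiles without @opentelemetry/api installed.
// (Hypothetical names; the real package exports richer Span and Tracer interfaces.)
interface SpanLike {
  setAttribute(key: string, value: string | number | boolean): SpanLike;
  recordException(error: unknown): void;
  end(): void;
}
interface TracerLike {
  startSpan(name: string): SpanLike;
}

// No-op fallback: every method does nothing, so callers never need null checks.
const NOOP_SPAN: SpanLike = {
  setAttribute() {
    return NOOP_SPAN;
  },
  recordException() {},
  end() {},
};
const NOOP_TRACER: TracerLike = { startSpan: () => NOOP_SPAN };

let tracerPromise: Promise<TracerLike> | undefined;

// Resolve the tracer once; fall back to the no-op tracer if the optional
// peer dependency cannot be imported.
function loadTracer(): Promise<TracerLike> {
  if (!tracerPromise) {
    // A variable specifier keeps the compiler from requiring the package at build time.
    const specifier = "@opentelemetry/api";
    tracerPromise = import(specifier)
      .then((api) => api.trace.getTracer("eval-kit") as TracerLike) // tracer name is an assumption
      .catch(() => NOOP_TRACER);
  }
  return tracerPromise;
}

// Hypothetical shape for a public withSpan helper: run `fn` inside a span,
// record any thrown error on the span, and always end the span.
async function withSpan<T>(
  name: string,
  fn: (span: SpanLike) => T | Promise<T>,
): Promise<T> {
  const tracer = await loadTracer();
  const span = tracer.startSpan(name);
  try {
    return await fn(span);
  } catch (error) {
    span.recordException(error);
    throw error;
  } finally {
    span.end();
  }
}
```

A custom evaluator could then wrap its work as `withSpan("evaluator.evaluate", (span) => ...)`. Note that this sketch does not make the span active in the OpenTelemetry context, so it would not by itself produce the nested hierarchy listed above; `tracer.startActiveSpan` is the usual way to get parent-child nesting.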
7694302 to 2a5a4ac
Extract shared helpers (runBatch, getSpan) to reduce duplication and eliminate optional chaining in assertions, bringing score to 10/10.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Fix biome formatting in telemetry spec files
- Remove unused private startTime field in ProgressTracker

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add enableTelemetry(enabled) and isTelemetryEnabled() to control eval-kit tracing globally. When disabled, all spans are suppressed, including Vercel AI SDK's experimental_telemetry. This lets users run OTel for other services without eval-kit cluttering their traces.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
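A global toggle of this kind could look roughly like the sketch below. It assumes tracing is enabled by default (the commit message does not state the default), and `aiSdkTelemetry` is a hypothetical helper, not part of eval-kit; it only shows how the flag might be forwarded to the `experimental_telemetry` option (`isEnabled`, `functionId`) that the Vercel AI SDK accepts.

```typescript
// Assumption: tracing starts enabled; the source does not state the default.
let telemetryEnabled = true;

// Turn eval-kit tracing on or off for the whole process.
function enableTelemetry(enabled: boolean): void {
  telemetryEnabled = enabled;
}

// Report whether eval-kit tracing is currently on.
function isTelemetryEnabled(): boolean {
  return telemetryEnabled;
}

// Hypothetical helper: builds the `experimental_telemetry` settings object
// passed to Vercel AI SDK calls, so the SDK's own spans follow the same flag.
function aiSdkTelemetry(functionId: string): { isEnabled: boolean; functionId: string } {
  return { isEnabled: isTelemetryEnabled(), functionId };
}
```

With something like this in place, a call such as `generateText({ ..., experimental_telemetry: aiSdkTelemetry("evaluator") })` would emit no SDK spans after `enableTelemetry(false)`, while OTel instrumentation for other services keeps running.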
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
matt-koevort approved these changes on Mar 4, 2026
Summary
- @opentelemetry/api peer dependency for distributed tracing with zero overhead when disabled
- withSpan API for custom evaluator instrumentation

Test plan
🤖 Generated with Claude Code