feat: OTEL metrics for latencies, usage, and connection timing#4891
theomonnom merged 18 commits into main
Conversation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nter
Instrumented directly in ConnectionPool.get(): records lk.agents.ws.connect_time with a reused flag and optional model_provider/model_name attributes via the metric_attrs param.
Added acquire_time and connection_reused fields to LLMMetrics, STTMetrics, TTSMetrics, and RealtimeModelMetrics for tracking WebSocket connection acquisition, recorded as the lk.agents.connection.acquire_time histogram in OTEL. Reverted the ConnectionPool instrumentation: plugins will populate the fields on their metrics events instead.
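As a rough sketch of the shape described above (the field names match the PR; the class itself is trimmed down, and the real LiveKit metrics dataclasses carry many more fields):

```python
from dataclasses import dataclass


@dataclass
class TTSMetrics:
    """Trimmed-down illustration of a metrics dataclass."""

    characters_count: int = 0
    # Fields added in this PR: seconds spent acquiring the WebSocket
    # connection, and whether a pooled connection was reused.
    acquire_time: float = 0.0
    connection_reused: bool = False


m = TTSMetrics(characters_count=42, acquire_time=0.12, connection_reused=True)
```

Plugins fill these in on each metrics event, and the OTEL layer records `acquire_time` into the histogram keyed by the `connection_reused` flag.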
- ConnectionPool tracks last_acquire_time/last_connection_reused
- TTS plugins (cartesia, deepgram, sarvam, neuphonic, murf, asyncai, resemble, inference, elevenlabs) set timing on streams
- STT plugins (openai, google, deepgram) emit connection metrics immediately via _report_connection_acquired
- Realtime models (openai, google, xai, ultravox, nvidia, phonic, aws) emit connection metrics on connect via _report_connection_acquired
- Base STT/TTS classes pass acquire_time/connection_reused to metrics
- RealtimeSession base class has a _report_connection_acquired helper
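A minimal sketch of what a `_report_connection_acquired`-style helper on a base class could look like (names and the event payload here are illustrative, not the exact LiveKit API):

```python
import time
from dataclasses import dataclass


@dataclass
class ConnectionMetrics:
    # Hypothetical event payload for connection timing.
    acquire_time: float
    connection_reused: bool


class RealtimeSessionBase:
    """Illustrative base class: emits a dedicated connection-metrics event
    as soon as the connection is up, instead of waiting for a usage event."""

    def __init__(self) -> None:
        self.emitted: list[ConnectionMetrics] = []

    def _report_connection_acquired(self, started_at: float, reused: bool) -> None:
        self.emitted.append(
            ConnectionMetrics(
                acquire_time=time.perf_counter() - started_at,
                connection_reused=reused,
            )
        )


session = RealtimeSessionBase()
t0 = time.perf_counter()
# ... connect the WebSocket here ...
session._report_connection_acquired(t0, reused=False)
```

Emitting immediately on connect (rather than piggybacking on the next usage event) is what lets STT and realtime plugins report timing even before any audio flows.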
…_connection
…oss jobs
No need for an intermediate ModelUsageCollector buffer: OTEL counters are already cumulative. Add directly on each event.
MetricsReport now carries llm_metadata, tts_metadata, and stt_metadata (a MetricsMetadata TypedDict with model_name/model_provider) so turn latency histograms are keyed by the model that produced them.
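The shape described above is a two-key TypedDict; a sketch (the real definition lives in livekit-agents and may differ in detail):

```python
from typing import TypedDict


class MetricsMetadata(TypedDict):
    # Per-component model info carried on MetricsReport so latency
    # histograms can be attributed to the model that produced them.
    model_name: str
    model_provider: str


# Illustrative values for one component of a report:
llm_metadata: MetricsMetadata = {"model_name": "gpt-4o", "model_provider": "openai"}
```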
```python
# Timing info from the last get() call
self.last_acquire_time: float = 0.0
self.last_connection_reused: bool = False
```
🟡 ConnectionPool stores timing info in shared mutable state, creating a race between concurrent callers
ConnectionPool.last_acquire_time and last_connection_reused are instance-level attributes set inside get() (under _connect_lock) but read by callers after the lock is released. In asyncio, the pattern async with pool.connection(...) as ws: self._x = pool.last_acquire_time is safe only because there's no await between yield and the read. However, this is fragile: any future refactor that adds an await between the context manager entry and the read would silently corrupt the values. A more robust design would return the timing info from get()/connection() directly, as ElevenLabs' _current_connection() does (returning a tuple).
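The tuple-returning design suggested above could be sketched like this (a toy pool with hypothetical names, not the actual ConnectionPool API; the point is that timing travels with the yielded connection instead of through shared attributes):

```python
import asyncio
import time
from contextlib import asynccontextmanager
from dataclasses import dataclass


@dataclass
class AcquireInfo:
    acquire_time: float
    reused: bool


class Pool:
    """Toy pool: connection() yields the timing info alongside the socket,
    so concurrent callers can never observe each other's values."""

    def __init__(self) -> None:
        self._cached = None

    async def _connect(self) -> object:
        await asyncio.sleep(0)  # stand-in for the real WebSocket dial
        return object()

    @asynccontextmanager
    async def connection(self):
        start = time.perf_counter()
        reused = self._cached is not None
        ws = self._cached or await self._connect()
        self._cached = ws
        yield ws, AcquireInfo(time.perf_counter() - start, reused)


async def main() -> tuple[bool, bool]:
    pool = Pool()
    async with pool.connection() as (_, first):
        pass
    async with pool.connection() as (_, second):
        pass
    return first.reused, second.reused


first_reused, second_reused = asyncio.run(main())
```

Because each caller receives its own `AcquireInfo`, adding an `await` between context entry and the read can no longer corrupt the values.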
_report_connection_acquired now only emits a dedicated metrics event without storing on self, so RECOGNITION_USAGE events don't re-report the same connection timing.
* upstream/main: (26 commits)
  - fix(cartesia): handle flush_done message in TTS _recv_task (livekit#5321)
  - docs: add example agent replies to AsyncToolset (livekit#5313)
  - add tag field to evaluation OTEL log records (livekit#5315)
  - (phonic) Add `min_words_to_interrupt` to Phonic plugin options (livekit#5304)
  - use delta aggregation temporality for otel metrics (livekit#5314)
  - fix(openai realtime): support per-response tool_choice in realtime sessions (livekit#5211)
  - add 7-day uv cooldown (livekit#5290)
  - fix is_context_type for generic RunContext types (livekit#5307)
  - evals: custom judges, tag metadata, and OTEL improvements (livekit#5306)
  - feat: OTEL metrics for latencies, usage, and connection timing (livekit#4891)
  - add session_end_timeout and gracefully cancel entrypoint on shutdown (livekit#4580)
  - fix(core): reduce TTS output buffering latency (livekit#5292)
  - (phonic) Update languages fields (livekit#5285)
  - fix(cli): prevent api_key/api_secret from leaking in tracebacks (livekit#5300)
  - fix(core): fix BackgroundAudioPlayer.play() hanging indefinitely (livekit#5299)
  - add AsyncToolset (livekit#5127)
  - append generate_reply instructions as system msg and convert it to user msg if unsupported (livekit#5287)
  - fix(core): reset user state to listening when audio is disabled (livekit#5198)
  - feat(mistralai): add ref_audio support to Voxtral TTS for zero-shot voice cloning (livekit#5278)
  - (gemini-3.1-flash-live-preview): add warning for generate_reply (livekit#5286)
  - ...
Adds OTEL metrics export to LiveKit Cloud via OTLP.
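For context, wiring a `MeterProvider` to an OTLP exporter with the standard OpenTelemetry Python SDK typically looks like the sketch below. The endpoint and interval here are purely illustrative; the actual Cloud integration configures the exporter, authentication, and (per a related upstream change) delta aggregation temporality internally.

```python
from opentelemetry import metrics
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

# Illustrative endpoint -- the real integration points at LiveKit Cloud's
# OTLP ingest and authenticates with the agent's credentials.
exporter = OTLPMetricExporter(endpoint="https://example.com/v1/metrics")
reader = PeriodicExportingMetricReader(exporter, export_interval_millis=30_000)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("lk.agents")
e2e_latency = meter.create_histogram("lk.agents.turn.e2e_latency", unit="s")
```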
**Turn latency histograms** (per-turn, keyed by `model_name`/`model_provider`)

- `lk.agents.turn.e2e_latency` — end-to-end turn latency
- `lk.agents.turn.llm_ttft` — LLM time to first token
- `lk.agents.turn.tts_ttfb` — TTS time to first byte
- `lk.agents.turn.transcription_delay` — time from end of speech to transcript
- `lk.agents.turn.end_of_turn_delay` — time from end of speech to turn decision
- `lk.agents.turn.on_user_turn_completed_delay` — time in the on_user_turn_completed callback

**Usage counters** (per-event, keyed by `model_name`/`model_provider`)

- `lk.agents.usage.llm_input_tokens` — LLM input tokens
- `lk.agents.usage.llm_input_cached_tokens` — LLM cached input tokens
- `lk.agents.usage.llm_output_tokens` — LLM output tokens
- `lk.agents.usage.llm_input_audio_tokens` — LLM input audio tokens
- `lk.agents.usage.llm_input_text_tokens` — LLM input text tokens
- `lk.agents.usage.llm_output_audio_tokens` — LLM output audio tokens
- `lk.agents.usage.llm_output_text_tokens` — LLM output text tokens
- `lk.agents.usage.llm_session_duration` — LLM session duration (for session-based billing)
- `lk.agents.usage.tts_characters` — TTS characters synthesized
- `lk.agents.usage.tts_audio_duration` — TTS audio duration
- `lk.agents.usage.stt_audio_duration` — STT audio duration
- `lk.agents.usage.interruption_num_requests` — interruption detection requests

**Connection histogram** (per-event, keyed by `model_name`/`model_provider`/`connection_reused`)

- `lk.agents.connection.acquire_time` — time to acquire a WebSocket connection

**Other changes**
- Added a `MetricsMetadata` TypedDict (`model_name`/`model_provider`) to `MetricsReport` for per-component model info (`llm_metadata`, `tts_metadata`, `stt_metadata`)
- Added `acquire_time` and `connection_reused` fields to `STTMetrics`, `TTSMetrics`, `RealtimeModelMetrics`
- `ConnectionPool` tracks `last_acquire_time`/`last_connection_reused`
- Added `__repr__` on all metrics types (skips default fields)
- `MeterProvider` with OTLP exporter for Cloud

**Not included**