feat: OTEL metrics for latencies, usage, and connection timing #4891

Merged
theomonnom merged 18 commits into main from theo/otel-metrics
Apr 2, 2026

@theomonnom theomonnom commented Feb 18, 2026

Adds OTEL metrics export to LiveKit Cloud via OTLP.

Turn latency histograms (per-turn, keyed by model_name / model_provider)

  • lk.agents.turn.e2e_latency — end-to-end turn latency
  • lk.agents.turn.llm_ttft — LLM time to first token
  • lk.agents.turn.tts_ttfb — TTS time to first byte
  • lk.agents.turn.transcription_delay — time from end of speech to transcript
  • lk.agents.turn.end_of_turn_delay — time from end of speech to turn decision
  • lk.agents.turn.on_user_turn_completed_delay — time in on_user_turn_completed callback
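To make the "keyed by model_name / model_provider" part concrete, here is a minimal stdlib-only sketch of how attribute keys split one histogram into separate series. `AttrHistogram`, the bucket bounds, and the model/provider values are hypothetical stand-ins for illustration; they are not the OpenTelemetry SDK or the code in this PR.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical bucket upper bounds in seconds; the PR does not state the real ones.
BOUNDS = (0.1, 0.25, 0.5, 1.0, 2.5)


class AttrHistogram:
    """Stand-in for an OTEL histogram: one bucket array per attribute set."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Key: sorted (attribute, value) pairs. Value: counts per bucket,
        # plus one overflow bucket for values above the last bound.
        self.series = defaultdict(lambda: [0] * (len(BOUNDS) + 1))

    def record(self, value: float, attributes: dict[str, str]) -> None:
        key = tuple(sorted(attributes.items()))
        # bisect_left puts a value equal to a bound into that bound's bucket.
        self.series[key][bisect_left(BOUNDS, value)] += 1


llm_ttft = AttrHistogram("lk.agents.turn.llm_ttft")
llm_ttft.record(0.18, {"model_name": "model-a", "model_provider": "provider-x"})
llm_ttft.record(0.42, {"model_name": "model-a", "model_provider": "provider-x"})
llm_ttft.record(1.70, {"model_name": "model-b", "model_provider": "provider-y"})
```

Each distinct attribute combination gets its own bucket counts, so latency distributions for different models stay separate and can be compared side by side.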

Usage counters (per-event, keyed by model_name / model_provider)

  • lk.agents.usage.llm_input_tokens — LLM input tokens
  • lk.agents.usage.llm_input_cached_tokens — LLM cached input tokens
  • lk.agents.usage.llm_output_tokens — LLM output tokens
  • lk.agents.usage.llm_input_audio_tokens — LLM input audio tokens
  • lk.agents.usage.llm_input_text_tokens — LLM input text tokens
  • lk.agents.usage.llm_output_audio_tokens — LLM output audio tokens
  • lk.agents.usage.llm_output_text_tokens — LLM output text tokens
  • lk.agents.usage.llm_session_duration — LLM session duration (for session-based billing)
  • lk.agents.usage.tts_characters — TTS characters synthesized
  • lk.agents.usage.tts_audio_duration — TTS audio duration
  • lk.agents.usage.stt_audio_duration — STT audio duration
  • lk.agents.usage.interruption_num_requests — interruption detection requests
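The per-event counter pattern can be sketched as follows. This is an illustration under assumptions: `LLMUsageEvent` and its fields are hypothetical, a plain `collections.Counter` stands in for the OTEL counters, and skipping zero values is a choice made for this sketch rather than something the PR states.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LLMUsageEvent:
    """Hypothetical event shape; field names are illustrative only."""

    model_name: str
    model_provider: str
    input_tokens: int = 0
    output_tokens: int = 0


# (metric name, model_name, model_provider) -> cumulative total
totals: Counter = Counter()


def on_usage(ev: LLMUsageEvent) -> None:
    """Add the event's values straight onto the running totals."""
    pairs = (
        ("lk.agents.usage.llm_input_tokens", ev.input_tokens),
        ("lk.agents.usage.llm_output_tokens", ev.output_tokens),
    )
    for metric, value in pairs:
        if value > 0:  # sketch choice: zero values do not create empty series
            totals[(metric, ev.model_name, ev.model_provider)] += value


on_usage(LLMUsageEvent("model-a", "provider-x", input_tokens=120, output_tokens=30))
on_usage(LLMUsageEvent("model-a", "provider-x", input_tokens=80))
```

Because the totals are already cumulative, each event only needs a single add per metric and no intermediate buffer is required.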

Connection histogram (per-event, keyed by model_name / model_provider / connection_reused)

  • lk.agents.connection.acquire_time — time to acquire a WebSocket connection

Other changes

  • Added MetricsMetadata TypedDict (model_name/model_provider) to MetricsReport for per-component model info (llm_metadata, tts_metadata, stt_metadata)
  • Added acquire_time and connection_reused fields to STTMetrics, TTSMetrics, RealtimeModelMetrics
  • ConnectionPool tracks last_acquire_time / last_connection_reused
  • Wired up connection timing in all STT, TTS, and realtime plugins
  • Compact __repr__ on all metrics types (skip default fields)
  • Set up OTEL MeterProvider with OTLP exporter for Cloud
  • Usage counters added directly per-event (no buffering)
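As a rough illustration of two of the items above, the sketch below pairs a `MetricsMetadata` TypedDict with a compact `__repr__` that skips fields left at their defaults. `TTSMetricsSketch` and its field set are hypothetical and trimmed down for the example; the real metrics types in the PR may be defined differently.

```python
from dataclasses import dataclass, fields
from typing import Optional, TypedDict


class MetricsMetadata(TypedDict):
    model_name: str
    model_provider: str


@dataclass(repr=False)
class TTSMetricsSketch:
    """Hypothetical, trimmed-down metrics type; not the real TTSMetrics."""

    characters_count: int = 0
    audio_duration: float = 0.0
    acquire_time: float = 0.0
    connection_reused: bool = False
    metadata: Optional[MetricsMetadata] = None

    def __repr__(self) -> str:
        # Show only the fields whose value differs from the declared default.
        parts = [
            f"{f.name}={getattr(self, f.name)!r}"
            for f in fields(self)
            if getattr(self, f.name) != f.default
        ]
        return f"{type(self).__name__}({', '.join(parts)})"


m = TTSMetricsSketch(
    characters_count=42,
    acquire_time=0.05,
    metadata={"model_name": "model-a", "model_provider": "provider-x"},
)
print(repr(m))
```

Fields that still hold their default (here `audio_duration` and `connection_reused`) are left out, which keeps log lines short when most fields are unset.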

Not included

  • Realtime model metadata on turn histograms (needs different naming since one model covers the whole pipeline)

@chenghao-mou chenghao-mou requested a review from a team February 18, 2026 23:00
devin-ai-integration[bot]

This comment was marked as resolved.

@theomonnom theomonnom force-pushed the theo/otel-metrics branch 2 times, most recently from 4c33087 to b3901d4 on February 18, 2026 23:13
theomonnom and others added 5 commits April 1, 2026 19:51
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instrumented directly in ConnectionPool.get() — records
lk.agents.ws.connect_time with reused flag and optional
model_provider/model_name attrs via metric_attrs param.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added acquire_time and connection_reused fields to LLMMetrics,
STTMetrics, TTSMetrics, and RealtimeModelMetrics for tracking
WebSocket connection acquisition. Recorded as
lk.agents.connection.acquire_time histogram in OTEL.

Reverted ConnectionPool instrumentation — plugins will populate
the fields on their metrics events instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@theomonnom theomonnom changed the title from "feat: emit OTEL metrics for turn latencies and usage counters" to "feat: emit OTEL metrics for turn latencies, usage counters, and connection timing" on Apr 2, 2026
theomonnom and others added 3 commits April 1, 2026 20:28
- ConnectionPool tracks last_acquire_time/last_connection_reused
- TTS plugins (cartesia, deepgram, sarvam, neuphonic, murf, asyncai,
  resemble, inference, elevenlabs) set timing on streams
- STT plugins (openai, google, deepgram) emit connection metrics
  immediately via _report_connection_acquired
- Realtime models (openai, google, xai, ultravox, nvidia, phonic, aws)
  emit connection metrics on connect via _report_connection_acquired
- Base STT/TTS classes pass acquire_time/connection_reused to metrics
- RealtimeSession base class has _report_connection_acquired helper

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…_connection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@theomonnom theomonnom changed the title from "feat: emit OTEL metrics for turn latencies, usage counters, and connection timing" to "feat: OTEL metrics for latencies, usage, and connection timing" on Apr 2, 2026
theomonnom and others added 4 commits April 1, 2026 20:34
…oss jobs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
No need for intermediate ModelUsageCollector buffer — OTEL
counters are already cumulative. Add directly on each event.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

theomonnom and others added 4 commits April 1, 2026 20:46
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MetricsReport now carries llm_metadata, tts_metadata, stt_metadata
(MetricsMetadata TypedDict with model_name/model_provider) so turn
latency histograms are keyed by the model that produced them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.

View 14 additional findings in Devin Review.

Comment on lines +52 to +54
# Timing info from the last get() call
self.last_acquire_time: float = 0.0
self.last_connection_reused: bool = False

🟡 ConnectionPool stores timing info in shared mutable state, creating a race between concurrent callers

ConnectionPool.last_acquire_time and last_connection_reused are instance-level attributes that get() sets while holding _connect_lock, but callers read them after the lock is released. In asyncio, the pattern async with pool.connection(...) as ws: self._x = pool.last_acquire_time is safe only because there is no await between the yield and the read, so no other task can run in between. This is fragile: any future refactor that adds an await between entering the context manager and the read would let a concurrent get() overwrite the values, and the caller would silently report another task's timing. A more robust design would return the timing info from get()/connection() directly, as ElevenLabs' _current_connection() does (it returns a tuple).
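A minimal sketch of the tuple-returning alternative, assuming a simplified pool: `TimedPool` and `FakeConn` are hypothetical names, and the sleep only simulates a handshake. This is not the actual ConnectionPool or the ElevenLabs helper.

```python
import asyncio
import time


class FakeConn:
    """Hypothetical stand-in for a WebSocket connection."""


class TimedPool:
    def __init__(self) -> None:
        self._idle: list[FakeConn] = []
        self._lock = asyncio.Lock()

    async def get(self) -> tuple[FakeConn, float, bool]:
        """Return (connection, acquire_time_seconds, reused)."""
        start = time.perf_counter()
        async with self._lock:
            if self._idle:
                conn, reused = self._idle.pop(), True
            else:
                await asyncio.sleep(0.01)  # simulated connection handshake
                conn, reused = FakeConn(), False
        # Timing travels with the return value, so a concurrent get()
        # cannot overwrite it before this caller reads it.
        return conn, time.perf_counter() - start, reused

    def put(self, conn: FakeConn) -> None:
        self._idle.append(conn)


async def demo() -> list[bool]:
    pool = TimedPool()
    # Two concurrent acquisitions on an empty pool, then one after a return.
    first, second = await asyncio.gather(pool.get(), pool.get())
    pool.put(first[0])
    third = await pool.get()
    return [first[2], second[2], third[2]]


reused_flags = asyncio.run(demo())
```

Each caller receives its own timing and reuse flag, so the result stays correct even if later code awaits between acquiring the connection and recording the metric.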

_report_connection_acquired now only emits a dedicated metrics event
without storing on self, so RECOGNITION_USAGE events don't re-report
the same connection timing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@theomonnom theomonnom merged commit 376e908 into main Apr 2, 2026
18 of 21 checks passed
@theomonnom theomonnom deleted the theo/otel-metrics branch April 2, 2026 04:04
osimhi213 added a commit to de-id/livekit-agents that referenced this pull request Apr 3, 2026
* upstream/main: (26 commits)
  fix(cartesia): handle flush_done message in TTS _recv_task (livekit#5321)
  docs: add example agent replies to AsyncToolset (livekit#5313)
  add tag field to evaluation OTEL log records (livekit#5315)
  (phonic) Add `min_words_to_interrupt` to Phonic plugin options (livekit#5304)
  use delta aggregation temporality for otel metrics (livekit#5314)
  fix(openai realtime): support per-response tool_choice in realtime sessions (livekit#5211)
  add 7-day uv cooldown (livekit#5290)
  fix is_context_type for generic RunContext types (livekit#5307)
  evals: custom judges, tag metadata, and OTEL improvements (livekit#5306)
  feat: OTEL metrics for latencies, usage, and connection timing (livekit#4891)
  add session_end_timeout and gracefully cancel entrypoint on shutdown (livekit#4580)
  fix(core): reduce TTS output buffering latency (livekit#5292)
  (phonic) Update languages fields (livekit#5285)
  fix(cli): prevent api_key/api_secret from leaking in tracebacks (livekit#5300)
  fix(core): fix BackgroundAudioPlayer.play() hanging indefinitely (livekit#5299)
  add AsyncToolset (livekit#5127)
  append generate_reply instructions as system msg and convert it to user msg if unsupported (livekit#5287)
  fix(core): reset user state to listening when audio is disabled (livekit#5198)
  feat(mistralai): add ref_audio support to Voxtral TTS for zero-shot voice cloning (livekit#5278)
  (gemini-3.1-flash-live-preview): add warning for generate_reply (livekit#5286)
  ...