feat: OTEL metrics for latencies, usage, and connection timing#4891
theomonnom merged 18 commits into main
Conversation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nter
Instrumented directly in ConnectionPool.get(): records lk.agents.ws.connect_time with a reused flag and optional model_provider/model_name attributes via the metric_attrs param.
Added acquire_time and connection_reused fields to LLMMetrics, STTMetrics, TTSMetrics, and RealtimeModelMetrics for tracking WebSocket connection acquisition, recorded as the lk.agents.connection.acquire_time histogram in OTEL. Reverted the ConnectionPool instrumentation: plugins will populate the fields on their metrics events instead.
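As a rough sketch of the shape described above (the field names match the PR; the class itself is trimmed down, and the real LiveKit metrics dataclasses carry many more fields):

```python
from dataclasses import dataclass


@dataclass
class TTSMetrics:
    """Trimmed-down illustration of a metrics dataclass."""

    characters_count: int = 0
    # Fields added in this PR: seconds spent acquiring the WebSocket
    # connection, and whether a pooled connection was reused.
    acquire_time: float = 0.0
    connection_reused: bool = False


m = TTSMetrics(characters_count=42, acquire_time=0.12, connection_reused=True)
```

Plugins fill these in on each metrics event, and the OTEL layer records `acquire_time` into the histogram keyed by the `connection_reused` flag.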
- ConnectionPool tracks last_acquire_time/last_connection_reused
- TTS plugins (cartesia, deepgram, sarvam, neuphonic, murf, asyncai, resemble, inference, elevenlabs) set timing on streams
- STT plugins (openai, google, deepgram) emit connection metrics immediately via _report_connection_acquired
- Realtime models (openai, google, xai, ultravox, nvidia, phonic, aws) emit connection metrics on connect via _report_connection_acquired
- Base STT/TTS classes pass acquire_time/connection_reused to metrics
- RealtimeSession base class has a _report_connection_acquired helper
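A minimal sketch of what a `_report_connection_acquired`-style helper on a base class could look like (names and the event payload here are illustrative, not the exact LiveKit API):

```python
import time
from dataclasses import dataclass


@dataclass
class ConnectionMetrics:
    # Hypothetical event payload for connection timing.
    acquire_time: float
    connection_reused: bool


class RealtimeSessionBase:
    """Illustrative base class: emits a dedicated connection-metrics event
    as soon as the connection is up, instead of waiting for a usage event."""

    def __init__(self) -> None:
        self.emitted: list[ConnectionMetrics] = []

    def _report_connection_acquired(self, started_at: float, reused: bool) -> None:
        self.emitted.append(
            ConnectionMetrics(
                acquire_time=time.perf_counter() - started_at,
                connection_reused=reused,
            )
        )


session = RealtimeSessionBase()
t0 = time.perf_counter()
# ... connect the WebSocket here ...
session._report_connection_acquired(t0, reused=False)
```

Emitting immediately on connect (rather than piggybacking on the next usage event) is what lets STT and realtime plugins report timing even before any audio flows.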
…_connection
…oss jobs
No need for an intermediate ModelUsageCollector buffer: OTEL counters are already cumulative. Add directly on each event.
MetricsReport now carries llm_metadata, tts_metadata, and stt_metadata (a MetricsMetadata TypedDict with model_name/model_provider) so turn latency histograms are keyed by the model that produced them.
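The shape described above is a two-key TypedDict; a sketch (the real definition lives in livekit-agents and may differ in detail):

```python
from typing import TypedDict


class MetricsMetadata(TypedDict):
    # Per-component model info carried on MetricsReport so latency
    # histograms can be attributed to the model that produced them.
    model_name: str
    model_provider: str


# Illustrative values for one component of a report:
llm_metadata: MetricsMetadata = {"model_name": "gpt-4o", "model_provider": "openai"}
```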
```python
# Timing info from the last get() call
self.last_acquire_time: float = 0.0
self.last_connection_reused: bool = False
```
🟡 ConnectionPool stores timing info in shared mutable state, creating a race between concurrent callers
ConnectionPool.last_acquire_time and last_connection_reused are instance-level attributes set inside get() (under _connect_lock) but read by callers after the lock is released. In asyncio, the pattern async with pool.connection(...) as ws: self._x = pool.last_acquire_time is safe only because there's no await between yield and the read. However, this is fragile: any future refactor that adds an await between the context manager entry and the read would silently corrupt the values. A more robust design would return the timing info from get()/connection() directly, as ElevenLabs' _current_connection() does (returning a tuple).
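The tuple-returning design suggested above could be sketched like this (a toy pool with hypothetical names, not the actual ConnectionPool API; the point is that timing travels with the yielded connection instead of through shared attributes):

```python
import asyncio
import time
from contextlib import asynccontextmanager
from dataclasses import dataclass


@dataclass
class AcquireInfo:
    acquire_time: float
    reused: bool


class Pool:
    """Toy pool: connection() yields the timing info alongside the socket,
    so concurrent callers can never observe each other's values."""

    def __init__(self) -> None:
        self._cached = None

    async def _connect(self) -> object:
        await asyncio.sleep(0)  # stand-in for the real WebSocket dial
        return object()

    @asynccontextmanager
    async def connection(self):
        start = time.perf_counter()
        reused = self._cached is not None
        ws = self._cached or await self._connect()
        self._cached = ws
        yield ws, AcquireInfo(time.perf_counter() - start, reused)


async def main() -> tuple[bool, bool]:
    pool = Pool()
    async with pool.connection() as (_, first):
        pass
    async with pool.connection() as (_, second):
        pass
    return first.reused, second.reused


first_reused, second_reused = asyncio.run(main())
```

Because each caller receives its own `AcquireInfo`, adding an `await` between context entry and the read can no longer corrupt the values.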
_report_connection_acquired now only emits a dedicated metrics event without storing on self, so RECOGNITION_USAGE events don't re-report the same connection timing.
* upstream/main: (26 commits)
  - fix(cartesia): handle flush_done message in TTS _recv_task (livekit#5321)
  - docs: add example agent replies to AsyncToolset (livekit#5313)
  - add tag field to evaluation OTEL log records (livekit#5315)
  - (phonic) Add `min_words_to_interrupt` to Phonic plugin options (livekit#5304)
  - use delta aggregation temporality for otel metrics (livekit#5314)
  - fix(openai realtime): support per-response tool_choice in realtime sessions (livekit#5211)
  - add 7-day uv cooldown (livekit#5290)
  - fix is_context_type for generic RunContext types (livekit#5307)
  - evals: custom judges, tag metadata, and OTEL improvements (livekit#5306)
  - feat: OTEL metrics for latencies, usage, and connection timing (livekit#4891)
  - add session_end_timeout and gracefully cancel entrypoint on shutdown (livekit#4580)
  - fix(core): reduce TTS output buffering latency (livekit#5292)
  - (phonic) Update languages fields (livekit#5285)
  - fix(cli): prevent api_key/api_secret from leaking in tracebacks (livekit#5300)
  - fix(core): fix BackgroundAudioPlayer.play() hanging indefinitely (livekit#5299)
  - add AsyncToolset (livekit#5127)
  - append generate_reply instructions as system msg and convert it to user msg if unsupported (livekit#5287)
  - fix(core): reset user state to listening when audio is disabled (livekit#5198)
  - feat(mistralai): add ref_audio support to Voxtral TTS for zero-shot voice cloning (livekit#5278)
  - (gemini-3.1-flash-live-preview): add warning for generate_reply (livekit#5286)
  - ...
Adds OTEL metrics export to LiveKit Cloud via OTLP.
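For context, wiring a `MeterProvider` to an OTLP exporter with the standard OpenTelemetry Python SDK typically looks like the sketch below. The endpoint and interval here are purely illustrative; the actual Cloud integration configures the exporter, authentication, and (per a related upstream change) delta aggregation temporality internally.

```python
from opentelemetry import metrics
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

# Illustrative endpoint -- the real integration points at LiveKit Cloud's
# OTLP ingest and authenticates with the agent's credentials.
exporter = OTLPMetricExporter(endpoint="https://example.com/v1/metrics")
reader = PeriodicExportingMetricReader(exporter, export_interval_millis=30_000)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("lk.agents")
e2e_latency = meter.create_histogram("lk.agents.turn.e2e_latency", unit="s")
```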
**Turn latency histograms** (per-turn, keyed by `model_name`/`model_provider`)

- `lk.agents.turn.e2e_latency` — end-to-end turn latency
- `lk.agents.turn.llm_ttft` — LLM time to first token
- `lk.agents.turn.tts_ttfb` — TTS time to first byte
- `lk.agents.turn.transcription_delay` — time from end of speech to transcript
- `lk.agents.turn.end_of_turn_delay` — time from end of speech to turn decision
- `lk.agents.turn.on_user_turn_completed_delay` — time in the on_user_turn_completed callback

**Usage counters** (per-event, keyed by `model_name`/`model_provider`)

- `lk.agents.usage.llm_input_tokens` — LLM input tokens
- `lk.agents.usage.llm_input_cached_tokens` — LLM cached input tokens
- `lk.agents.usage.llm_output_tokens` — LLM output tokens
- `lk.agents.usage.llm_input_audio_tokens` — LLM input audio tokens
- `lk.agents.usage.llm_input_text_tokens` — LLM input text tokens
- `lk.agents.usage.llm_output_audio_tokens` — LLM output audio tokens
- `lk.agents.usage.llm_output_text_tokens` — LLM output text tokens
- `lk.agents.usage.llm_session_duration` — LLM session duration (for session-based billing)
- `lk.agents.usage.tts_characters` — TTS characters synthesized
- `lk.agents.usage.tts_audio_duration` — TTS audio duration
- `lk.agents.usage.stt_audio_duration` — STT audio duration
- `lk.agents.usage.interruption_num_requests` — interruption detection requests

**Connection histogram** (per-event, keyed by `model_name`/`model_provider`/`connection_reused`)

- `lk.agents.connection.acquire_time` — time to acquire a WebSocket connection

**Other changes**
- Added a `MetricsMetadata` TypedDict (`model_name`/`model_provider`) to `MetricsReport` for per-component model info (`llm_metadata`, `tts_metadata`, `stt_metadata`)
- Added `acquire_time` and `connection_reused` fields to `STTMetrics`, `TTSMetrics`, `RealtimeModelMetrics`
- `ConnectionPool` tracks `last_acquire_time`/`last_connection_reused`
- Added `__repr__` on all metrics types (skips default fields)
- `MeterProvider` with OTLP exporter for Cloud

**Not included**