Baseten Plugin Update: fix metadata schema, add chain_id support, and improve response parsing #4889
Merged
davidzhao merged 12 commits into livekit:main on Mar 29, 2026
tinalenguyen (Member) approved these changes on Mar 24, 2026:
tested it and left a few minor comments, otherwise LGTM, thank you for the PR!

Summary
Updates the Baseten STT plugin to align with Baseten's current Streaming Transcription API, adds ergonomic endpoint configuration via model_id/chain_id parameters, and significantly expands the README documentation.

Motivation
The existing plugin had several issues that prevented it from working correctly with Baseten's streaming ASR API:
Wrong metadata field names — The plugin sent vad_params and streaming_whisper_params in the WebSocket metadata, but Baseten's StreamingWhisperInput schema (which uses extra="forbid") expects whisper_params, streaming_params, streaming_vad_config, and streaming_diarization_config. This caused the connection to be rejected outright.
No chain deployment support — Users had to manually construct WebSocket URLs. There was no way to specify a chain ID, which is the recommended deployment type for Baseten's streaming ASR.
Missing streaming parameters — Options like enable_partial_transcripts, partial_transcript_interval_s, show_word_timestamps, and final_transcript_max_duration_s were not exposed, limiting configurability.
Incomplete response parsing — The plugin didn't handle chain responses (which lack a top-level transcript field and include a "type": "transcription" wrapper), nor did it extract word-level timestamps from word_timestamps within segments.
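
To illustrate the response-parsing point, here is a minimal sketch of how a single parser might accept both response shapes. It is not the plugin's actual code. The `segments` key, the per-segment `text` key, and the `word`/`start_time`/`end_time` keys inside `word_timestamps` are assumptions made for illustration, as are the `Word` and `parse_transcription` names; only `transcript`, `"type": "transcription"`, and `word_timestamps` come from the description above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Word:
    text: str
    start: float
    end: float


def parse_transcription(message: dict[str, Any]) -> tuple[str, list[Word]]:
    """Return (transcript, words) for a model-style or chain-style message.

    Field names other than "transcript", "type" and "word_timestamps"
    are assumed for this sketch and may differ from the real API.
    """
    # Typed messages that are not transcriptions carry no text.
    msg_type = message.get("type")
    if msg_type is not None and msg_type != "transcription":
        return "", []

    segments = message.get("segments") or []

    # Model-style responses carry a top-level transcript; chain-style
    # responses do not, so the text is rebuilt from the segments.
    transcript = message.get("transcript")
    if transcript is None:
        parts = [seg.get("text", "").strip() for seg in segments]
        transcript = " ".join(p for p in parts if p)

    # Word-level timestamps are optional and live inside each segment.
    words: list[Word] = []
    for seg in segments:
        for w in seg.get("word_timestamps") or []:
            words.append(
                Word(
                    text=w["word"],
                    start=float(w["start_time"]),
                    end=float(w["end_time"]),
                )
            )
    return transcript, words
```

Falling back to the segments only when `transcript` is absent keeps the model-style path unchanged, so one code path can serve both deployment types.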