Fix Gemini Live local VAD by sending correct activity events to server #4146

Merged
markbackman merged 2 commits into main from mb/gemini-live-local-vad on Mar 26, 2026

@markbackman markbackman commented Mar 25, 2026

Summary

  • Fixed Gemini Live local VAD mode (GeminiVADParams(disabled=True)) not working. The service now correctly detects user speech via VAD frames and sends ActivityStart/ActivityEnd signals to the Gemini API to indicate turn boundaries.
  • Updated Gemini Live examples to default to server-side VAD (no local Silero VAD). Added a new dedicated example (26a-gemini-live-local-vad.py) demonstrating local VAD configuration.

Testing

  • Run the local VAD example: python examples/foundational/26a-gemini-live-local-vad.py
  • Confirm the bot responds after the user stops speaking and that interruptions work.

When Gemini Live was configured with local VAD (server-side VAD disabled),
the service was listening for the wrong frame types and not sending
ActivityStart/ActivityEnd events to the server. Now it listens for
VADUserStartedSpeakingFrame/VADUserStoppedSpeakingFrame and sends the
appropriate activity signals when local VAD is in use.
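
A minimal sketch of the frame-to-signal mapping described above, not the actual pipecat implementation. Only the frame class names and the ActivityStart/ActivityEnd signal names come from this PR; `FakeSession`, `LocalVADBridge`, `send_activity`, and `run_demo` are hypothetical stand-ins, and the signals are recorded as plain strings instead of being sent to the Gemini API.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Optional


# Stand-ins for pipecat's VAD frames; only the class names come from the PR.
class VADUserStartedSpeakingFrame:
    pass


class VADUserStoppedSpeakingFrame:
    pass


@dataclass
class FakeSession:
    """Hypothetical stand-in for the Gemini Live connection; records signals."""

    sent: List[str] = field(default_factory=list)

    async def send_activity(self, signal: str) -> None:
        self.sent.append(signal)


@dataclass
class LocalVADBridge:
    """Hypothetical helper that forwards local VAD events as turn boundaries."""

    session: FakeSession
    # Mirrors GeminiVADParams.disabled: None or False means server-side VAD
    # is in charge, True means local VAD drives the turn boundaries.
    vad_disabled: Optional[bool] = None

    async def process_frame(self, frame: object) -> None:
        if not self.vad_disabled:
            # Server-side VAD is active, so no manual activity signals are sent.
            return
        if isinstance(frame, VADUserStartedSpeakingFrame):
            await self.session.send_activity("ActivityStart")
        elif isinstance(frame, VADUserStoppedSpeakingFrame):
            await self.session.send_activity("ActivityEnd")


async def demo(vad_disabled: Optional[bool]) -> List[str]:
    session = FakeSession()
    bridge = LocalVADBridge(session, vad_disabled)
    # Simulate one user turn: speech starts, then stops.
    for frame in (VADUserStartedSpeakingFrame(), VADUserStoppedSpeakingFrame()):
        await bridge.process_frame(frame)
    return session.sent


def run_demo(vad_disabled: Optional[bool] = None) -> List[str]:
    return asyncio.run(demo(vad_disabled))


if __name__ == "__main__":
    print(run_demo(True))
```

With `vad_disabled=True` the sketch records one start and one end signal per user turn; with the default `None` it records nothing, which matches the server-side VAD default discussed in the review.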

Also removes the unnecessary local SileroVADAnalyzer from server-side VAD
examples and adds a new 26a example demonstrating local VAD configuration.

codecov Bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 7.69231% with 12 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/pipecat/services/google/gemini_live/llm.py | 7.69% | 12 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/pipecat/services/google/gemini_live/llm.py | 26.27% <7.69%> (-0.30%) ⬇️ |

Parameters:
- disabled: Whether to disable VAD. Defaults to None.
+ disabled: Whether to disable VAD. Defaults to None (server-side VAD is enabled).
Contributor

Was server-side VAD enabled always the default and you're just calling it out explicitly here? Or is this a change?

Contributor

Follow-up question: if so, prior to your changes to the examples in this PR, the examples attempted to have both local and server-side VAD going?

Contributor

And follow-up to that: haven't we long maintained that local VAD is faster/more reliable than server-side, and that we recommend treating server-side signals as "supplementary"? (I remember that's what we recommended for AWS Nova Sonic at least)

Contributor Author

I'm not sure if it has always been, but it's definitely now the default. In looking at the Gemini docs, it was unclear what the behavior is, but in testing it, it's very clear that the default is to use the server-side VAD.

Contributor Author

> the examples attempted to have both local and server-side VAD going?

Correct. The local VAD was running but doing nothing, AFAICT.

Contributor Author

@markbackman markbackman Mar 25, 2026

> haven't we long maintained that local VAD is faster/more reliable than server-side, and that we recommend treating server-side signals as "supplementary"?

Yes, that is the case for most things. Though, for Gemini Live, the server-side VAD yields useful user transcripts whereas using the local VAD yields garbage for the user transcripts. I'm not sure why this is; perhaps their STT model requires a specific amount of silence padding. We'll have to ask the Google team.

Contributor

@filipi87 filipi87 left a comment

LGTM!

@markbackman markbackman merged commit 21a729a into main Mar 26, 2026
6 checks passed
@markbackman markbackman deleted the mb/gemini-live-local-vad branch March 26, 2026 21:48