
use audio context hook for InterruptibleTTSService #4099

Closed
omChauhanDev wants to merge 2 commits into pipecat-ai:main from
omChauhanDev:fix/interruptible-tts-bot-speaking-race

Conversation

@omChauhanDev
Contributor

@omChauhanDev omChauhanDev commented Mar 21, 2026

Please describe the changes in your PR. If it is addressing an issue, please reference that as well.

Fixes #3986

Issue:

The _bot_speaking guard in InterruptibleTTSService._handle_interruption() skips the websocket disconnect/reconnect when BotStartedSpeakingFrame hasn't reached the TTS processor yet. If a user interrupts while audio is still being synthesized or in transit, the TTS server keeps streaming stale audio into the next response.

Note: #4090 partially addressed this by routing audio through append_to_audio_context(), so stale audio is discarded when no active context exists. However, the server still continues synthesizing unused audio (wasted cost/bandwidth), and old audio can leak into the next response once a new audio context becomes active.
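The race described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not pipecat code: `GuardedTTS`, its attributes, and the two driver coroutines are hypothetical stand-ins that only mimic the ordering of events (synthesis starts, the "bot started speaking" signal arrives later, an interruption lands in between).

```python
import asyncio


class GuardedTTS:
    """Toy stand-in for a websocket TTS service; not the pipecat class."""

    def __init__(self):
        self.bot_speaking = False  # set only when the "started speaking" signal arrives
        self.synthesizing = False  # True from run_tts until the connection is reset
        self.reconnects = 0

    async def run_tts(self, text: str) -> None:
        # Synthesis starts now; the speaking signal arrives some time later.
        self.synthesizing = True

    async def on_bot_started_speaking(self) -> None:
        self.bot_speaking = True

    async def handle_interruption(self) -> None:
        # The guard under discussion: reconnect only if the bot is known to be speaking.
        if self.bot_speaking:
            self.reconnects += 1
            self.synthesizing = False
        self.bot_speaking = False


async def interrupt_before_speaking_signal() -> GuardedTTS:
    tts = GuardedTTS()
    await tts.run_tts("hello")
    # The user interrupts before on_bot_started_speaking() has been called.
    await tts.handle_interruption()
    return tts


async def interrupt_after_speaking_signal() -> GuardedTTS:
    tts = GuardedTTS()
    await tts.run_tts("hello")
    await tts.on_bot_started_speaking()
    await tts.handle_interruption()
    return tts
```

In the first scenario the guard skips the reconnect and the toy server is still marked as synthesizing; in the second the reconnect happens as intended.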

Approach:

Replaced the _bot_speaking guard with an on_audio_context_interrupted() override, the same hook ElevenLabs, Rime, and Deepgram already use. Audio contexts exist from synthesis start to playback end, so the hook fires exactly when needed and stays silent when the bot is idle (preserving the original optimization against unnecessary reconnects from VAD noise).

Changes:

  • tts_service.py: removed _bot_speaking, _handle_interruption, process_frame override; added on_audio_context_interrupted with disconnect/reconnect
  • fish/tts.py: added super() call in existing on_audio_context_interrupted override
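A minimal sketch of the shape of this change, assuming a simplified model rather than the real pipecat classes: `ContextHookTTS`, `MetricsTTS`, and their attributes are hypothetical, and the hook body only counts reconnects instead of touching a websocket. It shows the two points in the list above: the reconnect lives in the hook, and a subclass that overrides the hook has to call `super()` to keep it.

```python
import asyncio
from typing import Dict, List


class ContextHookTTS:
    """Toy model: reconnect from an interruption hook, keyed on audio contexts."""

    def __init__(self):
        self.audio_contexts: Dict[str, List[bytes]] = {}
        self.reconnects = 0

    def create_audio_context(self, context_id: str) -> None:
        self.audio_contexts[context_id] = []

    async def on_audio_context_interrupted(self, context_id: str) -> None:
        # Hypothetical hook body: stands in for disconnect + reconnect.
        self.reconnects += 1

    async def handle_interruption(self) -> None:
        # Fire the hook once per live context; nothing happens when idle.
        for context_id in list(self.audio_contexts):
            await self.on_audio_context_interrupted(context_id)
        self.audio_contexts = {}


class MetricsTTS(ContextHookTTS):
    """Subclass with its own hook logic (loosely modeled on the fish/tts.py bullet)."""

    def __init__(self):
        super().__init__()
        self.metrics_stopped = 0

    async def on_audio_context_interrupted(self, context_id: str) -> None:
        # Call the base hook first so the reconnect is not lost.
        await super().on_audio_context_interrupted(context_id)
        self.metrics_stopped += 1
```

With no live context, `handle_interruption()` never reaches the hook, which is how the sketch models the "stays silent when the bot is idle" property.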

@omChauhanDev omChauhanDev changed the title from "fix: use audio context hook for InterruptibleTTSService" to "use audio context hook for InterruptibleTTSService" Mar 21, 2026
@codecov

codecov Bot commented Mar 21, 2026

Codecov Report

❌ Patch coverage is 25.00000% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/pipecat/services/tts_service.py | 33.33% | 2 Missing ⚠️ |
| src/pipecat/services/fish/tts.py | 0.00% | 1 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/pipecat/services/fish/tts.py | 7.69% <0.00%> (-0.04%) ⬇️ |
| src/pipecat/services/tts_service.py | 66.43% <33.33%> (+0.56%) ⬆️ |

yuki901 added a commit to kotobasamurai-ai/pipecat that referenced this pull request Mar 22, 2026
Replace the _bot_speaking guard with on_audio_context_interrupted() override
so the websocket is always reconnected when audio is in-transit, fixing the
race condition where interruptions during the BotStartedSpeakingFrame
round-trip window would leave stale audio streaming.

Fish Audio TTS now calls super().on_audio_context_interrupted() to trigger
the reconnect before stopping metrics.

Fixes pipecat-ai#3986 (based on PR pipecat-ai#4099)
@yuki901
Contributor

yuki901 commented Mar 22, 2026

Thank you very much!!
Btw, since #4090, all InterruptibleTTSService subclasses route audio through append_to_audio_context(). After an interruption, _handle_interruption replaces _audio_contexts with a fresh empty dict, so stale audio from _receive_messages() is silently discarded — the user-facing bug in #3986 seems already fixed regardless of the _bot_speaking guard.

This PR's fix would still prevent the server from continuing to synthesize and stream unused audio (cost/bandwidth), but is that the intended scope? Does the PR description account for #4090?

@omChauhanDev
Contributor Author

omChauhanDev commented Mar 22, 2026

Hey @yuki901, nice catch - you're right that #4090 handles the immediate window. After interruption, _create_audio_context_task() replaces _audio_contexts with a fresh dict and _playing_context_id is reset to None, so stale audio from _receive_messages() hits append_to_audio_context(None, ...) and gets silently dropped. The audio-plays-right-after-interruption symptom is largely gone.
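The drop behavior described here can be sketched as follows. This is an illustrative model only: `AudioContextRouter` and its method names are hypothetical and merely echo the identifiers mentioned in the comment; the real pipecat signatures may differ.

```python
from typing import Dict, List, Optional


class AudioContextRouter:
    """Toy model of context-keyed audio routing; names are illustrative."""

    def __init__(self):
        self.contexts: Dict[str, List[bytes]] = {}
        self.playing_context_id: Optional[str] = None
        self.dropped = 0

    def start_context(self, context_id: str) -> None:
        self.contexts[context_id] = []
        self.playing_context_id = context_id

    def interrupt(self) -> None:
        # Replace the table with a fresh dict and clear the active ID.
        self.contexts = {}
        self.playing_context_id = None

    def append(self, context_id: Optional[str], audio: bytes) -> bool:
        # Audio for a missing or unknown context is discarded.
        if context_id is None or context_id not in self.contexts:
            self.dropped += 1
            return False
        self.contexts[context_id].append(audio)
        return True
```

After `interrupt()`, a late chunk routed via `playing_context_id` (now `None`) is counted as dropped rather than queued for playback.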

That said, this PR still covers two things #4090 doesn't:

  1. Server-side waste - without disconnecting, the TTS server keeps synthesizing and streaming audio nobody will use. That's wasted compute, bandwidth, and API cost.

  2. Audio crossover into the next response - there's a subtler window where old audio can leak. Once the next LLM response starts and a new audio context is created, _playing_context_id gets set to the new context ID. If old audio from the still-connected server arrives at that point, get_active_audio_context_id() returns the new ID, and append_to_audio_context routes old audio into the new context. The disconnect/reconnect prevents this by clearing server-side state entirely.
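The crossover window in point 2 can be made concrete with a short simulation. It is a sketch assuming that incoming chunks carry no context ID of their own and are routed to whichever context is currently active; `route_by_active_context` and `simulate_crossover` are hypothetical helpers, not pipecat functions.

```python
from typing import Dict, List, Optional


def route_by_active_context(
    contexts: Dict[str, List[str]], active_id: Optional[str], chunk: str
) -> bool:
    """Append a chunk to whichever context is active; drop it if none is."""
    if active_id is None or active_id not in contexts:
        return False
    contexts[active_id].append(chunk)
    return True


def simulate_crossover() -> Dict[str, List[str]]:
    contexts: Dict[str, List[str]] = {"response-1": []}
    active_id: Optional[str] = "response-1"
    route_by_active_context(contexts, active_id, "r1-chunk-0")

    # Interruption: contexts are cleared, but the socket stays open.
    contexts, active_id = {}, None
    route_by_active_context(contexts, active_id, "r1-chunk-1")  # dropped

    # The next response begins and becomes the active context.
    contexts["response-2"] = []
    active_id = "response-2"

    # A late chunk from response 1 arrives on the still-open socket.
    route_by_active_context(contexts, active_id, "r1-chunk-2")  # lands in response-2
    route_by_active_context(contexts, active_id, "r2-chunk-0")
    return contexts
```

In this model the chunk that arrives while no context is active is dropped, but the one that arrives after the new context exists ends up ahead of the new response's own audio.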

Happy to update the PR description to reference #4090 and clarify the scope. Thanks for flagging it!

yuki901 added a commit to kotobasamurai-ai/pipecat that referenced this pull request Mar 23, 2026
@markbackman markbackman requested a review from filipi87 March 26, 2026 01:57
@markbackman
Contributor

Tagging @filipi87 to take a look.

Contributor

@filipi87 filipi87 left a comment


Hi @omChauhanDev,

As discussed above, all TTS services now route audio through the audio context. When an interruption occurs, all audio contexts are canceled. As a result, even if audio arrives afterward, append_to_audio_context may be called with an invalid or undefined context_id, and the audio is effectively discarded. In this case, we log a debug message indicating that the audio was dropped.

The behavior of reconnecting only when the bot is speaking was introduced as an optimization, since reestablishing the connection can sometimes take a couple of seconds. Keeping this behavior helps maintain a better user experience.

For that reason, I believe it still makes sense to preserve the current approach, only reconnecting when the bot is actively speaking.

That said, I do see a small window where stale audio could leak through. However, this would likely only occur in the case where run_tts has been invoked but the BotStartedSpeakingFrame has not yet been received. This seems to be the only scenario where the issue could arise.

If that’s the case, we could address it more directly within the InterruptibleTTSService, by treating the presence of any audio context as an indication that the bot has started speaking. Something along these lines should be sufficient to prevent the race condition:

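async def push_frame(self, frame: Frame, direction: FrameDirection = FrameDirection.DOWNSTREAM):
    # Treat the start of synthesis as "bot speaking" so an interruption that
    # arrives before BotStartedSpeakingFrame still triggers the reconnect.
    if isinstance(frame, TTSStartedFrame):
        self._bot_speaking = True
    await super().push_frame(frame, direction)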
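To see why setting the flag this early closes the window, here is a self-contained toy version of the idea. Everything in it is a stand-in: `Frame`, `TTSStartedFrame`, and `EarlyFlagTTS` are hypothetical classes defined only for the simulation, and the reconnect is reduced to a counter.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Frame:
    """Stand-in base frame; not pipecat's Frame."""


@dataclass
class TTSStartedFrame(Frame):
    """Stand-in for the frame pushed when synthesis begins."""


class EarlyFlagTTS:
    """Toy service that keeps the guard but sets the flag at synthesis start."""

    def __init__(self):
        self.bot_speaking = False
        self.reconnects = 0

    async def push_frame(self, frame: Frame) -> None:
        # Mark the bot as speaking as soon as synthesis is announced.
        if isinstance(frame, TTSStartedFrame):
            self.bot_speaking = True

    async def run_tts(self, text: str) -> None:
        await self.push_frame(TTSStartedFrame())

    async def handle_interruption(self) -> None:
        # Same guard as before: reconnect only when the flag is set.
        if self.bot_speaking:
            self.reconnects += 1
        self.bot_speaking = False


async def interrupt_right_after_run_tts() -> EarlyFlagTTS:
    tts = EarlyFlagTTS()
    await tts.run_tts("hello")
    await tts.handle_interruption()  # no "started speaking" signal has arrived yet
    return tts
```

An interruption right after `run_tts` now reconnects, while an interruption on an idle instance still does nothing, so the optimization against needless reconnects is kept.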

@filipi87
Contributor

> If that’s the case, we could address it more directly within the InterruptibleTTSService, by treating the presence of any audio context as an indication that the bot has started speaking. Something along these lines should be sufficient to prevent the race condition:

This has been fixed in this PR:

@filipi87 filipi87 closed this Apr 13, 2026

Development

Successfully merging this pull request may close these issues.

InterruptibleTTSService: _bot_speaking guard causes interruption to fail when TTS audio hasn't reached output transport yet

4 participants