use audio context hook for InterruptibleTTSService #4099
omChauhanDev wants to merge 2 commits into pipecat-ai:main
Replace the _bot_speaking guard with on_audio_context_interrupted() override so the websocket is always reconnected when audio is in-transit, fixing the race condition where interruptions during the BotStartedSpeakingFrame round-trip window would leave stale audio streaming. Fish Audio TTS now calls super().on_audio_context_interrupted() to trigger the reconnect before stopping metrics. Fixes pipecat-ai#3986 (based on PR pipecat-ai#4099)
Thank you very much!! This PR's fix would still prevent the server from continuing to synthesize and stream unused audio (cost/bandwidth), but is that the intended scope? Does the PR description account for #4090?
Hey @yuki901, nice catch - you're right that #4090 handles the immediate window after interruption. That said, this PR still covers two things #4090 doesn't:
Happy to update the PR description to reference #4090 & clarify the scope. Thanks for flagging it!
Tagging @filipi87 to take a look.
Hi @omChauhanDev,
As discussed above, all TTS services now route audio through the audio context. When an interruption occurs, all audio contexts are canceled. As a result, even if audio arrives afterward, append_to_audio_context may be called with an invalid or undefined context_id, and the audio is effectively discarded. In this case, we log a debug message indicating that the audio was dropped.
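To make the discard behavior concrete, here is a minimal toy sketch of it. It is not pipecat's implementation: the class `ToyAudioContexts` and the helpers `create_audio_context` and `cancel_all` are hypothetical stand-ins, and only the name `append_to_audio_context` is taken from the discussion above.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("toy_tts")


class ToyAudioContexts:
    """Toy stand-in for audio-context bookkeeping (hypothetical, not pipecat code)."""

    def __init__(self) -> None:
        self._contexts: Dict[str, List[bytes]] = {}
        self.dropped = 0

    def create_audio_context(self, context_id: str) -> None:
        # Hypothetical helper: register a live context for one synthesis request.
        self._contexts[context_id] = []

    def cancel_all(self) -> None:
        # Hypothetical helper: an interruption cancels every live context.
        self._contexts.clear()

    def append_to_audio_context(self, context_id: str, audio: bytes) -> bool:
        chunks = self._contexts.get(context_id)
        if chunks is None:
            # Late audio for a cancelled or unknown context is dropped and logged.
            self.dropped += 1
            logger.debug("dropping audio for unknown context %s", context_id)
            return False
        chunks.append(audio)
        return True
```

In this model, audio that arrives after `cancel_all()` never reaches playback, although the server that produced it still did the work of synthesizing it.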
The behavior of reconnecting only when the bot is speaking was introduced as an optimization, since reestablishing the connection can sometimes take a couple of seconds. Keeping this behavior helps maintain a better user experience.
For that reason, I believe it still makes sense to preserve the current approach, only reconnecting when the bot is actively speaking.
That said, I do see a small window where stale audio could leak through. However, this would likely only occur in the case where run_tts has been invoked but the BotStartedSpeakingFrame has not yet been received. This seems to be the only scenario where the issue could arise.
If that’s the case, we could address it more directly within the InterruptibleTTSService, by treating the presence of any audio context as an indication that the bot has started speaking. Something along these lines should be sufficient to prevent the race condition:
async def push_frame(self, frame: Frame, direction: FrameDirection = FrameDirection.DOWNSTREAM):
    if isinstance(frame, TTSStartedFrame):
        self._bot_speaking = True
    await super().push_frame(frame, direction)
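The race window and the effect of this suggestion can be illustrated with a small self-contained simulation. Everything here is a toy model under stated assumptions: `ToyInterruptibleTTS`, `handle_interruption`, the `mark_on_tts_started` switch, and the two frame classes are local stand-ins defined for illustration, not pipecat's classes.

```python
import asyncio


class TTSStartedFrame:
    """Local stand-in: emitted by the TTS service when synthesis begins."""


class BotStartedSpeakingFrame:
    """Local stand-in: arrives later, after the round trip through the output."""


class ToyInterruptibleTTS:
    def __init__(self, mark_on_tts_started: bool) -> None:
        # Hypothetical switch: True models the push_frame suggestion above.
        self._mark_on_tts_started = mark_on_tts_started
        self._bot_speaking = False
        self.reconnects = 0

    async def push_frame(self, frame: object) -> None:
        # With the suggestion, synthesis start already counts as "speaking".
        if self._mark_on_tts_started and isinstance(frame, TTSStartedFrame):
            self._bot_speaking = True

    async def process_frame(self, frame: object) -> None:
        # Without the suggestion, only the late frame sets the flag.
        if isinstance(frame, BotStartedSpeakingFrame):
            self._bot_speaking = True

    async def handle_interruption(self) -> None:
        # Reconnect only while the bot is considered to be speaking.
        if self._bot_speaking:
            self.reconnects += 1
        self._bot_speaking = False


async def interrupt_in_window(mark_on_tts_started: bool) -> int:
    tts = ToyInterruptibleTTS(mark_on_tts_started)
    await tts.push_frame(TTSStartedFrame())
    # The interruption lands before BotStartedSpeakingFrame has come back.
    await tts.handle_interruption()
    return tts.reconnects
```

Calling `asyncio.run(interrupt_in_window(False))` models the current guard, where the interruption inside the window skips the reconnect, while `asyncio.run(interrupt_in_window(True))` models the suggested change, where the reconnect happens.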
This has been fixed in this PR:
Please describe the changes in your PR. If it is addressing an issue, please reference that as well.
Fixes #3986
Issue:
The _bot_speaking guard in InterruptibleTTSService._handle_interruption() skips websocket disconnect/reconnect when BotStartedSpeakingFrame hasn't reached the TTS processor yet. If a user interrupts while audio is still being synthesized or in-transit, the TTS server keeps streaming stale audio into the next response.

Note: #4090 partially addressed this by routing audio through append_to_audio_context(), so stale audio is discarded when no active context exists. However, the server still continues synthesizing unused audio (wasted cost/bandwidth), and old audio can leak into the next response once a new audio context becomes active.
Approach:
Replaced the _bot_speaking guard with an on_audio_context_interrupted() override - the same hook ElevenLabs, Rime, & Deepgram already use. Audio contexts exist from synthesis start to playback end, so this fires exactly when needed & stays silent when the bot is idle (preserving the original optimization against unnecessary reconnects from VAD noise).

Changes: