feat: implement voice mode improvements with continuous loop, audio error recovery, and TTS discovery by devin-ai-integration[bot] · Pull Request #459 · OpenSecretCloud/Maple

devin-ai-integration · 2026-03-06T00:19:44Z

Voice Mode Improvements

Summary

Implements the maple-voice-improvements spec with three main features:

1. Voice Mode — Continuous Loop State Machine

When a user starts recording on a TTS-capable platform (desktop/iOS), the app enters voice mode: a hands-free loop of Recording → Processing → Waiting → Generating → Playing → (500ms pause) → Recording. The mic button highlights when voice mode is active and acts as an exit button. Exit is also triggered on chat switch, new chat, or any overlay X button.
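The loop above can be read as a small transition table. The following is a minimal sketch only: the event names, the `nextVoiceState` helper, and the `delayBeforeEntering` helper are hypothetical and are not taken from the PR's code; only the state names and the 500 ms pause come from the description.

```typescript
// Hypothetical state and event names; the PR description lists the states
// but does not show the actual type definitions.
type VoiceState = "recording" | "processing" | "waiting" | "generating" | "playing";

type VoiceEvent =
  | "recording_stopped"
  | "transcript_sent"
  | "reply_ready"
  | "tts_audio_ready"
  | "playback_finished"
  | "exit";

// Pause before re-recording; 500 ms is the figure given in the summary.
const RESTART_DELAY_MS = 500;

// For each state, the event that advances the loop and the state it leads to.
const transitions: Record<VoiceState, Partial<Record<VoiceEvent, VoiceState>>> = {
  recording: { recording_stopped: "processing" },
  processing: { transcript_sent: "waiting" },
  waiting: { reply_ready: "generating" },
  generating: { tts_audio_ready: "playing" },
  playing: { playback_finished: "recording" },
};

// Returns the next state, or null once voice mode has been exited.
function nextVoiceState(current: VoiceState | null, event: VoiceEvent): VoiceState | null {
  if (current === null || event === "exit") return null;
  return transitions[current][event] ?? current; // ignore events that do not apply
}

// Delay to apply before entering `next`; only the Playing -> Recording edge pauses.
function delayBeforeEntering(current: VoiceState, next: VoiceState): number {
  return current === "playing" && next === "recording" ? RESTART_DELAY_MS : 0;
}
```

Treating "exit" as valid from every state mirrors the description's exit triggers (mic button, chat switch, new chat, overlay X), which can fire at any point in the loop.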

2. Audio Error Recovery

On transcription failure, the audio blob is retained in memory. The overlay shows an error state with the original recording duration, a Retry button (re-sends the same blob), and a Discard button. This works both inside and outside voice mode.
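A minimal sketch of the retain-and-retry idea, assuming a hypothetical `RecordingRecovery` class and an injected `transcribe` function; the PR does not show how the blob is actually stored, and the audio type is left generic here so the sketch runs outside a browser.

```typescript
// Hypothetical shape for a recording whose transcription failed.
interface FailedRecording<TAudio> {
  audio: TAudio; // kept in memory so Retry can re-send the same data
  durationSeconds: number; // original recording duration, shown in the error state
}

class RecordingRecovery<TAudio> {
  private failed: FailedRecording<TAudio> | null = null;
  private readonly transcribe: (audio: TAudio) => Promise<string>;

  constructor(transcribe: (audio: TAudio) => Promise<string>) {
    this.transcribe = transcribe;
  }

  // The recording currently awaiting Retry or Discard, if any.
  get pending(): FailedRecording<TAudio> | null {
    return this.failed;
  }

  // First attempt for a fresh recording. Resolves to null on failure.
  submit(audio: TAudio, durationSeconds: number): Promise<string | null> {
    return this.attempt({ audio, durationSeconds });
  }

  // Re-sends the retained recording; a no-op when nothing is pending.
  retry(): Promise<string | null> {
    return this.failed === null ? Promise.resolve(null) : this.attempt(this.failed);
  }

  // Drops the retained recording.
  discard(): void {
    this.failed = null;
  }

  private async attempt(recording: FailedRecording<TAudio>): Promise<string | null> {
    try {
      const text = await this.transcribe(recording.audio);
      this.failed = null; // success clears the error state
      return text;
    } catch {
      this.failed = recording; // keep the audio for Retry
      return null;
    }
  }
}
```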

3. TTS Discovery Prompt & Enhanced Feedback

- One-time prompt: After the first successful voice message on TTS-capable platforms, if TTS models aren't installed, show an inline prompt ("Enable voice responses?") with a Download button (~264 MB) and dismiss (X).
- Generating animation: The speaker icon (TTSButton) pulses during TTS generation.
- Audio cues: Play mic-on.wav (gentle ascending tone) when recording starts and mic-off.wav (confirmation tone) after sending successfully.

Updates since last revision

TestFlight Bug Fixes (rounds 10–13)

- TTS auto-play after model download (Bug 1): Added a prevTtsStatusRef effect that watches for ttsStatus transitioning from non-ready → "ready" while voice mode is active. When the user downloads a TTS model mid-voice-loop, this retroactively speaks the last assistant message instead of silently skipping TTS. Guards: voiceModeRef.current, !isGenerating, !ttsIsPlaying, !ttsIsGenerating.
- Compact overlay showing "Playing" state (Bug 2): The bottom input overlay (isCompact={true}) previously hid all status content and waveform, showing only a black overlay with an X button during TTS playback. Now shows status text (Playing, Generating, Waiting, Error) and animated waveform in compact mode for non-recording states.
- iOS audio cues (Bug 3): Replaced new Audio('/audio/file.wav') with Web Audio API (AudioContext + fetch + decodeAudioData + BufferSource). HTMLAudioElement.play() is unreliable in iOS Tauri WebView due to autoplay restrictions; Web Audio API works more consistently when called from user gesture context.
- Empty blob recovery race (Devin Review round 10): startRecordingRef.current() after empty blob recovery now uses setTimeout(0) to defer until React's setIsRecording(false) batch commits, preventing the isRecording guard from early-returning.
- Prettier formatting fix for CI.
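The Bug 3 change (AudioContext + fetch + decodeAudioData + BufferSource, with one context per cue) can be sketched roughly as follows. This is not the PR's code: `playAudioCue`'s signature here is an assumption, and `CueContext`/`CueSource` are hypothetical minimal interfaces that mirror the Web Audio API method names so the sketch can run outside a browser with injected fakes.

```typescript
// Hypothetical minimal interfaces mirroring the Web Audio API names used below.
interface CueSource {
  buffer: unknown;
  onended: (() => void) | null;
  connect(destination: unknown): unknown;
  start(): void;
}

interface CueContext {
  destination: unknown;
  decodeAudioData(data: ArrayBuffer): Promise<unknown>;
  createBufferSource(): CueSource;
  close(): Promise<void>;
}

// Plays one cue, then closes the context it created.
// `createContext` and `fetchBytes` are injected; both are assumptions of this sketch.
async function playAudioCue(
  url: string,
  createContext: () => CueContext,
  fetchBytes: (url: string) => Promise<ArrayBuffer>,
): Promise<void> {
  const ctx = createContext();
  try {
    const bytes = await fetchBytes(url);
    const buffer = await ctx.decodeAudioData(bytes);
    const source = ctx.createBufferSource();
    source.buffer = buffer;
    source.connect(ctx.destination);
    // Resolve once playback finishes so the context is not closed mid-cue.
    await new Promise<void>((resolve) => {
      source.onended = () => resolve();
      source.start();
    });
  } catch {
    // Cues are best-effort feedback; a failed cue should not break recording.
  } finally {
    await ctx.close().catch(() => {});
  }
}
```

In a browser, `createContext` would presumably wrap `new AudioContext()` and `fetchBytes` would wrap `fetch(url)` followed by `arrayBuffer()`; a small adapter or cast may be needed because the interfaces above are deliberately looser than the real DOM types.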

Earlier fixes (rounds 1–9)

- Non-voice overlay dismissal (round 6): Added setVoiceState(null) after successful transcription in non-voice mode so the overlay dismisses instead of staying stuck.
- Voice mode error guard (round 6): Voice continuation effect now checks errorRef.current before proceeding with TTS. If handleSendMessage failed, voice mode exits gracefully instead of speaking the previous turn's stale assistant message.
- TTS failure loop break (round 7): speakInternal now re-throws errors after handling them locally, so speakAndWait properly rejects. This allows the voice mode continuation effect's .catch() to fire and call exitVoiceMode(), breaking what was previously an infinite loop on TTS failure.
- speak() wrapper error handling (round 8): The speak() wrapper (used by TTSButton) now catches errors from speakInternal to prevent unhandled promise rejections. Only speakAndWait (used by voice mode loop) propagates errors.
- exitVoiceMode stale TTS cleanup (round 5): Calls cancelTTSGeneration() and stopTTS() unconditionally (they're idempotent) instead of guarding on ttsIsGenerating/ttsIsPlaying which could be stale in the event handler effect closure.
- Earlier fixes (rounds 1–4): recordingStartTimeRef for accurate duration capture, startRecordingRef for stale closure in handleVoiceDiscard, handleSendMessageRef/handleTTSDiscoveryRef for stale model/conversation closures, isTauri() guard on iOS TTS platform check, ?? instead of || for 0-second duration display, recording restart on empty blob, if (recorderRef.current) instead of if (recorderRef.current && isRecording) for mic leak fix.
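The split between rounds 7 and 8 (speakInternal re-throws, speak() swallows, speakAndWait propagates) can be illustrated with a minimal sketch. The `createSpeakers` factory and its `synthesizeAndPlay`/`onError` parameters are hypothetical stand-ins; the real functions live inside TTSContext.tsx and are not shown in this thread.

```typescript
// Hypothetical factory; the three inner functions reuse the names from the PR notes.
function createSpeakers(
  synthesizeAndPlay: (text: string) => Promise<void>,
  onError: (error: unknown) => void,
) {
  // Handles the error locally, then re-throws so awaiting callers see the failure.
  async function speakInternal(text: string): Promise<void> {
    try {
      await synthesizeAndPlay(text);
    } catch (error) {
      onError(error);
      throw error;
    }
  }

  // For button-style callers with no try/catch: never rejects.
  async function speak(text: string): Promise<void> {
    try {
      await speakInternal(text);
    } catch {
      // Already handled in speakInternal; swallowed to avoid an unhandled rejection.
    }
  }

  // For the voice loop: rejects on failure so the caller can exit voice mode.
  function speakAndWait(text: string): Promise<void> {
    return speakInternal(text);
  }

  return { speak, speakAndWait };
}
```

A voice-loop caller could then do something like `speakAndWait(text).then(restartRecording).catch(exitVoiceMode)`, where both callbacks are placeholders for whatever the component actually uses.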

Review & Testing Checklist for Human

⚠️ Risk Level: YELLOW — Complex async state machine with multiple ref-based staleness mitigations across 10 rounds of fixes; TestFlight bugs fixed without iOS device testing. The incremental fix pattern increases interaction risk.

TestFlight Bug 1 (TTS auto-play): On iOS, start voice mode → record a message → dismiss TTS discovery prompt (don't download) → record another message → now download TTS model via discovery prompt → verify the last assistant message auto-plays once TTS is ready (should hear response spoken aloud)
TestFlight Bug 2 (compact overlay "Playing" state): On iOS/mobile, in voice mode, verify the bottom overlay shows "Playing" text and animated blue waveform during TTS playback (not just a black screen with X button)
TestFlight Bug 3 (audio cues on iOS): On iOS, verify mic-on.wav plays when recording starts and mic-off.wav plays after successful send. Web Audio API should work where new Audio() failed.
Voice mode state machine: Manually test the full loop (record → send → wait → TTS plays → auto-record again) on desktop or iOS. Verify clean exit on:
- Chat switch / new chat
- Mic button tap during voice mode
- X button during any overlay state
- Speaker icon tap during generation
Voice mode exit on errors: Two distinct error paths now exit voice mode:
- Chat generation failure: Force a network error after sending a voice message → verify voice mode exits (not loops) and old assistant message is NOT spoken via TTS
- TTS playback failure: Force a TTS failure (e.g., corrupt model, Lockdown Mode blocking AudioContext) → verify voice mode exits (not infinite-loops between recording and failed TTS)
Error recovery: Trigger a transcription failure (silent audio, network drop). Verify error overlay shows correct duration (including 0-second recordings), Retry re-sends same blob, Discard clears state. Test both in voice mode and single recording mode.

Notes

Code changes:
- TTSContext.tsx: Added isGenerating, speakAndWait (async playback wait), cancelGeneration, generation sequence IDs (generationSeqRef) for staleness detection, error re-throwing in speakInternal, error catching in speak() wrapper
- RecordingOverlay.tsx: Extended to 6 states (recording/processing/error/waiting/generating/playing) with retry/discard buttons, compact mode now shows status and waveform for playback states
- UnifiedChat.tsx: Voice mode state machine, error recovery with blob retention, audio cue playback (Web Audio API), TTS discovery prompt, voice mode continuation effect with errorRef guard, prevTtsStatusRef effect for TTS-ready auto-play, ref-based closures for startRecording, handleSendMessage, handleTTSDiscovery to avoid staleness
- Audio files: Added frontend/public/audio/mic-on.wav and mic-off.wav (binary files)
- Ref pattern: Used startRecordingRef, handleSendMessageRef, handleTTSDiscoveryRef, recordingStartTimeRef, errorRef, prevTtsStatusRef to ensure callbacks always use latest state/functions
- Duration capture: Uses recordingStartTimeRef (set to Date.now() after recorder.startRecording()) instead of RecordRTC internal field
- Error propagation: speakInternal re-throws errors after handling them. speak() catches (for TTSButton), speakAndWait propagates (for voice mode loop).
- exitVoiceMode cleanup: Now calls cancelTTSGeneration() and stopTTS() unconditionally (idempotent) and checks if (recorderRef.current) instead of if (recorderRef.current && isRecording) to avoid stale closure issues
- Web Audio API for audio cues: Uses AudioContext + fetch + decodeAudioData + BufferSource instead of new Audio() for better iOS WebView compatibility. Each cue creates and closes its own AudioContext.
- Empty blob recovery: setTimeout(0) defers startRecordingRef.current() to allow React batch to commit setIsRecording(false) before the isRecording guard runs
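The generation sequence IDs noted for TTSContext.tsx (generationSeqRef) follow a common staleness-detection pattern: each request takes a number, and after every await it checks whether a newer request or a cancel has bumped the counter. A minimal sketch, assuming a hypothetical `GenerationSequence` class and `generateThenPlay` helper rather than the PR's actual ref-based code:

```typescript
// Hypothetical wrapper around a monotonically increasing sequence number.
class GenerationSequence {
  private current = 0;

  // Starts a new generation and invalidates every earlier one.
  begin(): number {
    this.current += 1;
    return this.current;
  }

  // Invalidates the in-flight generation without starting a new one.
  cancel(): void {
    this.current += 1;
  }

  isCurrent(seq: number): boolean {
    return seq === this.current;
  }
}

// Synthesizes audio, then plays it only if no newer request or cancel
// arrived while awaiting. Returns whether playback was started.
async function generateThenPlay<TAudio>(
  sequence: GenerationSequence,
  synthesize: () => Promise<TAudio>,
  play: (audio: TAudio) => void,
): Promise<boolean> {
  const mySeq = sequence.begin();
  const audio = await synthesize();
  if (!sequence.isCurrent(mySeq)) return false; // stale result: drop it
  play(audio);
  return true;
}
```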

Link to Devin Session: https://app.devin.ai/sessions/0c853f0e1ba84474971875a61f616769
Requested by: @marksftw

…rror recovery, and TTS discovery

- Add voice mode state machine: Recording → Processing → Waiting → Generating → Playing → Recording loop
- Add audio error recovery with blob retention, retry, and discard
- Add audio cues (mic-on.wav, mic-off.wav) for recording state transitions
- Add TTS discovery prompt for first-time users on supported platforms
- Extend RecordingOverlay with 6 voice mode states (recording, processing, error, waiting, generating, playing)
- Update TTSContext with isGenerating, speakAndWait, cancelGeneration, and sequence ID tracking
- Update TTSButton with generating animation state
- Add voice mode exit on chat switch, new chat, and manual stop
- Highlight mic button when voice mode is active

Closes #458
Co-Authored-By: marks <markskram@protonmail.com>

cloudflare-workers-and-pages · 2026-03-06T00:22:02Z

Deploying maple with Cloudflare Pages

Latest commit:	`7f74698`
Status:	✅ Deploy successful!
Preview URL:	https://a56627c6.maple-ca8.pages.dev
Branch Preview URL:	https://devin-1772755465-voice-mode.maple-ca8.pages.dev

…VoiceDiscard

- Add recordingStartTimeRef to track when recording starts (RecordRTC has no startTime property)
- Add startRecordingRef to avoid stale closure in handleVoiceDiscard's empty dependency array
- capturedDuration now correctly reflects actual recording time instead of always being 0

Co-Authored-By: marks <markskram@protonmail.com>

… for savedDuration

- Replace all 3 startRecording() calls in voice continuation effect with startRecordingRef.current()
- Change recordingDuration || undefined to recordingDuration ?? undefined (both inputs) to correctly show 0-second durations in error UI

Co-Authored-By: marks <markskram@protonmail.com>
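The `||` to `??` change matters because `0` is falsy: `0 || undefined` evaluates to `undefined`, while `0 ?? undefined` stays `0`. A small illustration follows; the function names and the `m:ss` format are hypothetical and only serve to show the difference.

```typescript
// Before: a 0-second duration is falsy, so it collapses to undefined.
function toDisplayDurationWithOr(seconds: number | undefined): number | undefined {
  return seconds || undefined;
}

// After: only null/undefined fall through, so 0 is preserved.
function toDisplayDuration(seconds: number | undefined): number | undefined {
  return seconds ?? undefined;
}

// Hypothetical formatter: an undefined duration renders as an empty label.
function formatDuration(seconds: number | undefined): string {
  if (seconds === undefined) return "";
  const minutes = Math.floor(seconds / 60);
  const rest = Math.floor(seconds % 60);
  return `${minutes}:${rest < 10 ? "0" : ""}${rest}`;
}
```

With these helpers, `formatDuration(toDisplayDurationWithOr(0))` yields an empty string, whereas `formatDuration(toDisplayDuration(0))` yields `"0:00"`.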

…g on empty blob

- isTTSPlatform now checks isTauri() && isIOS() to match TTSContext behavior
- Empty blob in voice mode now calls startRecordingRef.current() to restart

Co-Authored-By: marks <markskram@protonmail.com>

…ssage/handleTTSDiscovery

- exitVoiceMode: check recorderRef.current instead of isRecording state to avoid stale closure in event handler effect
- Add handleSendMessageRef and handleTTSDiscoveryRef to prevent stale closures in transcribeAndSend
- transcribeAndSend now calls through refs for latest handleSendMessage and handleTTSDiscovery

Co-Authored-By: marks <markskram@protonmail.com>

…e to avoid stale closure

These functions are idempotent, so guarding on ttsIsGenerating/ttsIsPlaying was unnecessary and caused stale closures when exitVoiceMode was captured in the event listener effect.

Co-Authored-By: marks <markskram@protonmail.com>

marksftw · 2026-03-06T01:11:51Z

@TestFlight build

github-actions · 2026-03-06T01:12:01Z

🚀 TestFlight deployment triggered! Check the Actions tab for progress.

…on generation error

- Set voiceState(null) after successful transcription in non-voice mode to dismiss overlay
- Add errorRef to track error state without stale closures
- Voice continuation effect checks errorRef.current and exits voice mode on error instead of speaking stale assistant message

Co-Authored-By: marks <markskram@protonmail.com>

github-actions · 2026-03-06T01:17:26Z

✅ TestFlight deployment completed successfully!

…de exits

Without re-throwing, speakAndWait always resolves normally even on TTS failure, causing the voice mode continuation effect's .catch() to never fire and creating an infinite loop: TTS fails → recording restarts → repeat.

Co-Authored-By: marks <markskram@protonmail.com>

devin-ai-integration

Devin Review found 2 new potential issues.

View 24 additional findings in Devin Review.

frontend/src/services/tts/TTSContext.tsx

devin-ai-integration · 2026-03-06T01:27:35Z

frontend/src/components/UnifiedChat.tsx

+        const newRetryCount = voiceRetryCount + 1;
+        setVoiceRetryCount(newRetryCount);


🟡 Stale voiceRetryCount in transcribeAndSend causes incorrect retry count on rapid retries

transcribeAndSend is a useCallback with voiceRetryCount in its dependency array (frontend/src/components/UnifiedChat.tsx:2073). In the catch block at line 2052, it reads voiceRetryCount from the closure and calculates newRetryCount = voiceRetryCount + 1. However, handleVoiceRetry (lines 2138-2145) calls transcribeAndSend, which may be holding a stale voiceRetryCount value. If the user taps Retry twice in quick succession, the second call uses the same closure as the first, because the setVoiceRetryCount from the first call has not yet triggered a re-render and produced a new callback. As a result, voiceRetryCount stays at the old value and the count never reaches 3, so the extended error message is never shown. A ref should be used for voiceRetryCount to avoid the stale closure.
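A minimal, framework-free sketch of the ref-based counter this finding suggests. `MutableRef` stands in for the object returned by React's `useRef`, `createRetryCounter` is a hypothetical helper, and the `>= 3` comparison is an assumption based on the "reaches 3" wording above.

```typescript
// Stand-in for the { current } object that React's useRef returns.
interface MutableRef<T> {
  current: T;
}

// Assumed threshold, taken from the "count never reaches 3" wording in the finding.
const EXTENDED_MESSAGE_THRESHOLD = 3;

function createRetryCounter() {
  // A ref is read and written synchronously, so two rapid retries cannot
  // both observe the same pre-render snapshot the way closed-over state can.
  const retryCountRef: MutableRef<number> = { current: 0 };

  return {
    // Called from the catch block after a failed transcription.
    recordFailure(): { retryCount: number; showExtendedMessage: boolean } {
      retryCountRef.current += 1;
      const retryCount = retryCountRef.current;
      return { retryCount, showExtendedMessage: retryCount >= EXTENDED_MESSAGE_THRESHOLD };
    },
    // Called on discard or when the overlay is dismissed.
    reset(): void {
      retryCountRef.current = 0;
    },
  };
}
```

In a component, the ref value would typically still be mirrored into state when the UI needs to re-render with the new count.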

…n TTSButton

speak() is used by TTSButton which doesn't have try/catch. Only speakAndWait (used by voice mode loop) needs error propagation to exit on TTS failure.

Co-Authored-By: marks <markskram@protonmail.com>

marksftw · 2026-03-06T01:32:09Z

@TestFlight build

github-actions · 2026-03-06T01:32:18Z

🚀 TestFlight deployment triggered! Check the Actions tab for progress.

github-actions · 2026-03-06T01:39:33Z

✅ TestFlight deployment completed successfully!

…r overlay via X button

Both centered-input and bottom-input overlay cancel handlers now reset voiceRetryCount to 0 and recordingDuration to 0, matching handleVoiceDiscard behavior. Prevents stale retry count from carrying over to future recordings.

Co-Authored-By: marks <markskram@protonmail.com>

…io cues, TTS auto-play after model download

Bug 1: After downloading TTS model mid-voice-loop, the last assistant message now auto-plays via a new effect that watches ttsStatus becoming 'ready'.

Bug 2: Show Playing/Generating/Waiting status text and waveform in compact (bottom input) overlay mode, not just non-compact. Previously isCompact=true hid all status content during playback.

Bug 3: Use Web Audio API (AudioContext + fetch + decodeAudioData) instead of new Audio() for mic-on/mic-off cues - more reliable on iOS WebView.

Also fixes Devin Review round 10: defer startRecordingRef.current() call with setTimeout(0) after empty blob recovery to let React batch commit setIsRecording(false) before the isRecording guard runs.

Co-Authored-By: marks <markskram@protonmail.com>

…-on audio cue

Devin Review round 11:
- RecordingOverlay: add setDuration(0) when effectiveState transitions to 'recording' to prevent a one-frame flash of the previous recording's duration before rAF resets it.
- UnifiedChat: remove playAudioCue('mic-on') from voice continuation callers (4 sites) since startRecording() at line 1970 already plays the cue. This prevented a double-play stutter/echo in the voice mode loop.

Co-Authored-By: marks <markskram@protonmail.com>

Devin Review round 12:
- TTSContext: store audioContextRef immediately after AudioContext creation so stopPlayback() can close it if decodeAudioData or other operations between creation and the old ref assignment throw.
- TTSContext: throw Error('no_speakable_text') instead of silently returning when preprocessTextForTTS strips all content (e.g. code-only responses).
- UnifiedChat: catch 'no_speakable_text' in both voice continuation effects and restart recording instead of exiting voice mode, so the user gets audio feedback (mic-on cue) rather than a silent mic activation.

Co-Authored-By: marks <markskram@protonmail.com>

devin-ai-integration

Devin Review found 2 new potential issues.

View 31 additional findings in Devin Review.

devin-ai-integration · 2026-03-06T02:41:32Z

frontend/src/components/UnifiedChat.tsx

+                if (!voiceModeRef.current) return;
+                setVoiceState("recording");
+                startRecordingRef.current();


🟡 Voice mode loop silently breaks when startRecording fails, leaving UI stuck on "Recording"

Throughout the voice mode loop, setVoiceState("recording") is called optimistically before startRecordingRef.current(), but startRecording is an async fire-and-forget that catches its own errors internally (e.g., microphone permission denied, device busy). If startRecording fails, voiceState remains "recording" while no actual recording is happening. The user sees a "Recording" overlay with no error feedback (the audioError banner auto-dismisses after 5 seconds at UnifiedChat.tsx:1997), and the voice mode loop is silently broken.

This affects multiple paths in the voice continuation loop: post-TTS restart (UnifiedChat.tsx:2938-2940), no-speakable-text restart (UnifiedChat.tsx:2947-2948), no-text restart (UnifiedChat.tsx:2955-2956), discard-and-re-record (UnifiedChat.tsx:2173-2174), and empty-recording retry (UnifiedChat.tsx:2132-2135). The user can still tap X to exit, but the automatic loop is broken without clear UI feedback.

Prompt for agents

In frontend/src/components/UnifiedChat.tsx, the startRecording function (line 1922) is called fire-and-forget from many voice-mode loop restart points (lines 2135, 2174, 2938-2940, 2948, 2956, 3018). When startRecording fails (microphone permission denied, device busy, etc.), it catches the error internally and sets audioError, but does not communicate the failure back to the voice mode loop. This leaves voiceState as "recording" without actual recording. To fix this, either:

1. Make startRecording return a boolean indicating success, and have all voice-mode restart points check the return value and call exitVoiceMode() on failure. This requires awaiting the call.
2. Or, inside startRecording's catch block (around line 1978), check if voice mode is active (voiceModeRef.current) and call exitVoiceMode() to clean up the voice loop state.

Option 2 is simpler: add to the catch block in startRecording (after the existing error handling, around line 1997): if (voiceModeRef.current) { exitVoiceMode(); } This ensures the voice mode loop exits cleanly when the microphone becomes unavailable.
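Option 1 from the prompt above could look roughly like the following. This is a sketch under assumptions: `startRecordingSafely`, `restartRecording`, and the `VoiceLoopDeps` shape are hypothetical, and setting the "recording" state only after the microphone opens is a variation on the current optimistic ordering, not something the PR implements.

```typescript
// Hypothetical dependencies of a voice-loop restart point.
interface VoiceLoopDeps {
  isVoiceModeActive: () => boolean;
  startRecording: () => Promise<boolean>;
  setVoiceState: (state: "recording") => void;
  exitVoiceMode: () => void;
}

// Wraps microphone start-up so callers get a success flag instead of a
// swallowed error. `openMicrophone` and `reportError` are assumed callbacks.
async function startRecordingSafely(
  openMicrophone: () => Promise<void>,
  reportError: (message: string) => void,
): Promise<boolean> {
  try {
    await openMicrophone();
    return true;
  } catch (error) {
    reportError(error instanceof Error ? error.message : String(error));
    return false;
  }
}

// Restart point for the loop: show "recording" only once recording has
// actually started, and leave voice mode if it could not start.
async function restartRecording(deps: VoiceLoopDeps): Promise<void> {
  if (!deps.isVoiceModeActive()) return;
  const started = await deps.startRecording();
  if (started) {
    deps.setVoiceState("recording");
  } else {
    deps.exitVoiceMode();
  }
}
```

Compared with Option 2, this keeps the exit decision at the call site, at the cost of awaiting every restart.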

devin-ai-integration · 2026-03-06T02:41:33Z

frontend/src/services/tts/TTSContext.tsx

+        if (generationSeqRef.current !== mySeq) {
+          void audioContext.close().catch(() => {});
+          audioContextRef.current = null;
+          URL.revokeObjectURL(audioUrl);
+          return;


🟡 Stale speakInternal cleanup clobbers audioContextRef.current set by a concurrent call

In TTSContext.tsx, when two speakInternal calls overlap at async boundaries, the stale call's cleanup can null out the newer call's audioContextRef. Specifically: Call A sets audioContextRef.current at line 363, then yields at await audioContext.resume() (line 367). Call B starts, calls stop() (cleaning up Call A's context), then sets audioContextRef.current to its own context at line 363 and yields. Call A resumes, hits the staleness check at line 371, and executes audioContextRef.current = null at line 373 — clobbering Call B's reference. After this, stopPlayback() cannot close Call B's AudioContext directly; it relies on sourceNodeRef.current.stop() triggering onended as a fallback.

Fix suggestion

At line 371-376, only null out audioContextRef if it still points to this call's context:

    if (generationSeqRef.current !== mySeq) {
      void audioContext.close().catch(() => {});
      if (audioContextRef.current === audioContext) {
        audioContextRef.current = null;
      }
      URL.revokeObjectURL(audioUrl);
      return;
    }

Suggested change

-        if (generationSeqRef.current !== mySeq) {
-          void audioContext.close().catch(() => {});
-          audioContextRef.current = null;
-          URL.revokeObjectURL(audioUrl);
-          return;
+        if (generationSeqRef.current !== mySeq) {
+          void audioContext.close().catch(() => {});
+          if (audioContextRef.current === audioContext) {
+            audioContextRef.current = null;
+          }
+          URL.revokeObjectURL(audioUrl);
+          return;
+        }

devin-ai-integration bot assigned marksftw Mar 6, 2026

devin-ai-integration bot requested a review from marksftw March 6, 2026 00:19

style: fix Prettier formatting in RecordingOverlay and UnifiedChat

cbf891e

Co-Authored-By: marks <markskram@protonmail.com>

devin-ai-integration bot and others added 2 commits March 6, 2026 00:54

style: fix Prettier formatting for ref type declarations

08db7a3

Co-Authored-By: marks <markskram@protonmail.com>

devin-ai-integration bot and others added 2 commits March 6, 2026 01:50

style: run Prettier on RecordingOverlay and UnifiedChat

8d2c190

Co-Authored-By: marks <markskram@protonmail.com>
