User Voice
BLXCode supports voice input and voice replies in the agent panel.
- STT means speech-to-text: BLXCode records your microphone, transcribes the audio, and inserts the transcript into the agent composer.
- TTS means text-to-speech: when a turn started from voice input finishes, BLXCode can synthesize the assistant's final answer and play it back.
Voice features are available in the Tauri desktop app. They are not available in frontend-only `trunk serve` mode, because microphone capture, provider keys, and native cache paths are handled by the Tauri backend.
You need:
- A working system microphone.
- Microphone permission granted to BLXCode or the development shell.
- An API key for the configured voice provider.
- Network access to the configured STT/TTS provider.
Voice API keys are set under Settings → API Keys (OpenAI, OpenRouter, AWS/Polly). The voice column in Settings → BLXCode Agent only shows whether each key is configured or missing.
- OpenAI: six OpenAI TTS voices are selectable.
- OpenRouter: STT/TTS models are selectable; the voice picker shows the OpenAI voice names disabled, with a hint.
- AWS: six Polly voices are available when the AWS key is set.
The default voice settings are conservative:
| Setting | Default |
|---|---|
| STT provider | OpenAI |
| STT model | gpt-4o-mini-transcribe |
| Recording sample rate | 16000 Hz |
| TTS provider | OpenAI |
| TTS model | gpt-4o-mini-tts |
| TTS voice | nova |
| TTS autoplay | enabled |
| Post-STT behavior | auto-send |
| STT language | follow app locale |
| Push-to-talk hotkey | Space |
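The defaults above can be pictured as a settings value with a `Default` implementation. The sketch below is in Rust, the language of the Tauri backend. The type, field, and enum names (`VoiceSettings`, `Provider`, `PostSttBehavior`, `SttLanguage`) are hypothetical and are not BLXCode's actual settings schema.

```rust
/// Hypothetical provider enum; names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub enum Provider {
    OpenAi,
    OpenRouter,
    Aws,
}

#[derive(Debug, Clone, PartialEq)]
pub enum PostSttBehavior {
    AutoSend,
    Draft,
}

#[derive(Debug, Clone, PartialEq)]
pub enum SttLanguage {
    FollowApp,
    AutoDetect,
    Manual(String),
}

/// Hypothetical mirror of the documented voice settings.
#[derive(Debug, Clone, PartialEq)]
pub struct VoiceSettings {
    pub stt_provider: Provider,
    pub stt_model: String,
    pub sample_rate_hz: u32,
    pub tts_provider: Provider,
    pub tts_model: String,
    pub tts_voice: String,
    pub tts_autoplay: bool,
    pub post_stt: PostSttBehavior,
    pub stt_language: SttLanguage,
    pub push_to_talk_hotkey: String,
}

impl Default for VoiceSettings {
    fn default() -> Self {
        // Values copied from the defaults table.
        VoiceSettings {
            stt_provider: Provider::OpenAi,
            stt_model: "gpt-4o-mini-transcribe".to_string(),
            sample_rate_hz: 16_000,
            tts_provider: Provider::OpenAi,
            tts_model: "gpt-4o-mini-tts".to_string(),
            tts_voice: "nova".to_string(),
            tts_autoplay: true,
            post_stt: PostSttBehavior::AutoSend,
            stt_language: SttLanguage::FollowApp,
            push_to_talk_hotkey: "Space".to_string(),
        }
    }
}
```

Keeping every default in one `Default` impl means a fresh install and a "reset to defaults" action would both read from a single source.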
To change these, open Settings (center tab) → BLXCode Agent → Voice.
Settings → App holds only the STT language mode and the push-to-talk hotkey.
You can configure:
- STT/TTS provider and models (shared provider dropdown).
- Recording quality: low (16000 Hz), standard (24000 Hz), or high (48000 Hz).
- TTS voice (fixed catalog per provider) and gender filter.
- TTS autoplay on or off.
- Whether STT should auto-send or only fill a draft.
See Settings.
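The three recording-quality presets map directly to sample rates. The sketch below also estimates uncompressed upload size. It assumes mono 16-bit PCM WAV, because the page does not state the channel count or bit depth. `RecordingQuality` is a hypothetical name.

```rust
/// Hypothetical enum for the three documented recording-quality presets.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RecordingQuality {
    Low,
    Standard,
    High,
}

impl RecordingQuality {
    pub fn sample_rate_hz(self) -> u32 {
        match self {
            RecordingQuality::Low => 16_000,
            RecordingQuality::Standard => 24_000,
            RecordingQuality::High => 48_000,
        }
    }

    /// Bytes of audio per second, ASSUMING mono 16-bit PCM
    /// (channel count and bit depth are not documented).
    pub fn pcm_bytes_per_second(self) -> u32 {
        let channels = 1;
        let bytes_per_sample = 2;
        self.sample_rate_hz() * channels * bytes_per_sample
    }
}
```

Under that assumption, a 10-second clip is about 320,000 bytes at low quality and three times that at high quality. The whole clip is uploaded to the STT provider, so a lower preset means faster uploads.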
BLXCode can send an optional language hint with transcription requests:
- Follow app: uses the current UI locale and reduces it to a primary ISO-639-1 language code, such as `de` from `de-DE`.
- Auto detect: sends no language hint and lets the provider detect the spoken language.
- Manual: sends the custom language code you enter.
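The three modes can be sketched as a function that returns the hint to attach, or `None` for auto-detect. Several behaviors here are assumptions, not documented: underscores are accepted as separators (`pt_BR`), a primary subtag that is not two letters yields no hint, and a blank manual code is treated as auto-detect. `primary_language` and `language_hint` are hypothetical names.

```rust
/// Hypothetical language-hint modes.
pub enum SttLanguage {
    FollowApp,
    AutoDetect,
    Manual(String),
}

/// Reduce a locale such as "de-DE" or "pt_BR" to its primary subtag.
/// ASSUMPTION: returns None unless the subtag is two ASCII letters (ISO-639-1).
pub fn primary_language(locale: &str) -> Option<String> {
    let primary = locale.split(|c: char| c == '-' || c == '_').next()?;
    if primary.len() == 2 && primary.chars().all(|c| c.is_ascii_alphabetic()) {
        Some(primary.to_ascii_lowercase())
    } else {
        None
    }
}

/// Hint to send with a transcription request; None lets the provider detect.
pub fn language_hint(mode: &SttLanguage, app_locale: &str) -> Option<String> {
    match mode {
        SttLanguage::FollowApp => primary_language(app_locale),
        SttLanguage::AutoDetect => None,
        SttLanguage::Manual(code) => {
            // ASSUMPTION: an empty manual code behaves like auto-detect.
            let code = code.trim();
            if code.is_empty() { None } else { Some(code.to_string()) }
        }
    }
}
```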
Use the voice orb in the agent panel:
- Hold the orb longer than a short threshold to record push-to-talk style; release to transcribe.
- Click quickly to toggle recording; click again to stop and transcribe.
- Press Space or Enter while the orb is focused to start/stop recording.
- Press Escape while recording to cancel.
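The orb's hold-versus-click behavior can be modeled as a small state machine. The sketch below uses a 300 ms hold threshold as a placeholder; the page only says "a short threshold". `Orb` and `OrbAction` are hypothetical names. Timestamps are passed in so the logic stays deterministic.

```rust
use std::time::{Duration, Instant};

/// ASSUMED value; the real threshold is not documented.
pub const HOLD_THRESHOLD: Duration = Duration::from_millis(300);

#[derive(Debug, PartialEq)]
pub enum OrbAction {
    StartRecording,
    StopAndTranscribe,
    Cancel,
    Nothing,
}

#[derive(Default)]
pub struct Orb {
    recording: bool,
    /// Set while the press that started the current recording is still held.
    press_started: Option<Instant>,
}

impl Orb {
    pub fn pointer_down(&mut self, now: Instant) -> OrbAction {
        if self.recording {
            // Second click in toggle mode stops and transcribes.
            self.recording = false;
            OrbAction::StopAndTranscribe
        } else {
            self.recording = true;
            self.press_started = Some(now);
            OrbAction::StartRecording
        }
    }

    pub fn pointer_up(&mut self, now: Instant) -> OrbAction {
        match self.press_started.take() {
            // Held past the threshold: push-to-talk, so release transcribes.
            Some(start) if self.recording && now.duration_since(start) >= HOLD_THRESHOLD => {
                self.recording = false;
                OrbAction::StopAndTranscribe
            }
            // Quick click: stay in toggle mode and keep recording.
            _ => OrbAction::Nothing,
        }
    }

    /// Space or Enter while the orb is focused toggles recording.
    pub fn toggle_key(&mut self) -> OrbAction {
        self.press_started = None;
        self.recording = !self.recording;
        if self.recording { OrbAction::StartRecording } else { OrbAction::StopAndTranscribe }
    }

    /// Escape cancels an active recording without transcribing.
    pub fn escape(&mut self) -> OrbAction {
        if self.recording {
            self.recording = false;
            self.press_started = None;
            OrbAction::Cancel
        } else {
            OrbAction::Nothing
        }
    }
}
```

The model records on press in both cases. Only the release decides whether the session was push-to-talk (stop now) or a toggle (wait for the next click).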
The global push-to-talk hotkey also starts recording when enabled. A plain key such as Space is ignored while typing in editable fields, so normal text input remains safe.
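The editable-field rule can be sketched as a guard that runs before the hotkey fires. The page only says that plain keys are ignored while typing. Letting a modified chord such as Ctrl+Space fire inside text fields is an assumption. `Chord` and `should_start_push_to_talk` are hypothetical names.

```rust
/// Hypothetical key chord: a key name plus modifier flags.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Chord<'a> {
    pub key: &'a str,
    pub ctrl: bool,
    pub alt: bool,
    pub meta: bool,
}

impl Chord<'_> {
    fn is_plain(&self) -> bool {
        !(self.ctrl || self.alt || self.meta)
    }
}

/// Returns true if the pressed chord should start push-to-talk recording.
pub fn should_start_push_to_talk<'a>(
    hotkey: Chord<'a>,
    pressed: Chord<'a>,
    enabled: bool,
    focus_in_editable: bool,
) -> bool {
    if !enabled || pressed != hotkey {
        return false;
    }
    // A plain key such as Space must keep typing normally in text fields.
    // ASSUMPTION: chords with modifiers are allowed everywhere.
    !(hotkey.is_plain() && focus_in_editable)
}
```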
When post-STT behavior is auto-send, BLXCode submits the transcript to the agent immediately.
When post-STT behavior is draft, BLXCode inserts the transcript into the compose field so you can edit it before sending.
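The two post-STT behaviors amount to one branch on a finished transcript. In the sketch below, `Composer` stands in for the agent panel's compose state and is hypothetical. Appending to existing draft text rather than replacing it, and dropping blank transcripts, are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PostSttBehavior {
    AutoSend,
    Draft,
}

/// Hypothetical compose state: the draft field and messages already sent.
#[derive(Debug, Default)]
pub struct Composer {
    pub draft: String,
    pub sent: Vec<String>,
}

impl Composer {
    /// Route a finished transcript according to the post-STT setting.
    pub fn apply_transcript(&mut self, transcript: &str, behavior: PostSttBehavior) {
        let text = transcript.trim();
        if text.is_empty() {
            return; // ASSUMPTION: blank transcripts are dropped.
        }
        match behavior {
            // Auto-send: submit to the agent immediately.
            PostSttBehavior::AutoSend => self.sent.push(text.to_string()),
            // Draft: put the text in the compose field for editing.
            // ASSUMPTION: appends after any text already typed.
            PostSttBehavior::Draft => {
                if !self.draft.is_empty() && !self.draft.ends_with(' ') {
                    self.draft.push(' ');
                }
                self.draft.push_str(text);
            }
        }
    }
}
```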
When a prompt came from voice input and TTS is enabled, BLXCode synthesizes the final assistant answer after the model turn completes. The generated MP3 is sent back to the frontend as a `voice_ready` event and played in the agent panel.
Text answers still appear normally. If TTS fails, the text answer remains available and BLXCode reports the TTS error separately.
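The rule "speak only voice-started turns, and never lose the text" can be sketched as follows. `finish_turn` and `TurnOutput` are hypothetical. The `synthesize` closure stands in for the backend TTS call, whose real signature is not documented here.

```rust
/// Hypothetical result of finishing a turn.
#[derive(Debug, PartialEq)]
pub enum TurnOutput {
    TextOnly(String),
    TextWithVoice { text: String, mp3: Vec<u8> },
    TextWithTtsError { text: String, error: String },
}

/// `synthesize` stands in for the TTS request (hypothetical signature).
pub fn finish_turn<F>(answer: String, from_voice: bool, tts_enabled: bool, synthesize: F) -> TurnOutput
where
    F: FnOnce(&str) -> Result<Vec<u8>, String>,
{
    // Typed prompts, or TTS turned off: text only, no synthesis attempted.
    if !(from_voice && tts_enabled) {
        return TurnOutput::TextOnly(answer);
    }
    match synthesize(&answer) {
        Ok(mp3) => TurnOutput::TextWithVoice { text: answer, mp3 },
        // The text answer survives; the TTS error is reported separately.
        Err(error) => TurnOutput::TextWithTtsError { text: answer, error },
    }
}
```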
STT requests go to the selected provider's transcription endpoint:
- OpenAI: https://api.openai.com/v1/audio/transcriptions
- OpenRouter: https://openrouter.ai/api/v1/audio/transcriptions
BLXCode sends WAV audio as multipart form data with response_format=text.
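A multipart transcription request body can be assembled by hand as shown below. The field names `file`, `model`, `response_format`, and `language` follow OpenAI's transcription API. Whether BLXCode sends exactly these fields, and the `audio.wav` filename, are assumptions. `transcription_body` is a hypothetical helper.

```rust
/// Append one text field of a multipart/form-data body.
fn push_text_field(body: &mut Vec<u8>, boundary: &str, name: &str, value: &str) {
    body.extend_from_slice(
        format!("--{boundary}\r\nContent-Disposition: form-data; name=\"{name}\"\r\n\r\n{value}\r\n").as_bytes(),
    );
}

/// Build the request body; send it with the header
/// `Content-Type: multipart/form-data; boundary=<boundary>`.
pub fn transcription_body(boundary: &str, wav: &[u8], model: &str, language: Option<&str>) -> Vec<u8> {
    let mut body = Vec::new();
    push_text_field(&mut body, boundary, "model", model);
    push_text_field(&mut body, boundary, "response_format", "text");
    // Only attached when a language hint applies (not in auto-detect mode).
    if let Some(lang) = language {
        push_text_field(&mut body, boundary, "language", lang);
    }
    // The audio part carries a filename and content type so it is treated as a file upload.
    body.extend_from_slice(
        format!("--{boundary}\r\nContent-Disposition: form-data; name=\"file\"; filename=\"audio.wav\"\r\nContent-Type: audio/wav\r\n\r\n").as_bytes(),
    );
    body.extend_from_slice(wav);
    body.extend_from_slice(format!("\r\n--{boundary}--\r\n").as_bytes());
    body
}
```

With `response_format=text`, the response body is the plain transcript rather than JSON. The transcript can therefore be inserted into the composer without parsing.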
TTS currently uses OpenAI's speech endpoint:
- OpenAI: https://api.openai.com/v1/audio/speech
OpenRouter TTS is not currently supported by the backend, even though OpenRouter can be used for STT.
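The endpoint rules can be summarized as a lookup in which TTS through OpenRouter is an explicit error. The sketch covers only the two providers whose URLs appear in the endpoint lists. `VoiceProvider`, `stt_endpoint`, and `tts_endpoint` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum VoiceProvider {
    OpenAi,
    OpenRouter,
}

/// Transcription endpoint for each STT provider.
pub fn stt_endpoint(p: VoiceProvider) -> &'static str {
    match p {
        VoiceProvider::OpenAi => "https://api.openai.com/v1/audio/transcriptions",
        VoiceProvider::OpenRouter => "https://openrouter.ai/api/v1/audio/transcriptions",
    }
}

/// TTS is only wired to OpenAI's speech endpoint, so OpenRouter is rejected.
pub fn tts_endpoint(p: VoiceProvider) -> Result<&'static str, String> {
    match p {
        VoiceProvider::OpenAi => Ok("https://api.openai.com/v1/audio/speech"),
        VoiceProvider::OpenRouter => Err("OpenRouter TTS is not supported by the backend".to_string()),
    }
}
```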
The OpenAI voice catalog currently exposed in BLXCode is:
| Voice | Gender hint |
|---|---|
| alloy | neutral |
| ash | male |
| ballad | female |
| coral | female |
| echo | male |
| fable | neutral |
| nova | female |
| onyx | male |
| sage | female |
| shimmer | female |
The gender label is only a UI filtering hint.
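The gender filter only narrows the list shown in the picker; it does not change any voice. The sketch encodes the table as a constant and filters it. `GenderHint`, `OPENAI_VOICES`, and `voices_for` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GenderHint {
    Neutral,
    Male,
    Female,
}

/// The OpenAI voice catalog from the table, as (voice, hint) pairs.
pub const OPENAI_VOICES: [(&str, GenderHint); 10] = [
    ("alloy", GenderHint::Neutral),
    ("ash", GenderHint::Male),
    ("ballad", GenderHint::Female),
    ("coral", GenderHint::Female),
    ("echo", GenderHint::Male),
    ("fable", GenderHint::Neutral),
    ("nova", GenderHint::Female),
    ("onyx", GenderHint::Male),
    ("sage", GenderHint::Female),
    ("shimmer", GenderHint::Female),
];

/// Voices shown in the picker; `None` means no gender filter.
pub fn voices_for(filter: Option<GenderHint>) -> Vec<&'static str> {
    OPENAI_VOICES
        .iter()
        .filter(|(_, g)| filter.map_or(true, |f| f == *g))
        .map(|(name, _)| *name)
        .collect()
}
```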
During recording, BLXCode writes a temporary WAV file under the app cache directory:
`<app-cache>/voice/<turn-id>.wav`
After transcription finishes, BLXCode deletes the WAV file. Cancelled recordings are also removed. The audio is still sent to the selected remote STT provider for transcription, so use a provider and model whose data policy fits your workflow.
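One way to guarantee cleanup on both the success and cancel paths is a guard that deletes the file when it goes out of scope. The sketch below is a generic Rust drop-guard pattern, not BLXCode's code. `TempRecording` is hypothetical, and `cache_dir` stands in for the app cache directory.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Deletes the temporary recording when dropped, whether transcription
/// finished or the recording was cancelled.
pub struct TempRecording {
    path: PathBuf,
}

impl TempRecording {
    /// Creates `<cache_dir>/voice/<turn_id>.wav` (hypothetical helper).
    pub fn create(cache_dir: &Path, turn_id: &str) -> io::Result<Self> {
        let dir = cache_dir.join("voice");
        fs::create_dir_all(&dir)?;
        let path = dir.join(format!("{turn_id}.wav"));
        fs::File::create(&path)?;
        Ok(TempRecording { path })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempRecording {
    fn drop(&mut self) {
        // Best-effort cleanup: ignore errors such as the file already being gone.
        let _ = fs::remove_file(&self.path);
    }
}
```

Tying deletion to `Drop` also covers early returns and errors between recording and transcription, which a manual `remove_file` call at the end could miss.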