feat: add TTS voice response messages via Mistral Voxtral API by vkavun · Pull Request #167 · RichardAtCT/claude-code-telegram

vkavun · 2026-03-28T12:16:08Z

Summary

- Add text-to-speech capability so the bot can send Claude's responses as Telegram voice messages using Mistral's Voxtral TTS API
- Per-user /voice on|off toggle persisted in SQLite, gated behind the admin-level ENABLE_VOICE_RESPONSES env var
- Short responses: sent as a voice message + brief label; long responses (>threshold): Claude summarizes for spoken delivery, then the audio of the summary + the full text are sent
- Graceful fallback to text with an "(Audio unavailable, sent as text)" note on TTS failure
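
For illustration, the delivery rules above can be sketched roughly as follows. This is a minimal sketch, not the PR's code: `plan_response`, `Outgoing`, and the `synthesize`/`summarize` callables are hypothetical names, and the brief label for short responses is left out because a later commit in this PR (cd91f62) removes it.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

FALLBACK_NOTE = "(Audio unavailable, sent as text)"


@dataclass
class Outgoing:
    kind: str  # "voice" or "text"
    payload: Union[bytes, str]


def plan_response(
    text: str,
    *,
    max_length: int,
    synthesize: Callable[[str], bytes],  # hypothetical TTS callable
    summarize: Callable[[str], str],     # hypothetical Claude summarizer
) -> List[Outgoing]:
    """Decide which messages to send for one Claude response."""
    try:
        if len(text) <= max_length:
            # Short response: audio only.
            return [Outgoing("voice", synthesize(text))]
        # Long response: speak a summary, then send the full text.
        summary = summarize(text)
        return [Outgoing("voice", synthesize(summary)), Outgoing("text", text)]
    except Exception:
        # Any TTS or summarization failure falls back to plain text plus a note.
        return [Outgoing("text", text), Outgoing("text", FALLBACK_NOTE)]
```

Passing the two callables in keeps the decision logic testable without network access.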

Changes

- Config: 5 new env vars (ENABLE_VOICE_RESPONSES, VOICE_RESPONSE_MODEL, VOICE_RESPONSE_VOICE, VOICE_RESPONSE_FORMAT, VOICE_RESPONSE_MAX_LENGTH)
- Feature flag: voice_responses_enabled in FeatureFlags
- Storage: Migration 5 adds voice_responses_enabled column to users table + repository get/set methods
- VoiceHandler: New synthesize_speech() method calling client.audio.speech.complete_async()
- Orchestrator: /voice command handler + _maybe_send_voice_response() wired into agentic_text() flow
- CLAUDE.md: Updated with new command and settings docs

Test plan

- 533 tests pass, 0 failures
- Enable ENABLE_VOICE_RESPONSES=true in production env
- Verify /voice on persists preference and /voice off clears it
- Send a short message and confirm voice message is received
- Send a long message (>2000 chars) and confirm summary audio + full text
- Verify TTS failure gracefully falls back to text with note

🤖 Generated with Claude Code

TTS capability using Mistral Voxtral API to send Claude responses as Telegram voice messages, with user toggle and graceful fallback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

10-task TDD plan covering config, feature flag, storage migration, TTS synthesis, /voice command, and orchestrator wiring. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add five new Pydantic Settings fields for text-to-speech voice responses: enable_voice_responses, voice_response_model, voice_response_voice, voice_response_format, and voice_response_max_length. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
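
As a rough picture of what these five settings carry, here is a stdlib-only stand-in. It is a sketch under assumptions: the PR uses Pydantic Settings rather than a dataclass, the `opus` format and `2000` length defaults are guesses (the test plan only mentions ">2000 chars"), and the model and voice defaults are the values a later commit in this PR settles on.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class VoiceResponseSettings:
    enable_voice_responses: bool = False
    voice_response_model: str = "voxtral-mini-tts-2603"
    voice_response_voice: str = "c69964a6-ab8b-4f8a-9465-ec0925096ec8"
    voice_response_format: str = "opus"    # assumed default
    voice_response_max_length: int = 2000  # assumed default

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "VoiceResponseSettings":
        """Read the five env vars, falling back to the defaults above."""
        d = cls()
        return cls(
            enable_voice_responses=_as_bool(env.get("ENABLE_VOICE_RESPONSES", "false")),
            voice_response_model=env.get("VOICE_RESPONSE_MODEL", d.voice_response_model),
            voice_response_voice=env.get("VOICE_RESPONSE_VOICE", d.voice_response_voice),
            voice_response_format=env.get("VOICE_RESPONSE_FORMAT", d.voice_response_format),
            voice_response_max_length=int(
                env.get("VOICE_RESPONSE_MAX_LENGTH", d.voice_response_max_length)
            ),
        )
```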

Add voice_responses_enabled property to FeatureFlags that gates TTS on both enable_voice_responses setting and mistral_api_key being set. Register it in is_feature_enabled() and get_enabled_features(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
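
The double gate described in this commit might look roughly like the sketch below. The class shape and the `"voice_responses"` feature key are assumptions for illustration; only the property name and the two conditions come from the commit message.

```python
from types import SimpleNamespace
from typing import List


class FeatureFlags:
    """Simplified stand-in for the project's FeatureFlags class."""

    def __init__(self, settings) -> None:
        self.settings = settings

    @property
    def voice_responses_enabled(self) -> bool:
        # TTS needs both the admin switch and a Mistral API key.
        return bool(
            self.settings.enable_voice_responses and self.settings.mistral_api_key
        )

    def is_feature_enabled(self, name: str) -> bool:
        # "voice_responses" is a hypothetical registry key.
        return {"voice_responses": self.voice_responses_enabled}.get(name, False)

    def get_enabled_features(self) -> List[str]:
        return [n for n in ("voice_responses",) if self.is_feature_enabled(n)]


# Switch on but no API key: the feature stays disabled.
flags = FeatureFlags(SimpleNamespace(enable_voice_responses=True, mistral_api_key=None))
```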

Adds migration 5 to extend the users table with a voice_responses_enabled boolean column, updates UserModel with the new field, and adds get/set repository methods to UserRepository with full test coverage. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
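
A minimal sketch of such a migration and the get/set pair, using the synchronous `sqlite3` module for brevity. The `user_id` key column, the function names, and the `DEFAULT 0` clause are assumptions; the real repository's schema and async access layer may differ.

```python
import sqlite3


def apply_migration_5(conn: sqlite3.Connection) -> None:
    # Existing rows pick up the default, so voice stays off until a user opts in.
    conn.execute(
        "ALTER TABLE users ADD COLUMN voice_responses_enabled BOOLEAN DEFAULT 0"
    )
    conn.commit()


def set_voice_responses_enabled(
    conn: sqlite3.Connection, user_id: int, enabled: bool
) -> None:
    conn.execute(
        "UPDATE users SET voice_responses_enabled = ? WHERE user_id = ?",
        (int(enabled), user_id),
    )
    conn.commit()


def get_voice_responses_enabled(conn: sqlite3.Connection, user_id: int) -> bool:
    row = conn.execute(
        "SELECT voice_responses_enabled FROM users WHERE user_id = ?", (user_id,)
    ).fetchone()
    # Unknown users are treated as not opted in.
    return bool(row[0]) if row else False


# Demo against an in-memory database with a hypothetical minimal users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO users (user_id) VALUES (1)")
apply_migration_5(conn)
```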

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implements TDD-verified method that checks feature flag, user toggle, synthesizes speech via voice_handler, and falls back to text on failure. Short responses get a label; long responses get summarized via Claude + full text. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Insert _maybe_send_voice_response() call between image-caption logic and text-sending loop; skip text messages when voice is successfully sent. Initialize response_content=None before try block to prevent UnboundLocalError on error paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
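
The UnboundLocalError guard mentioned here follows a common pattern, sketched below with hypothetical names (`handle_message`, `run_claude`). It is not the orchestrator's code, only the shape of the fix.

```python
from typing import Callable, Optional


def handle_message(run_claude: Callable[[], str]) -> Optional[str]:
    # Bind the name before the try block so later code can always read it.
    response_content: Optional[str] = None
    try:
        response_content = run_claude()
    except Exception:
        # The real handler would report the error to the user here.
        pass
    # Without the initialization above, this read would raise
    # UnboundLocalError whenever run_claude() fails.
    if response_content is None:
        return None  # nothing to speak or send
    return response_content
```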

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add test for long-response summarization path (Task 8) and update TTS failure test to assert fallback note; send "(Audio unavailable, sent as text)" message in the except block when TTS fails (Task 9). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Also remove unused imports in test_voice_command.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The TTS voice response was only wired into agentic_text() but voice messages go through agentic_voice() -> _handle_agentic_media_message() which had its own separate response-sending path without TTS. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- synthesize_speech() now uses httpx directly to call /v1/audio/speech
- Uses voice_id (UUID) instead of voice name (no preset names exist)
- Decodes base64 audio_data from response
- Correct model: voxtral-mini-tts-2603 (not voxtral-4b-tts-2603)
- Default voice: Paul Neutral (c69964a6-ab8b-4f8a-9465-ec0925096ec8)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
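
To make the request and response shape concrete, here is a network-free sketch of the two halves around the HTTP call. Only the path, `voice_id`, `audio_data`, the model name, and the voice UUID come from the commit message; the `input` and `response_format` field names, the `opus` default, and the helper names are assumptions and should be checked against Mistral's API reference.

```python
import base64
from typing import Any, Dict

API_URL = "https://api.mistral.ai/v1/audio/speech"
DEFAULT_MODEL = "voxtral-mini-tts-2603"
DEFAULT_VOICE_ID = "c69964a6-ab8b-4f8a-9465-ec0925096ec8"  # "Paul Neutral"


def build_payload(
    text: str,
    *,
    model: str = DEFAULT_MODEL,
    voice_id: str = DEFAULT_VOICE_ID,
    response_format: str = "opus",  # assumed default
) -> Dict[str, Any]:
    """Build the JSON body for the speech request."""
    return {
        "model": model,
        "input": text,                       # assumed field name
        "voice_id": voice_id,                # UUID, not a preset voice name
        "response_format": response_format,  # assumed field name
    }


def decode_audio(response_json: Dict[str, Any]) -> bytes:
    """Turn the base64 audio_data field of the response into raw bytes."""
    return base64.b64decode(response_json["audio_data"])


# In the handler, the payload would be posted with something like:
#   async with httpx.AsyncClient() as client:
#       resp = await client.post(API_URL, json=payload,
#                                headers={"Authorization": f"Bearer {api_key}"})
#       audio = decode_audio(resp.json())
```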

Audio-only for short responses; long responses still send full text. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When ENABLE_VOICE_RESPONSES is true, the bot appends instructions to Claude's system prompt so it knows about TTS capabilities and stops telling users it cannot send voice messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

RichardAtCT · 2026-03-30T06:58:34Z

Thanks for this comprehensive TTS implementation! A few things needed before we can merge:

Size concern: At 2K+ lines, this is a large PR. Consider whether any parts can be split out (e.g., the database migration as a separate PR).
Test coverage: Please add automated tests for the core TTS logic (Mistral API client, summarization for long responses, fallback behavior).
Coordination with #158: PR #158 (Add make run-watch for auto-restart during development), which we're merging, also modifies voice-related code (whisper.cpp support). You'll likely need to rebase after #158 lands.
orchestrator.py conflicts: Several other PRs touching orchestrator.py are being merged — please rebase once the current batch completes.

The feature design is solid — looking forward to getting this in after the above items are addressed.

Handles the check_match callback from escalation messages by running claude -p with web search to evaluate current match state, then editing the original message in-place with a verdict (winning/losing/won/lost). Button remains available for repeated checks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Handles the new [🔎 Investigate] button on trade fill notifications. Sends a placeholder reply, runs claude -p with DB queries, log analysis, and web search, then edits the placeholder with structured investigation results. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The check_match_button markup only included the Check Match button, so clicking it would replace both buttons with just the one. Now both buttons are preserved after the message edit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Instead of spawning Claude with WebSearch/WebFetch tools to find match scores (slow, expensive, leaks source URLs), now:
1. Look up sofa_id from poly_dashboard DB by player names
2. Fetch live/final score directly from SofaScore API (~1s)
3. Pass structured score data to Claude with no tools for assessment
Result: faster response, cleaner 3-line output (status/score/reason), no web search sources in the verdict.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Rewrite prompt to demand exactly 3 lines with no reasoning/analysis
- Parse stdout to extract only STATUS/Score/Reason lines, strip preamble
- Add --max-turns 1 to prevent tool loop overhead
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
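
The stdout filtering could be sketched as below. The `Key: value` line format and the function name are assumptions; the commit only names the three labels.

```python
import re

# One labelled line per field; anything else (preamble, reasoning) is dropped.
_LINE_RE = re.compile(r"^(STATUS|Score|Reason):[ ]*(.+)$")
_ORDER = ("STATUS", "Score", "Reason")


def extract_verdict(stdout: str) -> str:
    """Keep only the first STATUS/Score/Reason line of each kind, in fixed order."""
    found = {}
    for line in stdout.splitlines():
        match = _LINE_RE.match(line.strip())
        if match and match.group(1) not in found:
            found[match.group(1)] = match.group(2).strip()
    return "\n".join(f"{key}: {found[key]}" for key in _ORDER if key in found)
```

Matching line by line means stray analysis text before or after the verdict never reaches the edited Telegram message.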

Use literal space [ ] instead of \s in capture groups to prevent matching across line boundaries (e.g. capturing "Trade Filled\nPanna Udvardy" instead of just "Panna Udvardy"). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
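
A small reproduction of the difference, using an illustrative pattern rather than the one in the repository (the message text and the opponent name "Jane Doe" are made up):

```python
import re

message = "Trade Filled\nPanna Udvardy vs Jane Doe"

# \s also matches "\n", so the name capture can swallow the previous line.
loose = re.compile(r"([A-Z][a-z]+(?:\s[A-Z][a-z]+)+) vs")
# [ ] matches only a literal space, so the capture stays on one line.
strict = re.compile(r"([A-Z][a-z]+(?:[ ][A-Z][a-z]+)+) vs")

loose_name = loose.search(message).group(1)
strict_name = strict.search(message).group(1)

print(repr(loose_name))   # 'Trade Filled\nPanna Udvardy'
print(repr(strict_name))  # 'Panna Udvardy'
```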

SofaScore API returns 403 from the prod server IP. Instead of calling the API directly (which would need proxy routing), read the latest match snapshot from the poly_dashboard DB — the collector already stores live scores there via proxied SofaScore polling. Also adds current set games to the score summary for Claude. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Instead of showing "unexpected error" when users click buttons from before a bot restart, catch the Telegram "query too old" error and continue processing the action normally. Also prevent these benign errors from being logged as security violations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
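
One way to express this handling is sketched below. `BadRequest` is a local stand-in for the bot library's exception class, the matched error text is an assumption about Telegram's wording, and `answer_quietly` is a hypothetical helper.

```python
import asyncio
from typing import Awaitable, Callable


class BadRequest(Exception):
    """Stand-in for the Telegram library's bad-request error."""


async def answer_quietly(answer: Callable[[], Awaitable[None]]) -> bool:
    """Acknowledge a callback query; tolerate stale queries from before a restart.

    Returns True if the query was acknowledged, False if it was too old.
    Either way the caller goes on to process the button action.
    """
    try:
        await answer()
        return True
    except BadRequest as exc:
        if "query is too old" in str(exc).lower():
            return False  # benign: not an error worth showing or flagging
        raise  # any other bad request is still a real problem
```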

americodias · 2026-04-27T14:29:59Z

👋 Came across this while extracting upstream-candidate features from a downstream fork. Wanted to share an alternative TTS shape in case it's useful as input — happy to defer to your approach (or open a follow-up PR layering provider fallback on top) depending on which direction @RichardAtCT prefers.

Different scope: rather than Mistral-only, a provider fallback chain with three providers (OpenAI / ElevenLabs / Piper-via-Wyoming) that retries automatically on per-provider failure with a 5-min cooldown, plus a /tts command for runtime status + provider switching. Designed for "self-hosted Piper on the homelab as primary, OpenAI as fallback when Piper is down" rather than choosing one vendor at deploy time.
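
For readers comparing the two designs, a fallback chain with a cooldown can be sketched as follows. This is one reading of the description above, not the code in the linked fork: the class name, the injectable clock, and the choice to skip a failed provider until its cooldown expires are all assumptions.

```python
import time
from typing import Callable, Dict, List, Tuple

Provider = Tuple[str, Callable[[str], bytes]]


class FallbackChain:
    def __init__(
        self,
        providers: List[Provider],
        cooldown_s: float = 300.0,  # 5 minutes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.providers = list(providers)
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._failed_at: Dict[str, float] = {}

    def synthesize(self, text: str) -> Tuple[str, bytes]:
        """Try providers in order, skipping any that failed recently."""
        now = self.clock()
        for name, synth in self.providers:
            failed = self._failed_at.get(name)
            if failed is not None and now - failed < self.cooldown_s:
                continue  # still cooling down
            try:
                audio = synth(text)
            except Exception:
                self._failed_at[name] = now
                continue
            self._failed_at.pop(name, None)
            return name, audio
        raise RuntimeError("all TTS providers failed or are cooling down")
```

With Piper listed first and OpenAI second, a Piper outage costs one failed call and then routes to OpenAI until the cooldown ends.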

Code lives at: https://github.com/americodias/claude-code-telegram/blob/upstream-switch/src/bot/features/tts_handler.py (400 LOC, lazy provider client init, sentence-boundary chunking for OpenAI's 4096 / ElevenLabs' 5000 limits, ffmpeg PCM→OGG-Opus conversion for Piper).
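
Sentence-boundary chunking of the kind mentioned above can be sketched like this. It is an illustrative version, not the fork's implementation: the splitting regex and the hard split for oversize sentences are assumptions.

```python
import re
from typing import List


def chunk_text(text: str, limit: int) -> List[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be synthesized separately, with `limit` set to the provider's maximum (4096 or 5000 in the figures quoted above).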

The two designs aren't strictly incompatible — your voice_responses_enabled per-user toggle + summarize-on-long are orthogonal to the provider-selection layer. If you'd prefer to land #167 first then layer fallback on top, I'm happy to rebase mine onto whatever lands here.

No pressure either way — your PR's been quietly waiting for 3+ weeks and I'd rather not stack a competing one without checking.

vkavun and others added 16 commits March 28, 2026 13:33

docs: add design spec for audio response messages feature

54522d7

TTS capability using Mistral Voxtral API to send Claude responses as Telegram voice messages, with user toggle and graceful fallback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: add implementation plan for audio response messages

38679a5

10-task TDD plan covering config, feature flag, storage migration, TTS synthesis, /voice command, and orchestrator wiring. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add synthesize_speech() TTS method to VoiceHandler

24f3a33

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add /voice on|off toggle command

d5cec76

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test: update orchestrator tests for /voice command (7 commands)

f31da0d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: update CLAUDE.md with /voice command and TTS settings

c27851e

Also remove unused imports in test_voice_command.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: remove redundant 'Voice response' text label for short responses

cd91f62

Audio-only for short responses; long responses still send full text. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Ubuntu and others added 8 commits April 4, 2026 12:32

fix: player name regex matching across newlines

2e1cbbd

Use literal space [ ] instead of \s in capture groups to prevent matching across line boundaries (e.g. capturing "Trade Filled\nPanna Udvardy" instead of just "Panna Udvardy"). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vkavun wants to merge 24 commits into RichardAtCT:main from vkavun:feature/audio-response-messages


3 participants
