TextToSpeechPython is a PyQt6 desktop application for building Azure Speech
text-to-speech workflows around plain text, SSML previewing, and multi-format
document imports.
The current application supports:
- editing or pasting source text in the main window
- live SSML preview generation
- Azure Speech synthesis to preview or exported .mp3 files
- a settings dialog for Azure credentials, voice, output directory, logging, playback volume, and advanced SSML controls
- document import for .txt, .docx, .pdf, .html, .htm, .rtf, .epub, .xlsx, .xls, .csv, and .pptx
- selective import of extracted document sections into the main editor
- batch export of selected imported rows to one audio file per item
- recent audio history in the main window
Requirements:
- Python 3.11+
- Poetry
- Azure Speech resource with a valid key and region
From the project root:
poetry install
You can launch the application in either of these ways:
poetry run python -m app.main
or:
poetry run tts-app
The app can start without Azure configured, but generation and export stay unavailable until credentials are provided.
You can configure Azure Speech in either of these ways:
- In the GUI via Tools > Settings
- With a local .env file in INI format
Example .env:
[API]
key = YOUR_AZURE_SPEECH_KEY
region = YOUR_AZURE_REGION
GUI settings take precedence over .env when both are present.
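The precedence rule can be illustrated with a short sketch. This is not the application's actual code: the function names `load_env_credentials` and `resolve_credentials` are hypothetical, and the per-field fallback (an empty GUI value falls back to the .env value) is an assumption about how "precedence" is applied.

```python
import configparser


def load_env_credentials(env_path):
    """Read key and region from an INI-style .env file.

    Hypothetical helper: missing files or sections yield None values.
    """
    parser = configparser.ConfigParser()
    parser.read(env_path)  # a missing file is skipped without raising
    if not parser.has_section("API"):
        return {"key": None, "region": None}
    section = parser["API"]
    return {"key": section.get("key"), "region": section.get("region")}


def resolve_credentials(gui_settings, env_settings):
    """Prefer a non-empty GUI value; otherwise fall back to the .env value.

    Assumption: precedence is decided per field, not for the pair as a whole.
    """
    return {
        name: gui_settings.get(name) or env_settings.get(name)
        for name in ("key", "region")
    }
```

With this approach a user can keep a working .env for local development and still override a single field from the settings dialog.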
Typical workflow:
- Paste text into the editor, or import content from Import Document.
- Open Tools > Settings to configure Azure, voice, output directory, and SSML options.
- Review the generated SSML preview.
- Use Generate & Play for a temporary preview file or Generate File to export an .mp3.
- Review recent generated files in the Recent Audio panel.
The settings dialog includes an expandable advanced SSML section. The current GUI exposes:
- emphasis
- pitch
- pitch range
- pause duration
- pause position
These controls are applied to both the SSML preview and generated audio.
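To make the effect of these controls concrete, here is a minimal sketch of how they could map onto SSML elements. It is an illustration under assumptions, not the app's own builder: the function name `build_ssml`, its parameters, the fixed `en-US` language, and the nesting order (prosody inside emphasis) are all hypothetical, and the element support of a given voice should be checked against the Azure SSML documentation.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice, emphasis=None, pitch=None, pitch_range=None,
               pause_ms=0, pause_position="end"):
    """Wrap escaped text in optional prosody, emphasis, and break markup."""
    body = escape(text)  # escape &, <, > so user text cannot break the XML

    # Pitch and pitch range both map to attributes of <prosody>.
    attrs = ""
    if pitch:
        attrs += f" pitch={quoteattr(pitch)}"
    if pitch_range:
        attrs += f" range={quoteattr(pitch_range)}"
    if attrs:
        body = f"<prosody{attrs}>{body}</prosody>"

    if emphasis:
        body = f"<emphasis level={quoteattr(emphasis)}>{body}</emphasis>"

    # Pause duration and position map to a single <break> before or after the text.
    if pause_ms > 0:
        pause = f'<break time="{int(pause_ms)}ms"/>'
        body = pause + body if pause_position == "start" else body + pause

    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )
```

For example, `build_ssml("Hello", "en-US-JennyNeural", pitch="+5%", pause_ms=300)` yields a `<prosody pitch="+5%">` element followed by a 300 ms break. Because the same string feeds both the preview and the synthesis request, the two stay consistent.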
Import Document opens a structured import dialog with:
- a row-based preview of extracted document sections
- multi-row selection
- content modes:
  - Prefer Secondary Text
  - Secondary Text Only
  - Primary Text Only
  - Combine Primary and Secondary Text
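The four content modes can be read as a small resolution function over each row's primary and secondary text. The sketch below is an assumption about their semantics, not the app's implementation: `resolve_content` is a hypothetical name, "Prefer Secondary Text" is assumed to fall back to the primary text when the secondary is empty, and the combined mode is assumed to join the two parts with a newline.

```python
def resolve_content(primary, secondary, mode):
    """Return the text to import for one row under the given content mode."""
    primary = (primary or "").strip()
    secondary = (secondary or "").strip()

    if mode == "Prefer Secondary Text":
        # Assumed fallback: use primary text only when secondary is empty.
        return secondary or primary
    if mode == "Secondary Text Only":
        return secondary
    if mode == "Primary Text Only":
        return primary
    if mode == "Combine Primary and Secondary Text":
        # Skip empty parts so no stray blank line is produced.
        return "\n".join(part for part in (primary, secondary) if part)
    raise ValueError(f"Unknown content mode: {mode}")
```

For a slide with a title and speaker notes, for instance, the primary text might be the slide body and the secondary text the notes; which field is which depends on the document type.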
From that dialog you can:
- import the selected rows into the main editor
- batch export the selected rows to one .mp3 per item
If logging is enabled in Tools > Settings, the application writes logs to:
data/dynamic/logs/app.log
Log timestamps use:
YYYYMMDDHHmmss
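In Python's `logging` module, that timestamp layout corresponds to the `datefmt` string `%Y%m%d%H%M%S`. The following sketch shows one way to set up such a log file; the function name `configure_logging`, the logger name `tts_app`, and the message format are assumptions rather than the app's actual configuration.

```python
import logging
from pathlib import Path


def configure_logging(log_path="data/dynamic/logs/app.log", enabled=True):
    """Attach a file handler that stamps records as YYYYMMDDHHmmss."""
    logger = logging.getLogger("tts_app")  # hypothetical logger name
    logger.handlers.clear()
    if not enabled:
        # Mirror the settings toggle: no file is written when logging is off.
        logger.addHandler(logging.NullHandler())
        return logger

    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter(
        fmt="%(asctime)s %(levelname)s %(message)s",
        datefmt="%Y%m%d%H%M%S",
    ))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A compact, separator-free timestamp sorts lexicographically in chronological order, which is convenient when scanning or grepping the log.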
The app writes runtime artifacts under data/dynamic/, including:
- app_settings.json for persisted UI settings
- audio_history.json for recent generated audio history
- audio/ for default exported audio output
- logs/ for application logs
- tmp/ for temporary test and scratch artifacts that should not be synced
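As an illustration of how a file like audio_history.json might be maintained, the sketch below keeps a most-recent-first list of audio paths. The on-disk schema (a plain JSON list of path strings), the function name `add_history_entry`, and the cap of 20 entries are all assumptions; the real file format may differ.

```python
import json
from pathlib import Path


def add_history_entry(history_path, audio_path, max_entries=20):
    """Record audio_path as the newest entry and persist the capped list."""
    path = Path(history_path)
    try:
        entries = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        entries = []  # start fresh if the file is missing or corrupt
    if not isinstance(entries, list):
        entries = []

    audio = str(audio_path)
    # Move an already-known file to the front instead of duplicating it.
    entries = [entry for entry in entries if entry != audio]
    entries.insert(0, audio)
    entries = entries[:max_entries]

    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return entries
```

Tolerating a missing or unreadable file keeps a damaged history from blocking audio generation.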
Notes:
- The app disables generation-related actions until the editor contains text.
- If multimedia playback support is unavailable in the environment, preview generation still works, but playback controls are disabled and the UI relabels the preview action accordingly.
- Preview audio files are temporary and cleaned up automatically.
Project layout:
- app/main.py: application entrypoint
- app/gui: in-repo Qt UI modules and dialogs
- app/controller: GUI orchestration and workflow logic
- app/model: settings, Azure wrappers, SSML helpers, and scrapers
- docs: supporting documentation
- tests: focused regression tests
The repo currently includes focused regression tests for:
- SSML escaping and advanced SSML markup
- audio history persistence
- document import content-mode resolution
- normalized document scraping for .txt and .html
Run them with:
python -m unittest tests.test_main_controller_ssml tests.test_main_controller_history tests.test_second_controller_import tests.test_document_scraper
This project is licensed under the MIT License. See LICENSE for details.