TextToSpeechPython

Overview

TextToSpeechPython is a PyQt6 desktop application for building Azure Speech text-to-speech workflows around plain text, SSML previewing, and multi-format document imports.

The current application supports:

editing or pasting source text in the main window
live SSML preview generation
Azure Speech synthesis to a temporary preview file or an exported .mp3 file
a settings dialog for Azure credentials, voice, output directory, logging, playback volume, and advanced SSML controls
document import for .txt, .docx, .pdf, .html, .htm, .rtf, .epub, .xlsx, .xls, .csv, and .pptx
selective import of extracted document sections into the main editor
batch export of selected imported rows to one audio file per item
recent audio history in the main window

Requirements

Python 3.11+
Poetry
Azure Speech resource with a valid key and region

Installation

From the project root:

poetry install

Running The App

You can launch the application in either of these ways:

poetry run python -m app.main

or:

poetry run tts-app

Azure Configuration

The app can start without Azure configured, but generation and export remain unavailable until you provide credentials.

You can configure Azure Speech in either of these ways:

In the GUI via Tools > Settings
With a local .env file in INI format

Example .env:

[API]
key = YOUR_AZURE_SPEECH_KEY
region = YOUR_AZURE_REGION

GUI settings take precedence over .env when both are present.
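The precedence rule above can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code: the function names `load_env_credentials` and `resolve_credentials` are hypothetical, and the per-field merge (a GUI value overrides only the matching .env value) is an assumption about how "precedence" is applied.

```python
import configparser


def load_env_credentials(env_path=".env"):
    """Read key and region from the [API] section of an INI-style .env file."""
    # interpolation=None keeps a literal '%' in a value from raising an error.
    parser = configparser.ConfigParser(interpolation=None)
    # parser.read() silently skips missing files, so an absent .env yields {}.
    parser.read(env_path, encoding="utf-8")
    credentials = {}
    if parser.has_section("API"):
        for name in ("key", "region"):
            value = parser.get("API", name, fallback="").strip()
            if value:
                credentials[name] = value
    return credentials


def resolve_credentials(gui_settings, env_path=".env"):
    """Merge both sources; non-empty GUI values win over .env values (assumed)."""
    merged = load_env_credentials(env_path)
    for name in ("key", "region"):
        value = (gui_settings.get(name) or "").strip()
        if value:
            merged[name] = value
    return merged
```

With the example .env shown earlier and a GUI key set, `resolve_credentials({"key": "GUI_KEY"})` would return the GUI key together with the region from the file.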

Main Workflow

1. Paste text into the editor, or import content from Import Document.
2. Open Tools > Settings to configure Azure, voice, output directory, and SSML options.
3. Review the generated SSML preview.
4. Use Generate & Play for a temporary preview file or Generate File to export an .mp3.
5. Review recent generated files in the Recent Audio panel.

Advanced SSML Controls

The settings dialog includes an expandable advanced SSML section. The current GUI exposes:

emphasis
pitch
pitch range
pause duration
pause position

These controls are applied to both the SSML preview and generated audio.
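For orientation, here is a rough sketch of how controls like these could map onto standard SSML elements (`emphasis`, `prosody`, and `break`). It is an assumption-laden illustration rather than the app's SSML builder: the function name `build_ssml`, its parameters, the nesting order, the "start"/"end" pause positions, and the placeholder voice name are all hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US", emphasis=None,
               pitch=None, pitch_range=None, pause_ms=0, pause_position="end"):
    """Wrap plain text in SSML, applying only the controls that are set."""
    # Escape &, <, and > so user text cannot break the XML document.
    body = escape(text)
    if emphasis:
        body = f"<emphasis level={quoteattr(emphasis)}>{body}</emphasis>"
    prosody_attrs = []
    if pitch:
        prosody_attrs.append(f"pitch={quoteattr(pitch)}")
    if pitch_range:
        prosody_attrs.append(f"range={quoteattr(pitch_range)}")
    if prosody_attrs:
        attrs = " ".join(prosody_attrs)
        body = f"<prosody {attrs}>{body}</prosody>"
    if pause_ms > 0:
        pause = f'<break time="{int(pause_ms)}ms"/>'
        # Assumed semantics: the pause goes before or after the spoken text.
        body = pause + body if pause_position == "start" else body + pause
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )
```

Building the markup in one function is one way to keep the preview and the synthesis request identical, since both can consume the same returned string.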

Document Import Workflow

Import Document opens a structured import dialog with:

- a row-based preview of extracted document sections
- multi-row selection
- content modes:
  - Prefer Secondary Text
  - Secondary Text Only
  - Primary Text Only
  - Combine Primary and Secondary Text

From that dialog you can:

import the selected rows into the main editor
batch export the selected rows to one .mp3 per item
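The four content modes listed above could resolve a row's text roughly as follows. This is a sketch under stated assumptions: the `ContentMode` enum and `resolve_row_text` function are hypothetical names, "Prefer Secondary Text" is assumed to fall back to the primary text when the secondary text is empty, and the combined mode is assumed to join the two parts with a blank line.

```python
from enum import Enum


class ContentMode(Enum):
    PREFER_SECONDARY = "Prefer Secondary Text"
    SECONDARY_ONLY = "Secondary Text Only"
    PRIMARY_ONLY = "Primary Text Only"
    COMBINE = "Combine Primary and Secondary Text"


def resolve_row_text(primary, secondary, mode):
    """Return the text to import or export for one row under the given mode."""
    primary = (primary or "").strip()
    secondary = (secondary or "").strip()
    if mode is ContentMode.PREFER_SECONDARY:
        # Assumed fallback: use primary text when no secondary text exists.
        return secondary or primary
    if mode is ContentMode.SECONDARY_ONLY:
        return secondary
    if mode is ContentMode.PRIMARY_ONLY:
        return primary
    if mode is ContentMode.COMBINE:
        # Skip empty parts so the result has no stray blank lines.
        return "\n\n".join(part for part in (primary, secondary) if part)
    raise ValueError(f"Unsupported content mode: {mode!r}")
```

Rows that resolve to an empty string would presumably be skipped during import or batch export, though the README does not say so.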

Logging

If logging is enabled in Tools > Settings, the application writes logs to:

data/dynamic/logs/app.log

Log timestamps use:

YYYYMMDDHHmmss
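In Python's standard `logging` module, that timestamp pattern corresponds to the `strftime` format `%Y%m%d%H%M%S`. The sketch below shows one way to produce it. The helper names and the rest of the line layout (level, logger name, message) are assumptions, since the README specifies only the timestamp format and the log path.

```python
import logging
from pathlib import Path


def make_formatter():
    """Formatter whose timestamp renders as YYYYMMDDHHmmss, e.g. 20240131235959."""
    return logging.Formatter(
        fmt="%(asctime)s %(levelname)s %(name)s: %(message)s",  # assumed layout
        datefmt="%Y%m%d%H%M%S",
    )


def configure_file_logging(log_path="data/dynamic/logs/app.log"):
    """Attach a file handler with the compact timestamp to the root logger."""
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(make_formatter())
    logging.getLogger().addHandler(handler)
    return handler
```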

Runtime Data

The app writes runtime artifacts under data/dynamic/, including:

app_settings.json for persisted UI settings
audio_history.json for recent generated audio history
audio/ for default exported audio output
logs/ for application logs
tmp/ for temporary test and scratch artifacts that should not be synced
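As an illustration of how a file such as audio_history.json might be maintained, the sketch below keeps a most-recent-first list of audio paths. The JSON schema (a flat list of path strings), the entry limit, and the function names are all hypothetical; the README does not document the real file format.

```python
import json
from pathlib import Path

DYNAMIC_DIR = Path("data") / "dynamic"
HISTORY_FILE = DYNAMIC_DIR / "audio_history.json"
MAX_ENTRIES = 20  # hypothetical cap on remembered files


def load_history(path=HISTORY_FILE):
    """Return the stored list, or [] if the file is missing or unreadable."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return []
    return data if isinstance(data, list) else []


def record_audio(audio_path, path=HISTORY_FILE, limit=MAX_ENTRIES):
    """Move audio_path to the front of the history and persist the result."""
    entry = str(audio_path)
    history = [item for item in load_history(path) if item != entry]
    history.insert(0, entry)
    del history[limit:]  # drop the oldest entries beyond the cap
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    Path(path).write_text(json.dumps(history, indent=2), encoding="utf-8")
    return history
```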

Current Behavior Notes

The app disables generation-related actions until the editor contains text.
If multimedia playback support is unavailable in the environment, preview generation still works. Playback controls are disabled, and the UI relabels the preview action accordingly.
Preview audio files are temporary and cleaned up automatically.
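One common way to get this kind of automatic cleanup in Python is a context manager around `tempfile.mkstemp`. The sketch below is illustrative only: the name `temporary_preview_file` and the `tts_preview_` prefix are hypothetical, and the app may clean up at a different point (for example, after playback finishes or on exit).

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_preview_file(suffix=".mp3"):
    """Yield a path for a preview file and delete the file afterwards."""
    fd, path = tempfile.mkstemp(prefix="tts_preview_", suffix=suffix)
    os.close(fd)  # only the path is needed; the writer reopens the file
    try:
        yield path
    finally:
        try:
            os.remove(path)
        except FileNotFoundError:
            pass  # already removed elsewhere; nothing left to clean up
```

Inside the `with` block the path can be handed to the synthesizer and the player; the file is removed when the block exits, even if an exception is raised.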

Project Layout

app/main.py: application entrypoint
app/gui: in-repo Qt UI modules and dialogs
app/controller: GUI orchestration and workflow logic
app/model: settings, Azure wrappers, SSML helpers, and scrapers
docs: supporting documentation
tests: focused regression tests

Verification

The repo currently includes focused regression tests for:

SSML escaping and advanced SSML markup
audio history persistence
document import content-mode resolution
normalized document scraping for .txt and .html

Run them with:

poetry run python -m unittest tests.test_main_controller_ssml tests.test_main_controller_history tests.test_second_controller_import tests.test_document_scraper

License

This project is licensed under the MIT License. See LICENSE for details.
