TextToSpeechPython

Overview

TextToSpeechPython is a PyQt6 desktop application for building Azure Speech text-to-speech workflows around plain text, SSML previewing, and multi-format document imports.

The current application supports:

  • editing or pasting source text in the main window
  • live SSML preview generation
  • Azure Speech synthesis to a temporary preview file or an exported .mp3 file
  • a settings dialog for Azure credentials, voice, output directory, logging, playback volume, and advanced SSML controls
  • document import for .txt, .docx, .pdf, .html, .htm, .rtf, .epub, .xlsx, .xls, .csv, and .pptx
  • selective import of extracted document sections into the main editor
  • batch export of selected imported rows to one audio file per item
  • recent audio history in the main window

Requirements

  • Python 3.11+
  • Poetry
  • Azure Speech resource with a valid key and region

Installation

From the project root:

poetry install

Running The App

You can launch the application in either of these ways:

poetry run python -m app.main

or:

poetry run tts-app

Azure Configuration

The app can start without Azure configured, but generation and export will stay unavailable until credentials are provided.

You can configure Azure Speech in either of these ways:

  1. In the GUI via Tools > Settings
  2. With a local .env file in INI format

Example .env:

[API]
key = YOUR_AZURE_SPEECH_KEY
region = YOUR_AZURE_REGION

GUI settings take precedence over .env when both are present.
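
The precedence rule above can be sketched with the standard library's configparser, which reads the INI-style [API] section shown in the example. This is only an illustration: the function name load_azure_credentials and the gui_settings dictionary are hypothetical and are not taken from the application's code.

```python
import configparser
from pathlib import Path


def load_azure_credentials(env_path, gui_settings=None):
    """Return (key, region), letting GUI values override the .env file.

    Hypothetical helper: the real app may store and merge settings differently.
    """
    parser = configparser.ConfigParser()
    # read() silently skips missing files, so an unconfigured app still starts.
    parser.read(Path(env_path), encoding="utf-8")

    # fallback="" covers both a missing [API] section and a missing option.
    file_key = parser.get("API", "key", fallback="")
    file_region = parser.get("API", "region", fallback="")

    gui_settings = gui_settings or {}
    # A non-empty GUI value wins; otherwise fall back to the .env value.
    key = gui_settings.get("key") or file_key
    region = gui_settings.get("region") or file_region
    return key, region
```

With this sketch, an empty result for either value would correspond to the unconfigured state in which generation and export stay unavailable.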

Main Workflow

  1. Paste text into the editor, or import content from Import Document.
  2. Open Tools > Settings to configure Azure, voice, output directory, and SSML options.
  3. Review the generated SSML preview.
  4. Use Generate & Play for a temporary preview file or Generate File to export an .mp3.
  5. Review recent generated files in the Recent Audio panel.

Advanced SSML Controls

The settings dialog includes an expandable advanced SSML section. The current GUI exposes:

  • emphasis
  • pitch
  • pitch range
  • pause duration
  • pause position

These controls are applied to both the SSML preview and generated audio.
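
As a rough illustration of how such controls could map onto SSML, the sketch below wraps escaped text in emphasis, prosody, and break elements. The function build_ssml, its parameter names, and the default voice are assumptions made for the example; the app's own SSML helpers may be structured differently, and not every Azure voice honors every element.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US",
               emphasis=None, pitch=None, pitch_range=None,
               pause_ms=0, pause_position="after"):
    """Build a small SSML document (all parameter names are hypothetical)."""
    # Escape &, <, > so user text cannot break the markup.
    body = escape(text)
    if emphasis:
        body = f"<emphasis level={quoteattr(emphasis)}>{body}</emphasis>"

    # Collect only the prosody attributes that were actually set.
    prosody = []
    if pitch:
        prosody.append(f"pitch={quoteattr(pitch)}")
    if pitch_range:
        prosody.append(f"range={quoteattr(pitch_range)}")
    if prosody:
        body = f"<prosody {' '.join(prosody)}>{body}</prosody>"

    # Place the pause before or after the spoken text.
    if pause_ms > 0:
        pause = f'<break time="{int(pause_ms)}ms"/>'
        body = pause + body if pause_position == "before" else body + pause

    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )
```

Because the same string would feed both the preview pane and the synthesis request, a single builder of this kind keeps the two in sync.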

Document Import Workflow

Import Document opens a structured import dialog with:

  • a row-based preview of extracted document sections
  • multi-row selection
  • content modes:
    • Prefer Secondary Text
    • Secondary Text Only
    • Primary Text Only
    • Combine Primary and Secondary Text

From that dialog you can:

  • import the selected rows into the main editor
  • batch export the selected rows to one .mp3 per item
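
One plausible reading of the four content modes listed above is sketched here. The enum, the function resolve_content, and the exact fallback and joining behavior are assumptions for illustration; the README does not specify how the app resolves each mode.

```python
from enum import Enum


class ContentMode(Enum):
    PREFER_SECONDARY = "Prefer Secondary Text"
    SECONDARY_ONLY = "Secondary Text Only"
    PRIMARY_ONLY = "Primary Text Only"
    COMBINE = "Combine Primary and Secondary Text"


def resolve_content(primary, secondary, mode):
    """Pick the text for one imported row (assumed semantics)."""
    primary = (primary or "").strip()
    secondary = (secondary or "").strip()

    if mode is ContentMode.PRIMARY_ONLY:
        return primary
    if mode is ContentMode.SECONDARY_ONLY:
        return secondary
    if mode is ContentMode.PREFER_SECONDARY:
        # Fall back to the primary text when no secondary text exists.
        return secondary or primary
    # COMBINE: join whichever parts are non-empty.
    return "\n".join(part for part in (primary, secondary) if part)
```

For a row with a title as primary text and notes as secondary text, resolve_content("Title", "", ContentMode.PREFER_SECONDARY) would return "Title" under these assumptions.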

Logging

If logging is enabled in Tools > Settings, the application writes logs to:

data/dynamic/logs/app.log

Log timestamps use:

YYYYMMDDHHmmss
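
In Python's strftime notation that pattern corresponds to %Y%m%d%H%M%S. The sketch below shows how a standard logging formatter could produce it; the message layout around the timestamp is an assumption, since the README only documents the timestamp itself.

```python
import logging
from datetime import datetime

# strftime equivalent of YYYYMMDDHHmmss
DATE_FORMAT = "%Y%m%d%H%M%S"

# The "%(levelname)s %(message)s" layout is illustrative only.
formatter = logging.Formatter(
    "%(asctime)s %(levelname)s %(message)s", datefmt=DATE_FORMAT
)


def format_timestamp(moment):
    """Render a datetime in the compact log timestamp format."""
    return moment.strftime(DATE_FORMAT)


print(format_timestamp(datetime(2024, 3, 5, 14, 7, 9)))  # 20240305140709
```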

Runtime Data

The app writes runtime artifacts under data/dynamic/, including:

  • app_settings.json for persisted UI settings
  • audio_history.json for recent generated audio history
  • audio/ for default exported audio output
  • logs/ for application logs
  • tmp/ for temporary test and scratch artifacts that should not be synced
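
To make the role of audio_history.json concrete, here is a minimal sketch of a capped, most-recent-first history stored as a JSON list of file paths. The schema, the add_history_entry name, and the max_entries limit are all hypothetical; the actual file format used by the app is not described in this README.

```python
import json
from pathlib import Path


def add_history_entry(history_path, audio_file, max_entries=10):
    """Record audio_file as the newest entry and persist the list."""
    path = Path(history_path)
    try:
        entries = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        entries = []
    if not isinstance(entries, list):
        entries = []  # ignore unexpected content rather than crash

    # Move an existing entry to the front instead of duplicating it.
    entries = [entry for entry in entries if entry != audio_file]
    entries.insert(0, audio_file)
    entries = entries[:max_entries]

    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return entries
```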

Current Behavior Notes

  • The app disables generation-related actions until the editor contains text.
  • If multimedia playback support is unavailable in the environment, preview generation still works, but playback controls are disabled and the UI relabels the preview action accordingly.
  • Preview audio files are temporary and cleaned up automatically.

Project Layout

  • app/main.py: application entrypoint
  • app/gui: in-repo Qt UI modules and dialogs
  • app/controller: GUI orchestration and workflow logic
  • app/model: settings, Azure wrappers, SSML helpers, and scrapers
  • docs: supporting documentation
  • tests: focused regression tests

Verification

The repo currently includes focused regression tests for:

  • SSML escaping and advanced SSML markup
  • audio history persistence
  • document import content-mode resolution
  • normalized document scraping for .txt and .html

Run them from the project root with:

python -m unittest tests.test_main_controller_ssml tests.test_main_controller_history tests.test_second_controller_import tests.test_document_scraper

License

This project is licensed under the MIT License. See LICENSE for details.