title: Arivara TTS
emoji: 🍿
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
short_description: Expressive Zeroshot TTS

Arivara Voice-TTS Demo (Chatterbox)

Arivara Voice-TTS is a Text-to-Speech (TTS) application that generates speech from text and styles it after a reference audio clip. It uses the ArivaraTTS model to provide expressive, zero-shot voice cloning and synthesis through a Gradio interface.

✨ Features

  • Expressive Synthesis: Generate natural-sounding speech from text (up to 3000 characters).
  • Voice Styling (Zero-Shot): Upload a reference audio file to instantly capture a speaker's voice characteristics, prosody, and tone without any fine-tuning.
  • Advanced Generation Controls:
    • Exaggeration: Control the speech expressiveness (0.25 to 2.0).
    • CFG / Pace Weight: Adjust the generation guidance and pacing of the speech.
    • Temperature: Control randomness and variance in generation.
    • Seed: Ensure reproducibility for specific voice outputs.
  • GPU Acceleration: Automatically detects a CUDA device and uses it for faster generation when one is available.
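
The limits listed above (3000 characters of text, Exaggeration between 0.25 and 2.0) can be enforced before a request reaches the model. The following is a minimal sketch, not the code in app.py: the `GenerationSettings` class, the `normalize` function, and every default value are hypothetical, and truncating over-long text is only one possible policy (rejecting it is another).

```python
from dataclasses import dataclass, replace

MAX_TEXT_CHARS = 3000             # limit stated in the feature list
EXAGGERATION_RANGE = (0.25, 2.0)  # range stated in the feature list


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the generation controls."""
    text: str
    exaggeration: float = 0.5  # assumed default, not taken from the project
    cfg_weight: float = 0.5    # assumed default
    temperature: float = 0.8   # assumed default
    seed: int = 0              # assumed default


def normalize(settings: GenerationSettings) -> GenerationSettings:
    """Truncate the text and clamp Exaggeration to the documented range."""
    low, high = EXAGGERATION_RANGE
    return replace(
        settings,
        text=settings.text[:MAX_TEXT_CHARS],
        exaggeration=min(max(settings.exaggeration, low), high),
    )


if __name__ == "__main__":
    raw = GenerationSettings(text="Hello there. " * 400, exaggeration=5.0)
    safe = normalize(raw)
    print(len(safe.text), safe.exaggeration)
```

Only the two limits the feature list actually states are enforced here; the README gives no ranges for CFG / Pace Weight or Temperature, so the sketch passes them through unchanged.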

🚀 Setup & Installation

1. Clone the Repository

git clone https://github.com/hariV0078/TTS_Chatterbox.git
cd TTS_Chatterbox

2. Create a Virtual Environment (Recommended)

To avoid conflicts with globally installed packages, use a virtual environment:

On Windows:

python -m venv venv
venv\Scripts\activate

On Unix or macOS:

python -m venv venv
source venv/bin/activate

3. Install Requirements

Install the necessary dependencies including PyTorch, Gradio, and Transformers:

pip install -r requirements.txt

(Note: Depending on your system and GPU, you may need to install a specific version of PyTorch with CUDA support from the official PyTorch website.)
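
After installing, it can help to confirm whether the PyTorch build in the virtual environment actually sees a GPU. The snippet below is a small sketch for that check; the `pick_device` helper is a hypothetical name and is not part of this project. It falls back to "cpu" when PyTorch is not installed at all.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

If this prints "cpu" on a machine with an NVIDIA GPU, the installed PyTorch wheel is probably a CPU-only build, and reinstalling a CUDA-enabled build as described in the note above is the usual fix.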

🛠️ Usage

Running Locally

Start the application by running the main Python script:

python app.py

This will initialize the ArivaraTTS model and start the Gradio local web server. You will see a local URL in your console (usually http://127.0.0.1:7860). Open that link in your browser to interact with the UI.

Using the Interface

  1. Text: Enter the sentence or paragraph you wish to synthesize.
  2. Reference Audio File: Provide a .flac, .wav, or .mp3 file of the voice you want to mimic. You can use local paths or accessible web URLs.
  3. Advanced Parameters: Toggle "More options" to adjust Exaggeration, Pace, Seed, and Temperature.
  4. Click Generate and listen to the synthesized speech.
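
Step 2 accepts .flac, .wav, or .mp3 references given as local paths or web URLs. The sketch below shows one way such an input could be checked before generation. It is an illustration only: `describe_reference` and `SUPPORTED_EXTENSIONS` are hypothetical names, the check looks at the file extension rather than the audio content, and the example URL is a placeholder.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Formats named in the usage steps above.
SUPPORTED_EXTENSIONS = {".flac", ".wav", ".mp3"}


def describe_reference(ref: str) -> tuple[str, bool]:
    """Return (kind, supported): kind is "url" or "local"; supported
    tells whether the file extension is one of the listed formats."""
    parsed = urlparse(ref)
    is_url = parsed.scheme in ("http", "https")
    # For URLs, ignore the query string and inspect only the path part.
    path = parsed.path if is_url else ref
    # Normalize Windows separators so the suffix is extracted consistently.
    suffix = PurePosixPath(path.replace("\\", "/")).suffix.lower()
    return ("url" if is_url else "local", suffix in SUPPORTED_EXTENSIONS)


if __name__ == "__main__":
    for ref in ("voices/speaker.wav",
                "https://example.com/clips/sample.MP3?download=1",
                "notes.txt"):
        print(ref, describe_reference(ref))
```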

⚙️ Hugging Face Integration

This project is configured to be hosted as a Hugging Face Space using the Gradio SDK. See the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference.
