
SiaLabs/speaker-separation


Speaker Separation Tool

A Python script that uses AssemblyAI to separate speakers from an audio file (like a podcast) into individual audio tracks. This is particularly useful for tasks like creating animations with NVIDIA's Audio2Face or analyzing individual speaker contributions.

This project was originally forked from an abandoned repository and has been significantly improved for better accuracy and usability, especially for non-English languages.

Key Features

  • Speaker Diarization: Identifies and separates different speakers in an audio file.
  • Multi-Language Support: Works with the languages supported by AssemblyAI, with improved accuracy when the audio's language is specified explicitly (e.g., Hindi).
  • Simple Command-Line Interface: Easy to use with just a few command-line arguments.
  • Flexible Output: Generates a separate WAV file for each speaker, filling the other speakers' parts with silence so every track keeps the original timeline.

How It Works

The script relies on the AssemblyAI API for transcription and speaker diarization.

  1. Upload & Transcribe: The audio file is uploaded to AssemblyAI for transcription.
  2. Speaker Diarization: AssemblyAI processes the audio to detect who spoke and when, returning precise timestamps for each utterance.
  3. Audio Slicing: The script uses the pydub library to slice the original audio file based on these timestamps.
  4. Export: It creates a separate audio track for each speaker, filling the non-speaking parts with silence to maintain the original timing, and exports them as .wav files.
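
Steps 3 and 4 boil down to interval bookkeeping: for each speaker, which parts of the timeline are speech and which must become silence. The following is a minimal sketch of that bookkeeping only, assuming utterances arrive as hypothetical (speaker, start_ms, end_ms) tuples in milliseconds (the unit AssemblyAI uses for utterance timestamps and pydub uses for slicing). build_speaker_timelines is an illustrative name, not a function taken from speaker_separator.py.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input shape: (speaker_label, start_ms, end_ms) per utterance.
Utterance = Tuple[str, int, int]
# Output segments: ("speech" | "silence", start_ms, end_ms).
Segment = Tuple[str, int, int]


def build_speaker_timelines(
    utterances: Iterable[Utterance], total_ms: int
) -> Dict[str, List[Segment]]:
    """Return, per speaker, a gap-free list of speech/silence segments
    covering 0..total_ms, so each track keeps the original timing."""
    by_speaker: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for speaker, start, end in utterances:
        # Clamp to the audio length and skip empty or inverted spans.
        start, end = max(0, start), min(total_ms, end)
        if end > start:
            by_speaker[speaker].append((start, end))

    timelines: Dict[str, List[Segment]] = {}
    for speaker, spans in by_speaker.items():
        segments: List[Segment] = []
        cursor = 0  # end of the last segment emitted for this speaker
        for start, end in sorted(spans):
            if start > cursor:
                # Gap before this utterance becomes silence.
                segments.append(("silence", cursor, start))
            # Trim any overlap with the previous speech so segments never overlap.
            start = max(start, cursor)
            if end > start:
                segments.append(("speech", start, end))
                cursor = end
        if cursor < total_ms:
            # Pad the tail so the track lasts as long as the original.
            segments.append(("silence", cursor, total_ms))
        timelines[speaker] = segments
    return timelines
```

For example, build_speaker_timelines([("A", 0, 1000), ("B", 1000, 2500), ("A", 2500, 4000)], 4000)["B"] returns [("silence", 0, 1000), ("speech", 1000, 2500), ("silence", 2500, 4000)]. Every speaker's segments cover the full duration, which is what keeps each exported track aligned with the original.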

Getting Started

Prerequisites

  • Python with pip
  • An AssemblyAI API key
  • FFmpeg (pydub needs it to read MP3 files)

Installation

  1. Clone the repository (or download the files):

    # If you are using git
    git clone https://github.com/SiaLabs/speaker-separation.git
    cd speaker-separation
  2. Install the required Python packages:

    pip install -r requirements.txt
  3. Set up your environment variables: Create a file named .env in the project root and add your AssemblyAI API key:

    ASSEMBLYAI_API_KEY="your_api_key_here"
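
The README does not say how the script reads .env; projects commonly use the python-dotenv package for this. The sketch below is a simplified standard-library stand-in (not python-dotenv itself, and not necessarily what speaker_separator.py does) that shows the idea: read KEY="value" lines into the environment, then fail early if the key is missing. load_env_file and get_api_key are hypothetical names.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Minimal .env reader: copies KEY="value" lines into os.environ
    without overriding variables that are already set."""
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Strip one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        os.environ.setdefault(key, value)


def get_api_key() -> str:
    """Load .env from the current directory and return the AssemblyAI key."""
    load_env_file()
    key = os.environ.get("ASSEMBLYAI_API_KEY")
    if not key:
        raise RuntimeError("ASSEMBLYAI_API_KEY is not set; add it to .env")
    return key
```

Using setdefault means a key exported in the shell takes precedence over the file, which is the usual convention for .env loaders.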
    

Usage

Run the script from your terminal, providing the filename, number of speakers, and optionally the language.

    python speaker_separator.py --filename="path/to/your/audio.wav" --numspeakers=2 --language="hi"

  • --filename: The path to your audio file (MP3 or WAV).
  • --numspeakers: The number of speakers in the audio.
  • --language: (Optional) The language code of the audio (e.g., en for English, hi for Hindi). Providing this improves accuracy. See the list of supported languages on the AssemblyAI website.
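
The three flags map naturally onto Python's argparse and onto speaker-diarization settings. The sketch below is a hypothetical reconstruction of the documented interface, not the script's actual parser; parse_args and transcription_options are illustrative names. The option names speaker_labels, speakers_expected, and language_code follow the AssemblyAI Python SDK's TranscriptionConfig, but check the SDK documentation for your installed version.

```python
import argparse
from typing import Any, Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical reconstruction of the documented flags; the real
    # script's parser may differ.
    parser = argparse.ArgumentParser(
        description="Separate speakers into per-speaker WAV files."
    )
    parser.add_argument("--filename", required=True, help="Path to an MP3 or WAV file")
    parser.add_argument("--numspeakers", type=int, required=True,
                        help="Number of speakers in the audio")
    parser.add_argument("--language", default=None,
                        help="Optional language code, e.g. en or hi")
    args = parser.parse_args(argv)
    if args.numspeakers < 1:
        parser.error("--numspeakers must be at least 1")
    return args


def transcription_options(args: argparse.Namespace) -> Dict[str, Any]:
    # Keyword arguments shaped for the AssemblyAI SDK's TranscriptionConfig.
    options: Dict[str, Any] = {
        "speaker_labels": True,                 # turn on diarization
        "speakers_expected": args.numspeakers,  # hint from --numspeakers
    }
    if args.language:
        # Only pass a language when given; otherwise the API default applies.
        options["language_code"] = args.language
    return options
```

With the SDK installed, these options could be passed as assemblyai.TranscriptionConfig(**transcription_options(args)). argparse accepts both --numspeakers=2 and --numspeakers 2, so the command shown above works as written.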

The output files will be saved in the output/ directory with descriptive names like your_audio_speaker_A.wav and your_audio_speaker_B.wav.
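
The script performs this export with pydub. To show the padding step without extra dependencies, the sketch below swaps pydub for Python's standard-library wave module and writes one file per speaker using the same naming pattern. It assumes an uncompressed PCM WAV input of 16 bits or more (wave cannot read MP3, which is one reason the real script uses pydub) and per-speaker speech spans in milliseconds; export_speaker_tracks is a hypothetical name.

```python
import wave
from pathlib import Path
from typing import Dict, List, Tuple


def export_speaker_tracks(
    wav_path: str,
    speech_spans: Dict[str, List[Tuple[int, int]]],
    out_dir: str = "output",
) -> List[Path]:
    """Write one WAV per speaker with the same length and format as the
    input: original audio inside that speaker's spans, silence elsewhere."""
    src = Path(wav_path)
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames = reader.readframes(params.nframes)
    if params.sampwidth == 1:
        # 8-bit WAV is unsigned, so zero bytes would not be silence.
        raise ValueError("8-bit PCM is not handled by this sketch")

    bytes_per_frame = params.sampwidth * params.nchannels
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for speaker, spans in speech_spans.items():
        track = bytearray(len(frames))  # all zeros = silence for signed PCM
        for start_ms, end_ms in spans:
            # Convert milliseconds to frame offsets, clamped to the file.
            first = min(params.nframes, max(0, start_ms * params.framerate // 1000))
            last = min(params.nframes, max(0, end_ms * params.framerate // 1000))
            lo, hi = first * bytes_per_frame, last * bytes_per_frame
            track[lo:hi] = frames[lo:hi]  # copy this speaker's audio
        path = out / f"{src.stem}_speaker_{speaker}.wav"
        with wave.open(str(path), "wb") as writer:
            writer.setparams(params)  # same rate, width, and channels
            writer.writeframes(bytes(track))
        written.append(path)
    return written
```

Copying raw frames instead of concatenating slices keeps every output exactly as long as the input, so the tracks can be dropped onto the same timeline without manual alignment.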

License

This project is licensed under the MIT License - see the LICENSE file for details.

About

Automatically separate speakers from podcasts and audio, perfect for content created with tools like NotebookLM.
