jangobrecht/PodcastTranscriber

Podcast Transcription CLI

A Python command-line tool that transcribes podcast audio files using OpenAI's gpt-4o-transcribe-diarize model with automatic speaker detection and clean Markdown output.

Features

  • Automatic speaker diarisation (identifies different speakers as A, B, C, etc.)
  • Timestamp tracking for each speaker segment
  • Clean Markdown output format
  • Download MP3 files from URLs or use local files
  • Cost estimation before processing
  • Automatic chunking for large files (>25MB)
  • Real-time progress bars with ETA for transcription
  • Detailed status updates and completion times
  • Smart chunking strategy based on audio duration

Requirements

  • Python 3.10 or higher (Python 3.13+ supported)
  • OpenAI API key
  • FFmpeg (required for audio processing)

Installing FFmpeg

macOS (using Homebrew):

brew install ffmpeg

Ubuntu/Debian:

sudo apt update
sudo apt install ffmpeg

Windows: Download from ffmpeg.org or install with Chocolatey:

choco install ffmpeg

Installation

  1. Clone this repository and navigate to the directory:
cd PodcastTranscriber
  2. Create a virtual environment (recommended):
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Note for Python 3.13+ users: The audioop-lts package is automatically installed to provide compatibility with pydub. If you encounter any audio processing issues, ensure FFmpeg is properly installed on your system.

  4. Set up your OpenAI API key:
cp .env.example .env
# Edit .env and add your API key:
# OPENAI_API_KEY=sk-...

Usage

Basic Usage

Transcribe from URL:

python transcribe.py --url https://example.com/podcast.mp3

Transcribe local file:

python transcribe.py --file ./podcast.mp3

Specify custom output location:

python transcribe.py --url https://example.com/podcast.mp3 --output ./my-transcript.md

Full Options

python transcribe.py [OPTIONS]

Options:
  --url TEXT     URL to MP3 file
  --file TEXT    Path to local audio file
  --output TEXT  Output Markdown file path (default: ./transcript.md)
  --help         Show this message and exit

Note: You must provide either --url or --file, but not both.
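The either/or rule for --url and --file can be sketched as a small validation function. This is an illustration only, not the code in transcribe.py; the name `resolve_source` and the error messages are hypothetical.

```python
from typing import Optional


def resolve_source(url: Optional[str], file: Optional[str]) -> tuple[str, str]:
    """Return ("url", value) or ("file", value).

    Hypothetical helper: raises ValueError when both or neither option is set.
    """
    if url and file:
        raise ValueError("Provide either --url or --file, not both.")
    if not url and not file:
        raise ValueError("Provide one of --url or --file.")
    return ("url", url) if url else ("file", file)
```

A CLI entry point could call this right after parsing options and print the error message before exiting.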

Output Format

The tool generates a Markdown file with the following format:

# Podcast Transcript

**Speaker A** [00:00:05 - 00:00:12]
Hi, welcome to the podcast. Today we're discussing transcription technology.

**Speaker B** [00:00:12 - 00:00:20]
Thanks for having me! I'm excited to share what we've been working on.

**Speaker A** [00:00:20 - 00:00:35]
Let's start with the basics. Can you explain how the system works?
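As a rough sketch, the format above could be produced from a list of diarised segments like this. The segment keys (`speaker`, `start`, `end`, `text`) and both function names are assumptions made for illustration; the actual response fields and the tool's internals may differ.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to HH:MM:SS, truncating fractional seconds."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_markdown(segments: list[dict]) -> str:
    """Render segments (assumed keys: speaker, start, end, text) as Markdown."""
    lines = ["# Podcast Transcript", ""]
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"**Speaker {seg['speaker']}** [{start} - {end}]")
        lines.append(seg["text"].strip())
        lines.append("")  # blank line between speaker turns
    return "\n".join(lines)
```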

Progress Tracking

During transcription, you'll see:

  • Download progress bar (if using --url) with download speed
  • Audio duration and cost estimate before processing
  • Real-time progress bar for each chunk/file being transcribed
    • Shows audio duration (e.g., "Transcribing (5.5 min)")
    • Displays progress percentage and ETA
    • Updates smoothly based on estimated processing time
  • Completion time for each chunk (e.g., "✓ Completed in 45.2s")

For multi-chunk files, you'll see separate progress bars for each chunk:

Chunk 1/5 (20.0 min)  [################      ]  75%  00:01:23
  ✓ Completed in 85.3s
Chunk 2/5 (20.0 min)  [##########            ]  50%  00:02:15
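Because a transcription request only returns once it has finished, a progress bar of this kind has to be driven by elapsed time against an estimate. The following is a minimal sketch of that idea under assumed names (`estimated_progress`, `render_bar`) and an assumed 22-character bar width; it is not the tool's actual progress code.

```python
def estimated_progress(elapsed_s: float, expected_s: float) -> float:
    """Fraction complete, estimated from elapsed time.

    Capped at 0.99 so the bar never claims completion before the
    request has actually returned.
    """
    if expected_s <= 0:
        return 0.0
    return min(max(elapsed_s / expected_s, 0.0), 0.99)


def render_bar(fraction: float, width: int = 22) -> str:
    """Draw a fixed-width text bar such as [###########           ]."""
    filled = int(fraction * width)
    return "[" + "#" * filled + " " * (width - filled) + "]"
```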

How It Works

  1. Input Validation: Checks that either a URL or file path is provided
  2. Download: If a URL is provided, downloads the MP3 file to a temporary location, showing a progress bar
  3. Cost Estimation: Calculates transcription cost based on audio duration ($0.006/minute)
  4. User Confirmation: Asks for confirmation before proceeding
  5. Chunking: If the file exceeds 25MB, splits it into 20-minute chunks
  6. Transcription: Sends audio to OpenAI's gpt-4o-transcribe-diarize model with live progress tracking
  7. Processing: Converts diarised JSON response to Markdown format
  8. Output: Saves formatted transcript to specified output file
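The chunking step can be sketched as a function that plans chunk boundaries in seconds. The constants mirror the numbers stated above, but the function name `plan_chunks` and the reading of 25MB as 25 * 1024 * 1024 bytes are assumptions; the real script may split differently.

```python
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumed interpretation of "25MB"
CHUNK_SECONDS = 20 * 60              # 20-minute chunks, as described above


def plan_chunks(duration_s: float, size_bytes: int) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds for each chunk to transcribe.

    Files at or under the size limit are sent as a single chunk.
    """
    if size_bytes <= MAX_UPLOAD_BYTES:
        return [(0.0, duration_s)]
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + CHUNK_SECONDS, duration_s)
        chunks.append((start, end))
        start = end
    return chunks
```

Each chunk's start offset would also need to be added to the timestamps returned for that chunk so the merged transcript keeps times relative to the whole file. This sketch does not re-check the size of each chunk, so a 20-minute chunk encoded at a high bitrate could still exceed the limit.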

Cost Information

  • Rate: $0.006 per minute of audio
  • Example costs:
    • 30-minute podcast: ~$0.18
    • 1-hour podcast: ~$0.36
    • 2-hour podcast: ~$0.72

The tool will calculate and display the estimated cost before processing and ask for confirmation.
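The arithmetic behind these estimates is simply minutes of audio times the per-minute rate. A minimal sketch, with a hypothetical function name:

```python
RATE_PER_MINUTE_USD = 0.006  # rate quoted in this README


def estimate_cost(duration_s: float) -> float:
    """Estimated cost in USD for audio of the given length, rounded to cents."""
    return round(duration_s / 60 * RATE_PER_MINUTE_USD, 2)
```

For a 30-minute file (1800 seconds) this gives 30 × 0.006 = 0.18, matching the first example above.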

Known Limitations

Speaker Labels

  • Speaker labels (A, B, C, etc.) are generic identifiers
  • You must manually rename speakers after transcription if you know their identities
  • Example: Replace "Speaker A" with "Host: John Smith" in the output file
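Renaming can also be scripted after the fact. The sketch below is a standalone helper, not part of the tool; it assumes the transcript uses the `**Speaker X**` heading style shown in Output Format, and the function name `rename_speakers` is hypothetical.

```python
import re


def rename_speakers(markdown: str, names: dict[str, str]) -> str:
    """Replace '**Speaker X**' labels using a mapping such as {"A": "Host: John Smith"}.

    Labels without an entry in the mapping are left unchanged.
    """
    def replace(match: re.Match) -> str:
        label = match.group(1)
        return f"**{names.get(label, 'Speaker ' + label)}**"

    return re.sub(r"\*\*Speaker ([A-Z]+)\*\*", replace, markdown)
```

For chunked files, check the labels chunk by chunk before applying a single mapping, since the same letter may refer to different people in different chunks.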

Large Files

  • Files are automatically chunked if they exceed 25MB
  • Speaker labels may not be consistent across chunks (e.g., "Speaker A" in chunk 1 might be "Speaker B" in chunk 2)
  • Manual review and speaker label correction may be needed for chunked files

File Size Limits

  • Maximum chunk size: 25MB per API request
  • Automatic chunking into 20-minute segments for larger files

API Limitations

  • The gpt-4o-transcribe-diarize model does not support custom terminology or prompting
  • Cannot provide speaker names or context to improve recognition
  • Transcription accuracy depends on audio quality and speaker clarity

Supported Audio Formats

  • Primary format: MP3
  • Other formats supported by FFmpeg/pydub should work but are not extensively tested

Troubleshooting

"OPENAI_API_KEY not found"

Make sure you've created a .env file with your API key:

OPENAI_API_KEY=sk-your-key-here
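For reference, a check that would produce this kind of error might look like the sketch below. It assumes the contents of .env have already been loaded into the process environment (how the tool does that is not shown in this README), and the function name `get_api_key` and error wording are hypothetical.

```python
import os
from typing import Mapping, Optional


def get_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI API key, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY not found. Create a .env file with your API key."
        )
    return key
```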

"FFmpeg not found" or audio processing errors

Install FFmpeg using the instructions under Installing FFmpeg in the Requirements section, then confirm that the ffmpeg command is available on your PATH.

"Failed to download file"

  • Check that the URL is accessible
  • Verify the URL points to an audio file
  • Check your internet connection

"File not found"

  • Verify the file path is correct
  • Use absolute paths if relative paths don't work
  • Check file permissions

API Errors

  • Rate limit: Wait a few moments and try again
  • Authentication error: Verify your API key is correct
  • File too large: The tool should automatically chunk large files, but extremely large files may still cause issues

"No module named 'audioop'" (Python 3.13+)

This issue is resolved automatically by the audioop-lts package in requirements.txt. If you still see this error, install the package manually:

pip install audioop-lts

Development

Project Structure

PodcastTranscriber/
├── .env.example          # Example environment variables
├── .gitignore           # Git ignore patterns
├── requirements.txt     # Python dependencies
├── transcribe.py       # Main CLI script
└── README.md           # This file

Running Tests

The tool includes error handling for these common scenarios:

  • Invalid file formats
  • Network errors during download
  • API errors (rate limits, authentication)
  • File size validation
  • Missing API keys

To test manually, try:

  1. Short audio file (<30 seconds)
  2. Medium audio file (1-5 minutes)
  3. Large audio file (>25MB)
  4. URL download vs local file
  5. Invalid inputs (wrong file type, bad URL)

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

This project is open source and available under the MIT License.

Support

For issues or questions:

  1. Check the Troubleshooting section above
  2. Review the Known Limitations
  3. Open an issue on GitHub with details about your problem

Acknowledgments

  • Uses OpenAI's gpt-4o-transcribe-diarize model for transcription
  • Built with Click for CLI interface
  • Audio processing powered by pydub and FFmpeg

About

A Python CLI tool for transcribing podcasts with automatic speaker diarisation using OpenAI's gpt-4o-transcribe-diarize model. Features real-time progress tracking, cost estimation, automatic chunking for large files, and clean Markdown output with timestamps. Supports both URLs and local files. Python 3.10+ compatible.
