A Python command-line tool that transcribes podcast audio files using OpenAI's gpt-4o-transcribe-diarize model with automatic speaker detection and clean Markdown output.
- Automatic speaker diarisation (identifies different speakers as A, B, C, etc.)
- Timestamp tracking for each speaker segment
- Clean Markdown output format
- Download MP3 files from URLs or use local files
- Cost estimation before processing
- Automatic chunking for large files (>25MB)
- Real-time progress bars with ETA for transcription
- Detailed status updates and completion times
- Smart chunking strategy based on audio duration
- Python 3.10 or higher (Python 3.13+ supported)
- OpenAI API key
- FFmpeg (required for audio processing)
macOS (using Homebrew):

```bash
brew install ffmpeg
```

Ubuntu/Debian:

```bash
sudo apt update
sudo apt install ffmpeg
```

Windows: Download from ffmpeg.org or use:

```bash
choco install ffmpeg
```

- Clone this repository and navigate to the directory:

```bash
cd PodcastTranscriber
```

- Create a virtual environment (recommended):

```bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

Note for Python 3.13+ users: the `audioop-lts` package is installed automatically to provide compatibility with pydub. If you encounter audio processing issues, ensure FFmpeg is properly installed on your system.
- Set up your OpenAI API key:

```bash
cp .env.example .env
# Edit .env and add your API key:
# OPENAI_API_KEY=sk-...
```

Transcribe from a URL:

```bash
python transcribe.py --url https://example.com/podcast.mp3
```

Transcribe a local file:

```bash
python transcribe.py --file ./podcast.mp3
```

Specify a custom output location:

```bash
python transcribe.py --url https://example.com/podcast.mp3 --output ./my-transcript.md
```

```
python transcribe.py [OPTIONS]

Options:
  --url TEXT     URL to MP3 file
  --file TEXT    Path to a local audio file
  --output TEXT  Output Markdown file path (default: ./transcript.md)
  --help         Show this message and exit
```

Note: You must provide either `--url` or `--file`, but not both.
The tool generates a Markdown file in the following format:

```markdown
# Podcast Transcript

**Speaker A** [00:00:05 - 00:00:12]
Hi, welcome to the podcast. Today we're discussing transcription technology.

**Speaker B** [00:00:12 - 00:00:20]
Thanks for having me! I'm excited to share what we've been working on.

**Speaker A** [00:00:20 - 00:00:35]
Let's start with the basics. Can you explain how the system works?
```

During transcription, you'll see:
- Download progress bar (if using --url) with download speed
- Audio duration and cost estimate before processing
- Real-time progress bar for each chunk/file being transcribed:
  - Shows audio duration (e.g., "Transcribing (5.5 min)")
  - Displays progress percentage and ETA
  - Updates smoothly based on estimated processing time
- Completion time for each chunk (e.g., "✓ Completed in 45.2s")
For multi-chunk files, you'll see a separate progress bar for each chunk:

```
Chunk 1/5 (20.0 min) [################    ] 75% 00:01:23
✓ Completed in 85.3s
Chunk 2/5 (20.0 min) [##########          ] 50% 00:02:15
```
- Input Validation: Checks that either a URL or file path is provided
- Download: If URL provided, downloads the MP3 file to a temporary location with progress bar
- Cost Estimation: Calculates transcription cost based on audio duration ($0.006/minute)
- User Confirmation: Asks for confirmation before proceeding
- Chunking: If file exceeds 25MB, splits into 20-minute chunks
- Transcription: Sends the audio to OpenAI's `gpt-4o-transcribe-diarize` model with live progress tracking
- Processing: Converts the diarised JSON response to Markdown format
- Output: Saves the formatted transcript to the specified output file
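The processing step above can be sketched as a small pure function. This is a sketch only: the segment fields (`speaker`, `start`, `end`, `text`) are an assumption about the diarised response shape, not the documented API schema.

```python
# Sketch of the diarised-response-to-Markdown step.
# The segment keys below are assumed; the real API response may differ.
def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS, matching the transcript timestamps."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

def segments_to_markdown(segments: list[dict]) -> str:
    """Turn speaker-labelled segments into the Markdown format shown above."""
    lines = ["# Podcast Transcript", ""]
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"**Speaker {seg['speaker']}** [{start} - {end}]")
        lines.append(seg["text"])
        lines.append("")
    return "\n".join(lines)
```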
- Rate: $0.006 per minute of audio
- Example costs:
- 30-minute podcast: ~$0.18
- 1-hour podcast: ~$0.36
- 2-hour podcast: ~$0.72
The tool will calculate and display the estimated cost before processing and ask for confirmation.
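At the documented rate, the estimate is a one-line calculation (the function name here is illustrative):

```python
# Cost estimate at the documented rate of $0.006 per minute of audio.
RATE_PER_MINUTE = 0.006

def estimate_cost(duration_seconds: float) -> float:
    """Return the estimated transcription cost in dollars, rounded to cents."""
    return round(duration_seconds / 60 * RATE_PER_MINUTE, 2)
```

For example, a 30-minute file gives `estimate_cost(1800)` → 0.18, matching the table above.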
- Speaker labels (A, B, C, etc.) are generic identifiers
- You must manually rename speakers after transcription if you know their identities
- Example: Replace "Speaker A" with "Host: John Smith" in the output file
- Files are automatically chunked if they exceed 25MB
- Speaker labels may not be consistent across chunks (e.g., "Speaker A" in chunk 1 might be "Speaker B" in chunk 2)
- Manual review and speaker label correction may be needed for chunked files
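Renaming labels after the fact is a plain text substitution over the output file. A minimal sketch (the helper name and mapping are illustrative):

```python
# Replace generic speaker labels with real names after transcription.
# The mapping is illustrative; edit it for your episode.
def rename_speakers(transcript: str, names: dict[str, str]) -> str:
    """Swap '**Speaker X**' labels for the names you provide."""
    for label, name in names.items():
        transcript = transcript.replace(f"**Speaker {label}**", f"**{name}**")
    return transcript
```

For chunked files, run this per chunk region if the labels flipped between chunks.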
- Maximum chunk size: 25MB per API request
- Automatic chunking into 20-minute segments for larger files
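The 20-minute chunking boundaries are easy to reason about in isolation. The sketch below only computes the `(start, end)` ranges in seconds; the actual tool slices the audio itself (via pydub/FFmpeg), which is omitted here:

```python
# Compute 20-minute chunk boundaries for a file of a given duration.
# Slicing the audio itself (pydub/FFmpeg) is out of scope for this sketch.
CHUNK_SECONDS = 20 * 60

def chunk_ranges(duration_seconds: int) -> list[tuple[int, int]]:
    """Return (start, end) second offsets for each 20-minute chunk."""
    return [
        (start, min(start + CHUNK_SECONDS, duration_seconds))
        for start in range(0, duration_seconds, CHUNK_SECONDS)
    ]
```

A 50-minute file, for instance, yields three chunks: two full 20-minute chunks and a final 10-minute remainder.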
- The `gpt-4o-transcribe-diarize` model does not support custom terminology or prompting
- Speaker names or context cannot be supplied to improve recognition
- Transcription accuracy depends on audio quality and speaker clarity
- Primary format: MP3
- Other formats supported by FFmpeg/pydub should work but are not extensively tested
Make sure you've created a `.env` file with your API key:

```
OPENAI_API_KEY=sk-your-key-here
```

Install FFmpeg using the instructions in the Requirements section.
- Check that the URL is accessible
- Verify the URL points to an audio file
- Check your internet connection
- Verify the file path is correct
- Use absolute paths if relative paths don't work
- Check file permissions
- Rate limit: Wait a few moments and try again
- Authentication error: Verify your API key is correct
- File too large: The tool should automatically chunk large files, but extremely large files may still cause issues
This issue is resolved automatically by the `audioop-lts` package in requirements.txt. If you still see this error:

```bash
pip install audioop-lts
```

```
PodcastTranscriber/
├── .env.example       # Example environment variables
├── .gitignore         # Git ignore patterns
├── requirements.txt   # Python dependencies
├── transcribe.py      # Main CLI script
└── README.md          # This file
```
The tool includes comprehensive error handling for common scenarios:
- Invalid file formats
- Network errors during download
- API errors (rate limits, authentication)
- File size validation
- Missing API keys
To test manually, try:
- Short audio file (<30 seconds)
- Medium audio file (1-5 minutes)
- Large audio file (>25MB)
- URL download vs local file
- Invalid inputs (wrong file type, bad URL)
Contributions are welcome! Please feel free to submit issues or pull requests.
This project is open source and available under the MIT License.
For issues or questions:
- Check the Troubleshooting section above
- Review the Known Limitations
- Open an issue on GitHub with details about your problem
- Uses OpenAI's `gpt-4o-transcribe-diarize` model for transcription
- Built with Click for the CLI interface
- Audio processing powered by pydub and FFmpeg