A Python command-line tool that transcribes podcast audio files using OpenAI's gpt-4o-transcribe-diarize model with automatic speaker detection and clean Markdown output.
- Automatic speaker diarisation (identifies different speakers as A, B, C, etc.)
- Timestamp tracking for each speaker segment
- Clean Markdown output format
- Download MP3 files from URLs or use local files
- Cost estimation before processing
- Automatic chunking for large files (>25MB)
- Real-time progress bars with ETA for transcription
- Detailed status updates and completion times
- Smart chunking strategy based on audio duration
- Python 3.10 or higher (Python 3.13+ supported)
- OpenAI API key
- FFmpeg (required for audio processing)
macOS (using Homebrew):

```bash
brew install ffmpeg
```

Ubuntu/Debian:

```bash
sudo apt update
sudo apt install ffmpeg
```

Windows: Download from ffmpeg.org or use:

```bash
choco install ffmpeg
```

- Clone this repository and navigate to the directory:

```bash
cd PodcastTranscriber
```

- Create a virtual environment (recommended):

```bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

Note for Python 3.13+ users: the `audioop-lts` package is installed automatically to provide compatibility with pydub. If you encounter audio processing issues, ensure FFmpeg is properly installed on your system.
- Set up your OpenAI API key:

```bash
cp .env.example .env
# Edit .env and add your API key:
# OPENAI_API_KEY=sk-...
```

Transcribe from a URL:

```bash
python transcribe.py --url https://example.com/podcast.mp3
```

Transcribe a local file:

```bash
python transcribe.py --file ./podcast.mp3
```

Specify a custom output location:

```bash
python transcribe.py --url https://example.com/podcast.mp3 --output ./my-transcript.md
```

```
python transcribe.py [OPTIONS]

Options:
  --url TEXT     URL to MP3 file
  --file TEXT    Path to a local audio file
  --output TEXT  Output Markdown file path (default: ./transcript.md)
  --help         Show this message and exit
```

Note: You must provide either `--url` or `--file`, but not both.
The tool generates a Markdown file in the following format:

```markdown
# Podcast Transcript

**Speaker A** [00:00:05 - 00:00:12]
Hi, welcome to the podcast. Today we're discussing transcription technology.

**Speaker B** [00:00:12 - 00:00:20]
Thanks for having me! I'm excited to share what we've been working on.

**Speaker A** [00:00:20 - 00:00:35]
Let's start with the basics. Can you explain how the system works?
```

During transcription, you'll see:
- Download progress bar (if using --url) with download speed
- Audio duration and cost estimate before processing
- Real-time progress bar for each chunk/file being transcribed:
  - Shows audio duration (e.g., "Transcribing (5.5 min)")
  - Displays progress percentage and ETA
  - Updates smoothly based on estimated processing time
- Completion time for each chunk (e.g., "✓ Completed in 45.2s")
For multi-chunk files, you'll see a separate progress bar for each chunk:

```
Chunk 1/5 (20.0 min) [################    ] 75% 00:01:23
✓ Completed in 85.3s
Chunk 2/5 (20.0 min) [##########          ] 50% 00:02:15
```
- Input Validation: Checks that either a URL or file path is provided
- Download: If URL provided, downloads the MP3 file to a temporary location with progress bar
- Cost Estimation: Calculates transcription cost based on audio duration ($0.006/minute)
- User Confirmation: Asks for confirmation before proceeding
- Chunking: If file exceeds 25MB, splits into 20-minute chunks
- Transcription: Sends the audio to OpenAI's `gpt-4o-transcribe-diarize` model with live progress tracking
- Processing: Converts the diarised JSON response to Markdown format
- Output: Saves the formatted transcript to the specified output file
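The processing step above can be sketched as a small pure function. This is a sketch only: the segment fields (`speaker`, `start`, `end`, `text`) are an assumption about the diarised response shape, not the documented API schema.

```python
# Sketch of the diarised-response-to-Markdown step.
# The segment keys below are assumed; the real API response may differ.
def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS, matching the transcript timestamps."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

def segments_to_markdown(segments: list[dict]) -> str:
    """Turn speaker-labelled segments into the Markdown format shown above."""
    lines = ["# Podcast Transcript", ""]
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"**Speaker {seg['speaker']}** [{start} - {end}]")
        lines.append(seg["text"])
        lines.append("")
    return "\n".join(lines)
```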
- Rate: $0.006 per minute of audio
- Example costs:
- 30-minute podcast: ~$0.18
- 1-hour podcast: ~$0.36
- 2-hour podcast: ~$0.72
The tool will calculate and display the estimated cost before processing and ask for confirmation.
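At the documented rate, the estimate is a one-line calculation (the function name here is illustrative):

```python
# Cost estimate at the documented rate of $0.006 per minute of audio.
RATE_PER_MINUTE = 0.006

def estimate_cost(duration_seconds: float) -> float:
    """Return the estimated transcription cost in dollars, rounded to cents."""
    return round(duration_seconds / 60 * RATE_PER_MINUTE, 2)
```

For example, a 30-minute file gives `estimate_cost(1800)` → 0.18, matching the table above.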
- Speaker labels (A, B, C, etc.) are generic identifiers
- You must manually rename speakers after transcription if you know their identities
- Example: Replace "Speaker A" with "Host: John Smith" in the output file
- Files are automatically chunked if they exceed 25MB
- Speaker labels may not be consistent across chunks (e.g., "Speaker A" in chunk 1 might be "Speaker B" in chunk 2)
- Manual review and speaker label correction may be needed for chunked files
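Renaming labels after the fact is a plain text substitution over the output file. A minimal sketch (the helper name and mapping are illustrative):

```python
# Replace generic speaker labels with real names after transcription.
# The mapping is illustrative; edit it for your episode.
def rename_speakers(transcript: str, names: dict[str, str]) -> str:
    """Swap '**Speaker X**' labels for the names you provide."""
    for label, name in names.items():
        transcript = transcript.replace(f"**Speaker {label}**", f"**{name}**")
    return transcript
```

For chunked files, run this per chunk region if the labels flipped between chunks.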
- Maximum chunk size: 25MB per API request
- Automatic chunking into 20-minute segments for larger files
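The 20-minute chunking boundaries are easy to reason about in isolation. The sketch below only computes the `(start, end)` ranges in seconds; the actual tool slices the audio itself (via pydub/FFmpeg), which is omitted here:

```python
# Compute 20-minute chunk boundaries for a file of a given duration.
# Slicing the audio itself (pydub/FFmpeg) is out of scope for this sketch.
CHUNK_SECONDS = 20 * 60

def chunk_ranges(duration_seconds: int) -> list[tuple[int, int]]:
    """Return (start, end) second offsets for each 20-minute chunk."""
    return [
        (start, min(start + CHUNK_SECONDS, duration_seconds))
        for start in range(0, duration_seconds, CHUNK_SECONDS)
    ]
```

A 50-minute file, for instance, yields three chunks: two full 20-minute chunks and a final 10-minute remainder.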
- The `gpt-4o-transcribe-diarize` model does not support custom terminology or prompting
- Speaker names or context cannot be supplied to improve recognition
- Transcription accuracy depends on audio quality and speaker clarity
- Primary format: MP3
- Other formats supported by FFmpeg/pydub should work but are not extensively tested
Make sure you've created a `.env` file with your API key:

```
OPENAI_API_KEY=sk-your-key-here
```

Install FFmpeg using the instructions in the Requirements section.
- Check that the URL is accessible
- Verify the URL points to an audio file
- Check your internet connection
- Verify the file path is correct
- Use absolute paths if relative paths don't work
- Check file permissions
- Rate limit: Wait a few moments and try again
- Authentication error: Verify your API key is correct
- File too large: The tool should automatically chunk large files, but extremely large files may still cause issues
This issue is resolved automatically by the `audioop-lts` package in requirements.txt. If you still see this error:

```bash
pip install audioop-lts
```

```
PodcastTranscriber/
├── .env.example       # Example environment variables
├── .gitignore         # Git ignore patterns
├── requirements.txt   # Python dependencies
├── transcribe.py      # Main CLI script
└── README.md          # This file
```
The tool includes comprehensive error handling for common scenarios:
- Invalid file formats
- Network errors during download
- API errors (rate limits, authentication)
- File size validation
- Missing API keys
To test manually, try:
- Short audio file (<30 seconds)
- Medium audio file (1-5 minutes)
- Large audio file (>25MB)
- URL download vs local file
- Invalid inputs (wrong file type, bad URL)
Contributions are welcome! Please feel free to submit issues or pull requests.
This project is open source and available under the MIT License.
For issues or questions:
- Check the Troubleshooting section above
- Review the Known Limitations
- Open an issue on GitHub with details about your problem
- Uses OpenAI's `gpt-4o-transcribe-diarize` model for transcription
- Built with Click for the CLI interface
- Audio processing powered by pydub and FFmpeg