Music Whisper Transcriber

A simple Python tool to transcribe song lyrics and vocals from various audio files using OpenAI's Whisper. Perfect for when you can't quite make out what the singer is saying!

What it does

This tool helps you extract lyrics and vocals from music files by:

Using OpenAI's Whisper AI model for accurate speech recognition
Processing various audio formats (MP3, WAV, FLAC, M4A, etc.)
Running completely offline - no internet connection required
Providing clean, readable text output of the transcribed vocals

It is not always 100% accurate though :/

Setup

Clone or download this repository:

git clone https://github.com/rkruk/Whisper-Transcriber.git
cd Whisper-Transcriber

Create a Python virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install --upgrade pip
pip install openai-whisper

Usage

Basic usage:

python transcribe.py path/to/your-song.mp3

Examples:

python transcribe.py ~/Music/song.mp3
python transcribe.py "/path/to/My Favorite Song.wav"

The transcription will be printed to your terminal. You can redirect it to a file if needed:

python transcribe.py song.mp3 > lyrics.txt

Whisper Models

The tool uses the "base" model by default, which provides a good balance of speed and accuracy. You can modify transcribe.py to use different models:

tiny: Fastest, least accurate (~39 MB)
base: Good balance (default) (~74 MB)
small: Better accuracy (~244 MB)
medium: Even better (~769 MB)
large: Best accuracy (~1550 MB)

Tips for Best Results

Clean audio works better: Studio recordings typically transcribe more accurately than live performances
Instrumental sections: The tool may output "[Music]" or attempt to transcribe background vocals during instrumental parts
Multiple languages: The tool is set to English by default, but Whisper supports many languages
File formats: Works with MP3, WAV, FLAC, M4A, and most common audio formats

Troubleshooting

If you get import errors: Make sure you're in your virtual environment:

source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate     # Windows

For very long songs: The transcription might take a few minutes for longer tracks. This is normal.

Poor transcription quality: Try using a higher quality model by editing transcribe.py and changing "base" to "small" or "medium".

System Requirements

Python 3.8+
At least 1GB free RAM (more for larger models)
Works on Linux, macOS, and Windows

License

MIT License

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
LICENSE		LICENSE
README.md		README.md
transcribe.py		transcribe.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Music Whisper Transcriber

What it does

Setup

Usage

Whisper Models

Tips for Best Results

Troubleshooting

System Requirements

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Music Whisper Transcriber

What it does

Setup

Usage

Whisper Models

Tips for Best Results

Troubleshooting

System Requirements

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages