A hybrid music recommendation engine that suggests songs by combining two ML techniques: collaborative filtering (learning from listening patterns) and content-based filtering (analyzing what makes songs sound similar).
Collaborative filtering uses SVD matrix factorization trained with stochastic gradient descent on Last.fm listening data to discover hidden patterns in user preferences. Content-based filtering uses Spotify audio features (energy, tempo, danceability, etc.) with K-Nearest Neighbors to find songs that sound similar to what a user already likes.
- Python >= 3.11
- uv package manager
- Spotify API credentials (client ID and secret from Spotify Developer Dashboard)
# Create virtual environment
python3.12 -m venv .venv
# Activate virtual environment
source .venv/bin/activate
# Install dependencies
uv sync

Add your Spotify API credentials to .env. See .env.example for the expected keys.
python scripts/parse_csv_user_song.py
python scripts/parse_csv_unique_song.py

These produce:
- user_song_interaction.csv
- unique_song_interaction.csv
python scripts/match_audio_features.py

Matches our Last.fm songs against a Kaggle dataset (data/raw/song_audio_features.csv) containing pre-collected Spotify audio features (energy, tempo, danceability, etc.). Matches by artist + track name.
- As of Feb 2026, Spotify has deprecated its audio features endpoint
Produces: audio_features.csv
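
The artist + track name match described above could look roughly like the sketch below. It is an illustration only, not the code in scripts/match_audio_features.py: the `artist` and `track` column names, the `normalize` helper, and the sample feature values are all assumptions made for the example.

```python
def normalize(text: str) -> str:
    """Lowercase, trim, and collapse internal whitespace (hypothetical helper)."""
    return " ".join(text.lower().split())


def match_audio_features(songs, features):
    """Attach audio features to songs that share a normalized (artist, track) key."""
    # Index the feature rows by (artist, track) so each lookup is O(1).
    index = {
        (normalize(row["artist"]), normalize(row["track"])): row
        for row in features
    }
    matched = []
    for song in songs:
        key = (normalize(song["artist"]), normalize(song["track"]))
        if key in index:
            # The song's own fields win, so its original spelling is kept.
            matched.append({**index[key], **song})
    return matched


# Made-up rows; "artist" and "track" are assumed column names.
songs = [
    {"artist": "Daft Punk", "track": "One More Time"},
    {"artist": "Unknown Artist", "track": "Missing Song"},
]
features = [
    {"artist": "daft punk ", "track": "One  More Time", "energy": 0.7, "tempo": 123.0},
]
matched = match_audio_features(songs, features)
print(matched)
```

Normalizing both sides before comparing keeps small differences in casing or spacing from dropping a song. Songs with no match are simply left out of the result.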
python model/collaborative_filtering.py

Trains SVD matrix factorization with stochastic gradient descent on user-song interactions. Splits data 75/12.5/12.5 train/val/test and prints mean squared error each epoch. Saves the trained model to data/models/svd_model.npz for use by the hybrid recommender.
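
For readers new to the technique, here is a minimal sketch of SGD-trained matrix factorization with user and item biases, using the same 75/12.5/12.5 split. It runs on synthetic data and is not the code in model/collaborative_filtering.py: the hyperparameters (`lr`, `reg`, `n_factors`, `n_epochs`) and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic (user, song, rating) triples; real data would come from the interaction CSV.
n_users, n_items, n_factors = 20, 15, 4
true_p = rng.normal(size=(n_users, n_factors))
true_q = rng.normal(size=(n_items, n_factors))
ratings = np.array(
    [(u, i, true_p[u] @ true_q[i]) for u in range(n_users) for i in range(n_items)]
)
rng.shuffle(ratings)  # shuffle rows before splitting

# 75 / 12.5 / 12.5 train / validation / test split.
n = len(ratings)
n_train, n_val = int(0.75 * n), int(0.125 * n)
train, val, test = np.split(ratings, [n_train, n_train + n_val])

# Model parameters: global mean, user/item biases, latent factor matrices.
mu = train[:, 2].mean()
b_u = np.zeros(n_users)
b_i = np.zeros(n_items)
P = rng.normal(scale=0.1, size=(n_users, n_factors))
Q = rng.normal(scale=0.1, size=(n_items, n_factors))


def predict(u, i):
    return mu + b_u[u] + b_i[i] + P[u] @ Q[i]


def mse(data):
    errs = [r - predict(int(u), int(i)) for u, i, r in data]
    return float(np.mean(np.square(errs)))


lr, reg, n_epochs = 0.02, 0.02, 30  # assumed hyperparameters
history = []
for epoch in range(n_epochs):
    rng.shuffle(train)  # visit training examples in a new order each epoch
    for u, i, r in train:
        u, i = int(u), int(i)
        err = r - predict(u, i)
        # Gradient step on the L2-regularized squared error for one example.
        b_u[u] += lr * (err - reg * b_u[u])
        b_i[i] += lr * (err - reg * b_i[i])
        p_old = P[u].copy()
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * p_old - reg * Q[i])
    history.append((mse(train), mse(val)))
    print(f"epoch {epoch + 1}: train MSE {history[-1][0]:.4f}, val MSE {history[-1][1]:.4f}")
```

The learned `b_i` vector is the per-song bias, which the hybrid recommender uses as a popularity signal. Arrays like these can be written to a single .npz file with `np.savez`.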
python model/content_based_filtering.py

Loads audio features, fits KNN with cosine similarity, and generates 10 recommendations per user based on their listening history.
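
The nearest-neighbor lookup can be sketched with plain NumPy as below. This is a simplified stand-in for model/content_based_filtering.py. It makes two assumptions: features are standardized per column, and a user's taste profile is the mean vector of the songs they listened to. The function name and the toy feature values are hypothetical.

```python
import numpy as np


def recommend_similar(features, listened, k=10):
    """Return indices of the k songs most similar (cosine) to a user's profile."""
    # Standardize columns so tempo (in the 100s) does not dominate 0-1 features.
    std = features.std(axis=0)
    scaled = (features - features.mean(axis=0)) / np.where(std == 0, 1.0, std)
    # L2-normalize rows so a dot product between rows is their cosine similarity.
    norms = np.linalg.norm(scaled, axis=1, keepdims=True)
    unit = scaled / np.where(norms == 0, 1.0, norms)
    # Assumption: the profile is the mean of the listened songs' vectors.
    profile = unit[listened].mean(axis=0)
    # Proportional to cosine similarity (the profile norm is a shared constant).
    sims = unit @ profile
    sims[listened] = -np.inf  # never recommend songs already in the history
    k = min(k, len(features) - len(listened))
    return np.argsort(-sims)[:k]


# Toy rows of (energy, tempo, danceability); the values are made up.
features = np.array([
    [0.90, 128.0, 0.80],  # 0: energetic
    [0.85, 125.0, 0.75],  # 1: energetic
    [0.20, 70.0, 0.30],   # 2: calm
    [0.25, 72.0, 0.35],   # 3: calm
    [0.88, 130.0, 0.82],  # 4: energetic
])
recs = recommend_similar(features, listened=[0], k=2)
print(recs)
```

With song 0 as the only history entry, the two recommendations are the other energetic songs (indices 1 and 4) and the calm songs are left out.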
python app/app.py

Launches a Gradio web UI at http://127.0.0.1:7860 (can also be deployed publicly). Search for songs on Spotify, select up to 5, and get recommendations using a two-stage hybrid pipeline:
- KNN candidate generation — finds ~100 sonically similar songs from the dataset
- SVD ranking — re-ranks candidates using collaborative filtering item biases (learned song popularity)
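
The two stages above can be sketched as follows. This is an illustration under stated assumptions, not the code in app/app.py: it assumes feature rows are already L2-normalized, that the selected songs are averaged into one query vector, and that `item_bias` holds the per-song biases from the trained SVD model. The function and parameter names are hypothetical.

```python
import numpy as np


def hybrid_recommend(unit_features, item_bias, selected, n_candidates=100, top_n=10):
    """Stage 1: pick candidates by cosine similarity. Stage 2: re-rank by item bias."""
    # Stage 1: similarity of every song to the mean of the selected songs.
    profile = unit_features[selected].mean(axis=0)
    sims = unit_features @ profile
    sims[selected] = -np.inf  # exclude the songs the user picked
    n_candidates = min(n_candidates, len(sims) - len(selected))
    candidates = np.argsort(-sims)[:n_candidates]
    # Stage 2: order the candidates by learned song bias, highest first.
    order = np.argsort(-item_bias[candidates])
    return candidates[order][:top_n]


# Toy data: six songs in a 2-D feature space, rows normalized to unit length.
raw = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.0, 1.0], [0.1, 0.9]])
unit_features = raw / np.linalg.norm(raw, axis=1, keepdims=True)
item_bias = np.array([0.0, 0.1, 0.5, 0.3, 0.9, 0.8])  # made-up biases

top = hybrid_recommend(unit_features, item_bias, selected=[0], n_candidates=3, top_n=2)
print(top)
```

Song 4 has the highest bias but never appears in the result, because it is not among the three nearest neighbors of song 0. Popularity only reorders songs that already passed the similarity filter.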
python model/evaluation.py

Evaluates all three models (SVD, KNN, Hybrid) using ranking metrics: Precision@k, Recall@k, and NDCG@k. Generates plots to plots/:
- SVD training curves (train/val MSE per epoch)
- Audio feature distributions and correlation heatmap
- t-SNE embedding of the song feature space
- Model comparison bar chart
A Jupyter notebook with the same analysis is also available at notebooks/evaluation.ipynb.
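
For reference, the three ranking metrics used in the evaluation step can be sketched as below with binary relevance (a song is either relevant or not). These are the standard textbook definitions. The code in model/evaluation.py may differ in details such as how held-out songs are chosen, and the function names here are assumptions.

```python
import math


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended, relevant, k):
    """DCG of the top k divided by the DCG of an ideal ordering."""
    # A hit at 0-based position `rank` contributes 1 / log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    # The ideal ranking places every relevant item first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


recommended = ["a", "b", "c", "d"]  # ranked model output (toy example)
relevant = {"a", "c", "e"}          # held-out songs the user actually played
print(precision_at_k(recommended, relevant, 3))
print(recall_at_k(recommended, relevant, 3))
print(ndcg_at_k(recommended, relevant, 3))
```

Precision and recall ignore where in the top k a hit lands. NDCG gives more credit to hits near the top of the list.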