A content-based movie recommendation engine that suggests similar movies based on metadata like genres, cast, crew, and keywords — powered by Cosine Similarity and deployed as an interactive Streamlit web app.
🔗 [Coming Soon — Streamlit Deployment]
Ever wondered how Netflix or Spotify knows what to suggest next? This project replicates that core idea using Content-Based Filtering — analyzing what a movie is rather than what users rated it.
Given any movie title, the system finds the Top 5 most similar movies by computing similarity across key metadata features from the TMDB dataset.
- 🔍 Search any movie and get 5 instant recommendations
- 🧠 Content-based filtering using NLP-style feature extraction
- ⚡ Fast results via precomputed similarity matrix (Pickle)
- 🎨 Clean and interactive Streamlit UI
- 🎥 Movie poster fetching via TMDB API
| Tool | Purpose |
|---|---|
| Python | Core language |
| Pandas & NumPy | Data processing |
| Scikit-learn | Vectorization & Cosine Similarity |
| TMDB Dataset | Movie metadata |
| Pickle | Model serialization |
| Streamlit | Web app deployment |
User inputs a movie title
↓
Fetch movie's feature vector (genres + cast + crew + keywords + overview)
↓
Compute Cosine Similarity against all movies in dataset
↓
Return Top 5 most similar movies
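
The flow above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the project's actual code: the function name `recommend`, its parameters, and the toy titles and vectors are all assumptions made for the example. It computes similarity at query time for clarity, whereas the app is described as loading a precomputed matrix.

```python
import numpy as np


def recommend(title, titles, vectors, top_n=5):
    """Return up to top_n titles closest to `title` by cosine similarity.

    `titles` and `vectors` are hypothetical inputs: a list of movie titles
    and one count vector per movie, in the same order.
    """
    idx = titles.index(title)  # raises ValueError for an unknown title
    vectors = np.asarray(vectors, dtype=float)

    # Normalise each row to unit length so a dot product equals cosine similarity.
    norms = np.linalg.norm(vectors, axis=1)
    norms[norms == 0] = 1.0  # guard against all-zero vectors
    unit = vectors / norms[:, None]

    # Similarity of the query movie against every movie in the dataset.
    scores = unit @ unit[idx]

    # Rank from most to least similar, skipping the query movie itself.
    ranked = [i for i in np.argsort(-scores, kind="stable") if i != idx]
    return [titles[i] for i in ranked[:top_n]]


# Toy data standing in for real TMDB feature vectors.
titles = ["Alpha", "Beta", "Gamma", "Delta"]
vectors = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]
print(recommend("Alpha", titles, vectors, top_n=2))
```

With real data, `vectors` would be the count vectors built from each movie's tags, and `top_n` would stay at its default of 5.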
- Data Preprocessing: Combined genres, cast, crew, keywords into a single `tags` column
- Vectorization: Converted tags into numerical vectors using Count Vectorizer
- Similarity Computation: Applied Cosine Similarity to find closest matches
- Serialization: Saved model & similarity matrix using Pickle for fast loading
- Deployment: Built interactive UI with Streamlit
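
As a rough illustration of the first four steps, here is a minimal sketch using scikit-learn. The toy DataFrame, its made-up titles and names, and the default `CountVectorizer` settings are assumptions for the example. The notebook may clean the data and tune the vectorizer differently, and it writes `.pkl` files rather than pickling in memory.

```python
import pickle

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for the TMDB data (hypothetical rows, already parsed into lists).
movies = pd.DataFrame({
    "title": ["Space Quest", "Star Voyage", "Love in Paris"],
    "genres": [["SciFi", "Adventure"], ["SciFi", "Adventure"], ["Romance", "Drama"]],
    "keywords": [["space", "alien"], ["space", "ship"], ["paris", "love"]],
    "cast": [["ActorA"], ["ActorA"], ["ActorB"]],
    "crew": [["DirectorX"], ["DirectorY"], ["DirectorZ"]],
})

# 1. Preprocessing: concatenate the metadata lists and join them into one
#    lowercase string per movie, stored in a single `tags` column.
combined = movies["genres"] + movies["keywords"] + movies["cast"] + movies["crew"]
movies["tags"] = combined.apply(lambda tokens: " ".join(tokens).lower())

# 2. Vectorization: bag-of-words counts over the tags (default settings assumed).
vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(movies["tags"])

# 3. Similarity: pairwise cosine similarity, shape (n_movies, n_movies).
similarity = cosine_similarity(vectors)

# 4. Serialization: round-trip through pickle in memory; writing to a file
#    with pickle.dump works the same way.
restored = pickle.loads(pickle.dumps(similarity))
print(restored.shape)
```

In this toy data the two sci-fi movies share four of their six tags, so they score far higher against each other than against the romance title, which shares none.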
Movie-recommendation-system/
│
├── app.py # Streamlit web app
├── movie_recommendation.ipynb # Model building notebook
├── movies.pkl # Serialized movie data
├── similarity.pkl # Precomputed similarity matrix
├── requirements.txt # Dependencies
└── README.md
# 1. Clone the repository
git clone https://github.com/byteephantom/Movie-recommendation-system.git
cd Movie-recommendation-system
# 2. Install dependencies
pip install -r requirements.txt
# 3. Run the app
streamlit run app.py

pandas
numpy
scikit-learn
streamlit
requests
pickle-mixin
- Source: TMDB Movie Dataset — Kaggle
- Size: 5000+ movies
- Features used: title, genres, keywords, cast, crew, overview
Ayush Kumar
- GitHub: @byteephantom
This project is licensed under the MIT License.
⭐ If you found this project helpful, consider giving it a star!