A machine learning system for analyzing restaurant reviews to extract sentiment, identify service quality issues, and generate actionable business insights. Trained on real customer feedback to help restaurants improve service quality and customer satisfaction.
This project was tested on the Restaurant Reviews Dataset from Kaggle:
🔗 https://www.kaggle.com/datasets/joebeachcapital/restaurant-reviews
⚠️ Note: Download the dataset and save it as `Restaurant_reviews.csv` in the project root directory before running the analysis.
- ✅ Multi-model sentiment analysis (Random Forest, Naive Bayes, SVM)
- ✅ Automated data cleaning for noisy real-world reviews
- ✅ Business intelligence engine identifying key service issues:
  - Slow service patterns
  - Staff behavior problems
  - Food quality complaints
  - Ambience issues
- ✅ ROI analysis with investment payback calculations
- ✅ Interactive HTML report with visualizations and recommendations
- ✅ Database integration (MySQL/MariaDB) for production deployment
- ✅ Word frequency analysis to understand customer language patterns
- ✅ Feature importance visualization for model interpretability
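As a rough illustration of how the four service issue categories listed above could be detected, here is a minimal keyword-matching sketch. The keyword lists and the function names `tag_issues` and `count_issues` are hypothetical and chosen for this example; the project's own engine may use different rules.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical keyword lists for illustration; the project's real rules may differ.
ISSUE_KEYWORDS = {
    "slow_service": ["slow", "wait", "waited", "late", "delay"],
    "staff_behavior": ["rude", "unfriendly", "ignored", "impolite"],
    "food_quality": ["cold", "stale", "bland", "undercooked", "tasteless"],
    "ambience": ["noisy", "dirty", "crowded", "cramped", "smelly"],
}


def tag_issues(review: str) -> set[str]:
    """Return the issue categories whose keywords appear in the review."""
    words = set(re.findall(r"[a-z']+", review.lower()))
    return {
        issue
        for issue, keywords in ISSUE_KEYWORDS.items()
        if words.intersection(keywords)
    }


def count_issues(reviews: list[str]) -> Counter:
    """Count how many reviews mention each issue category."""
    counts: Counter = Counter()
    for review in reviews:
        counts.update(tag_issues(review))
    return counts


if __name__ == "__main__":
    sample = [
        "We waited an hour and the waiter was rude.",
        "Food was cold and bland.",
        "Lovely place, great food!",
    ]
    print(count_issues(sample))
```

Exact word matching keeps the sketch simple but misses inflected forms such as "waiting"; stemming the review text first would widen the match.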
| Category | Technologies |
|---|---|
| Core ML | scikit-learn, pandas, numpy |
| NLP | NLTK (stopwords, Porter stemmer), CountVectorizer, TF-IDF |
| Visualization | matplotlib, seaborn, HTML/CSS |
| Database | SQLAlchemy, mysql-connector-python |
| Deployment | Pickle (joblib) for model serialization |
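To show how the pieces of this stack fit together, the sketch below trains the three model families named above on TF-IDF features. It is an illustration under assumptions, not the code in `ai_model.py`: the toy reviews, the hyperparameters, and the function names are made up for this example, and scikit-learn's built-in English stop word list stands in for the NLTK stopwords and Porter stemmer so that no corpus download is needed.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data for illustration only; 1 = positive, 0 = negative.
TOY_TEXTS = [
    "great food and friendly staff",
    "delicious pizza, lovely atmosphere",
    "amazing service and tasty dishes",
    "wonderful experience, highly recommend",
    "terrible service and cold food",
    "rude waiter, long wait",
    "awful meal, dirty tables",
    "slow service and bland food",
]
TOY_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def build_models() -> dict:
    """Create one TF-IDF + classifier pipeline per model family."""
    return {
        "RandomForest": make_pipeline(
            TfidfVectorizer(stop_words="english"),
            RandomForestClassifier(n_estimators=50, random_state=42),
        ),
        "NaiveBayes": make_pipeline(
            TfidfVectorizer(stop_words="english"), MultinomialNB()
        ),
        "SVM": make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC()),
    }


def fit_and_predict(train_texts, train_labels, new_texts) -> dict:
    """Fit every pipeline and return its predictions for new_texts."""
    predictions = {}
    for name, model in build_models().items():
        model.fit(train_texts, train_labels)
        predictions[name] = model.predict(new_texts).tolist()
    return predictions


if __name__ == "__main__":
    new_reviews = ["terrible slow service", "delicious food"]
    for name, preds in fit_and_predict(TOY_TEXTS, TOY_LABELS, new_reviews).items():
        print(name, preds)
```

Wrapping the vectorizer and classifier in one pipeline means the same text transformation is applied at training and prediction time.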
project/
├── Restaurant_reviews.csv # Source dataset (download from Kaggle)
├── ai_model.py # Main analysis script
├── analysis_report.html # Generated HTML report
├── check_data.py # Database data validation
├── clear_all_tables.py # Full database cleanup
├── clear_table.py # Single table cleanup
├── import_data.py # CSV → MySQL importer
├── test_db_connection.py # Database connectivity test
├── results/ # Analysis outputs
│ ├── *.png # Visualizations
│ ├── model_*.pkl # Trained models
│ ├── vectorizer_*.pkl # Text vectorizers
│ ├── models_comparison.csv # Model performance metrics
│ └── business_recommendations.txt # Actionable insights
└── requirements.txt # Python dependencies
# Install MariaDB server
sudo apt update && sudo apt install mariadb-server
# Secure installation
sudo mysql_secure_installation
# Create database and user
sudo mariadb -u root -p
# In MariaDB/MySQL shell:
CREATE DATABASE restaurant_reviews CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'project_user'@'localhost' IDENTIFIED BY 'your_password';
GRANT ALL PRIVILEGES ON restaurant_reviews.* TO 'project_user'@'localhost';
FLUSH PRIVILEGES;
EXIT;

🔒 Security Note: All database connection files in this repository use `your_password` as a placeholder. Before running, replace it with your actual password in:
- `ai_model.py`
- `import_data.py`
- `check_data.py`
- `test_db_connection.py`
- `clear_table.py`
- `clear_all_tables.py`

Best practice: Use environment variables instead:
import os
db_password = os.environ.get('DB_PASSWORD', 'your_password')

# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

# Basic analysis (CSV output)
python ai_model.py Restaurant_reviews.csv
# With Excel export
python ai_model.py Restaurant_reviews.csv excel

# 1. Import data into MySQL
python import_data.py
# 2. Verify data loaded correctly
python check_data.py
# 3. Run full analysis from database
python ai_model.py mysql \
"mysql+mysqlconnector://project_user:your_password@localhost:3306/restaurant_reviews" \
"SELECT review_text AS Review, rating AS Rating, restaurant_name AS Restaurant FROM restaurant_reviews"

# Test database connectivity
python test_db_connection.py
# Clear a single table (e.g., model_metrics)
python clear_table.py
# Full database cleanup (all tables)
python clear_all_tables.py

After running the analysis, you'll get:
analysis_report.html — Interactive dashboard featuring:
- Customer rating distribution
- Review length analysis
- Top-10 restaurants by review volume
- Word clouds for positive/negative sentiment
- Model comparison (accuracy, F1-score, ROC-AUC)
- Confusion matrices and ROC curves
- Business recommendations with ROI calculations
results/ directory containing:
- Trained ML models (`model_RandomForest.pkl`, etc.)
- Classification reports
- Word frequency statistics
- Feature importance rankings
- Business insights in CSV/Excel format
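The saved models and vectorizers can be reused for new reviews without retraining. The sketch below shows one way this might look, assuming the `.pkl` files are joblib-compatible, that each model has a matching vectorizer, and that vectorizer files follow the pattern `vectorizer_<name>.pkl` (only `vectorizer_*.pkl` is documented, so the exact name is a guess). The helper names are hypothetical.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB


def save_artifacts(model, vectorizer, out_dir: Path, name: str) -> tuple[Path, Path]:
    """Write a model/vectorizer pair using the assumed naming pattern."""
    out_dir.mkdir(parents=True, exist_ok=True)
    model_path = out_dir / f"model_{name}.pkl"
    vectorizer_path = out_dir / f"vectorizer_{name}.pkl"  # assumed file name
    joblib.dump(model, model_path)
    joblib.dump(vectorizer, vectorizer_path)
    return model_path, vectorizer_path


def load_and_predict(model_path, vectorizer_path, texts) -> list:
    """Load a saved pair and classify raw review texts."""
    model = joblib.load(model_path)
    vectorizer = joblib.load(vectorizer_path)
    return model.predict(vectorizer.transform(texts)).tolist()


if __name__ == "__main__":
    # Tiny toy model so the example runs without the real results/ directory.
    vectorizer = CountVectorizer()
    features = vectorizer.fit_transform(["good food", "bad service"])
    model = MultinomialNB().fit(features, [1, 0])

    with tempfile.TemporaryDirectory() as tmp:
        paths = save_artifacts(model, vectorizer, Path(tmp) / "results", "NaiveBayes")
        print(load_and_predict(*paths, ["good food", "bad service"]))
```

Pickle-based files can execute arbitrary code when loaded, so only load `.pkl` files that you produced yourself or otherwise trust.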
This system transforms unstructured customer feedback into actionable business intelligence:
| Insight Type | Business Impact |
|---|---|
| Service speed issues | 40% reduction in wait times → higher satisfaction |
| Staff training needs | Targeted training → 25% fewer negative reviews |
| Food quality patterns | Kitchen process improvements → 0.5★ rating increase |
| Ambience optimization | Zone redesign → 15% longer guest stays |
| ROI analysis | 8-month payback period on service improvements |
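For readers unfamiliar with payback calculations, the sketch below uses the standard simple payback formula (investment divided by monthly gain). The figures are hypothetical and picked so that the result is eight months; how the project itself derives its ROI numbers is not documented here, so treat this as an illustration only.

```python
def payback_period_months(investment: float, monthly_gain: float) -> float:
    """Months needed for cumulative gains to cover the investment."""
    if investment < 0:
        raise ValueError("investment must be non-negative")
    if monthly_gain <= 0:
        raise ValueError("monthly_gain must be positive")
    return investment / monthly_gain


def roi_percent(investment: float, monthly_gain: float, months: int) -> float:
    """Return on investment over a period, as a percentage of the investment."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    total_gain = monthly_gain * months
    return (total_gain - investment) / investment * 100


if __name__ == "__main__":
    # Hypothetical figures: 8,000 spent on staff training, 1,000 extra profit per month.
    print(payback_period_months(8_000, 1_000))
    print(roi_percent(8_000, 1_000, months=12))
```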
- Never commit passwords to version control
- Use `.env` files with `python-dotenv`:
from dotenv import load_dotenv
import os
load_dotenv()
password = os.getenv('DB_PASSWORD')

- For production deployments, implement proper secrets management (HashiCorp Vault, AWS Secrets Manager, or environment variables)
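The same idea extends to the full connection string, so that no script needs a hardcoded password. The sketch below is a minimal example under assumptions: `DB_PASSWORD` comes from this README, while `DB_USER`, `DB_HOST`, `DB_PORT`, `DB_NAME`, and the helper `build_db_url` are hypothetical names introduced here. The password is URL-encoded because characters such as `@` or `/` would otherwise break the URL.

```python
from __future__ import annotations

import os
from urllib.parse import quote_plus


def build_db_url(env=None) -> str:
    """Assemble a SQLAlchemy-style MySQL URL from environment variables."""
    env = os.environ if env is None else env
    password = env.get("DB_PASSWORD")
    if not password:
        # Fail loudly instead of falling back to a placeholder password.
        raise RuntimeError("DB_PASSWORD is not set")
    # The variable names below are assumptions; defaults mirror this README.
    user = env.get("DB_USER", "project_user")
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "3306")
    name = env.get("DB_NAME", "restaurant_reviews")
    return (
        f"mysql+mysqlconnector://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{name}"
    )


if __name__ == "__main__":
    # Pass a dict here for a dry run; omit it to read the real environment.
    print(build_db_url({"DB_PASSWORD": "p@ss/word"}))
```

The resulting string can be passed wherever the scripts expect a connection URL, for example as the second argument to `python ai_model.py mysql`.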
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for details.
- Dataset source: Kaggle Restaurant Reviews
- NLTK Project for natural language processing resources
- scikit-learn team for robust ML implementations
Before using this project:
- Replace all `your_password` placeholders in Python files with your actual database password
- Never commit credentials to Git repositories