Skip to content

V3kt0r39/ml-sentiment-analysis

Repository files navigation

🍽️ Restaurant Review Sentiment Analysis System

Python License Dataset

A machine learning system for analyzing restaurant reviews to extract sentiment, identify service quality issues, and generate actionable business insights. Trained on real customer feedback to help restaurants improve service quality and customer satisfaction.

📊 Dataset

This project was tested on the Restaurant Reviews Dataset from Kaggle:

🔗 https://www.kaggle.com/datasets/joebeachcapital/restaurant-reviews

⚠️ Note: Download the dataset and save it as Restaurant_reviews.csv in the project root directory before running analysis.

✨ Features

  • Multi-model sentiment analysis (Random Forest, Naive Bayes, SVM)
  • Automated data cleaning for noisy real-world reviews
  • Business intelligence engine identifying key service issues:
    • Slow service patterns
    • Staff behavior problems
    • Food quality complaints
    • Ambience issues
  • ROI analysis with investment payback calculations
  • Interactive HTML report with visualizations and recommendations
  • Database integration (MySQL/MariaDB) for production deployment
  • Word frequency analysis to understand customer language patterns
  • Feature importance visualization for model interpretability

🛠️ Technologies Used

Category Technologies
Core ML scikit-learn, pandas, numpy
NLP NLTK (stopwords, Porter stemmer), CountVectorizer, TF-IDF
Visualization matplotlib, seaborn, HTML/CSS
Database SQLAlchemy, mysql-connector-python
Deployment Pickle (joblib) for model serialization

📁 Project Structure

project/
├── Restaurant_reviews.csv          # Source dataset (download from Kaggle)
├── ai_model.py                     # Main analysis script
├── analysis_report.html            # Generated HTML report
├── check_data.py                   # Database data validation
├── clear_all_tables.py             # Full database cleanup
├── clear_table.py                  # Single table cleanup
├── import_data.py                  # CSV → MySQL importer
├── test_db_connection.py           # Database connectivity test
├── results/                        # Analysis outputs
│   ├── *.png                       # Visualizations
│   ├── model_*.pkl                 # Trained models
│   ├── vectorizer_*.pkl            # Text vectorizers
│   ├── models_comparison.csv       # Model performance metrics
│   └── business_recommendations.txt # Actionable insights
└── requirements.txt                # Python dependencies

⚙️ Installation & Setup

1. Database Setup (MariaDB/MySQL)

# Install MariaDB server
sudo apt update && sudo apt install mariadb-server

# Secure installation
sudo mysql_secure_installation

# Create database and user
sudo mariadb -u root -p

# In MariaDB/MySQL shell:
CREATE DATABASE restaurant_reviews CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'project_user'@'localhost' IDENTIFIED BY 'your_password';
GRANT ALL PRIVILEGES ON restaurant_reviews.* TO 'project_user'@'localhost';
FLUSH PRIVILEGES;
EXIT;

🔒 Security Note: All database connection files in this repository use your_password as a placeholder. Before running, replace it with your actual password in:

  • ai_model.py
  • import_data.py
  • check_data.py
  • test_db_connection.py
  • clear_table.py
  • clear_all_tables.py

Best practice: Use environment variables instead:

import os
db_password = os.environ.get('DB_PASSWORD', 'your_password')

2. Python Environment

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

▶️ Usage

Option 1: Analyze CSV file directly

# Basic analysis (CSV output)
python ai_model.py Restaurant_reviews.csv

# With Excel export
python ai_model.py Restaurant_reviews.csv excel

Option 2: Database workflow (recommended for production)

# 1. Import data into MySQL
python import_data.py

# 2. Verify data loaded correctly
python check_data.py

# 3. Run full analysis from database
python ai_model.py mysql \
  "mysql+mysqlconnector://project_user:your_password@localhost:3306/restaurant_reviews" \
  "SELECT review_text AS Review, rating AS Rating, restaurant_name AS Restaurant FROM restaurant_reviews"

Option 3: Database maintenance scripts

# Test database connectivity
python test_db_connection.py

# Clear a single table (e.g., model_metrics)
python clear_table.py

# Full database cleanup (all tables)
python clear_all_tables.py

📈 Sample Output

After running analysis, you'll get:

analysis_report.html — Interactive dashboard featuring:

  • Customer rating distribution
  • Review length analysis
  • Top-10 restaurants by review volume
  • Word clouds for positive/negative sentiment
  • Model comparison (accuracy, F1-score, ROC-AUC)
  • Confusion matrices and ROC curves
  • Business recommendations with ROI calculations

results/ directory containing:

  • Trained ML models (model_RandomForest.pkl, etc.)
  • Classification reports
  • Word frequency statistics
  • Feature importance rankings
  • Business insights in CSV/Excel format

💼 Business Value

This system transforms unstructured customer feedback into actionable business intelligence:

Insight Type Business Impact
Service speed issues 40% reduction in wait times → higher satisfaction
Staff training needs Targeted training → 25% fewer negative reviews
Food quality patterns Kitchen process improvements → 0.5★ rating increase
Ambience optimization Zone redesign → 15% longer guest stays
ROI analysis 8-month payback period on service improvements

🔒 Security Recommendations

  1. Never commit passwords to version control
  2. Use .env files with python-dotenv:
from dotenv import load_dotenv
import os
load_dotenv()
password = os.getenv('DB_PASSWORD')
  1. For production deployments, implement proper secrets management (HashiCorp Vault, AWS Secrets Manager, or environment variables)

🤝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📜 License

Distributed under the MIT License. See LICENSE for details.

🙏 Acknowledgements

  • Dataset source: Kaggle Restaurant Reviews
  • NLTK Project for natural language processing resources
  • scikit-learn team for robust ML implementations

🔑 Critical Security Reminder

Before using this project:

  1. Replace all your_password placeholders in Python files with your actual database password
  2. Never commit credentials to Git repositories
  3. For production deployments, implement proper secrets management (HashiCorp Vault, AWS Secrets Manager, or environment variables)

About

ML/NLP system for restaurant review sentiment analysis, service issue detection & business insights. Python • scikit-learn • MySQL • ROI optimization

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages