A comprehensive, production-ready comparison of classical ML, deep learning, and transformer-based approaches for binary sentiment classification on the IMDB Movie Reviews dataset.
📊 Results • 🚀 Quick Start • 📁 Project Structure • 📓 Notebooks • 🌐 Demo
- Overview
- Dataset
- Models Compared
- Results
- Project Structure
- Quick Start
- Installation
- Usage
- Error Analysis
- Class Imbalance Handling
- Live Demo
- Contributing
- License
This project benchmarks three sentiment analysis approaches on the IMDB 50K Movie Reviews dataset:
| Approach | Type | Library |
|---|---|---|
| TF-IDF + Logistic Regression | Classical ML | Scikit-learn |
| LSTM | Deep Learning (RNN) | PyTorch |
| BERT (bert-base-uncased) | Transformer | 🤗 HuggingFace |
Key highlights:
- ✅ Full error analysis with misclassified sample inspection
- ✅ Class imbalance handled via class weights + SMOTE
- ✅ Confusion matrix, F1-score, ROC-AUC for every model
- ✅ Fully reproducible Jupyter Notebooks
- ✅ Interactive Gradio web demo
| Property | Value |
|---|---|
| Source | Kaggle / HuggingFace Datasets |
| Size | 50,000 reviews |
| Classes | Positive / Negative (Binary) |
| Balance | 25,000 positive + 25,000 negative |
| Split | 80% Train / 10% Val / 10% Test |
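The 80/10/10 stratified split above can be reproduced with two calls to scikit-learn's `train_test_split` — a sketch using a toy DataFrame in place of the real CSV (the `review`/`sentiment` column names are assumed from the Kaggle file):

```python
# Sketch: 80/10/10 stratified split, assuming "review"/"sentiment" columns.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "review": [f"review {i}" for i in range(100)],
    "sentiment": ["positive", "negative"] * 50,
})

# First carve off the 80% training portion, stratified on the label...
train_df, rest_df = train_test_split(
    df, test_size=0.2, stratify=df["sentiment"], random_state=42
)
# ...then split the remaining 20% evenly into validation and test.
val_df, test_df = train_test_split(
    rest_df, test_size=0.5, stratify=rest_df["sentiment"], random_state=42
)
print(len(train_df), len(val_df), len(test_df))  # 80 10 10
```

Stratifying both splits keeps the positive/negative ratio identical in all three subsets.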
Option A — Via Kaggle (Recommended)
- Go to https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
- Click Download
- Place `IMDB Dataset.csv` inside the `data/raw/` folder
Option B — Via Kaggle CLI (Automated)
```bash
pip install kaggle
# Place your kaggle.json API key in ~/.kaggle/
kaggle datasets download -d lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
unzip imdb-dataset-of-50k-movie-reviews.zip -d data/raw/
```

Option C — Via HuggingFace Datasets (No download needed)
```python
from datasets import load_dataset

dataset = load_dataset("imdb")
```

💡 The notebooks auto-detect which method to use — just run them!
- Feature extraction: TF-IDF with unigrams + bigrams (max 50,000 features)
- Model: Logistic Regression with L2 regularization
- Class imbalance: `class_weight='balanced'`
- Pros: Extremely fast, interpretable, strong baseline
- Cons: Loses word order and context
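The baseline above fits in a few lines of scikit-learn — a minimal sketch with the README's hyperparameters (unigrams + bigrams, 50,000 features, L2, balanced class weights) on an illustrative toy corpus:

```python
# Sketch of the TF-IDF + Logistic Regression baseline; the four-review
# corpus is illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

texts = [
    "a wonderful heartfelt film",
    "dull and painfully slow",
    "great acting and a wonderful story",
    "slow, dull, a waste of time",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=50_000)),
    ("lr", LogisticRegression(penalty="l2", class_weight="balanced", max_iter=1000)),
])
clf.fit(texts, labels)
print(clf.predict(["a wonderful story"]))
```

Bundling the vectorizer and classifier in a `Pipeline` ensures the same TF-IDF vocabulary is applied at inference time, which is also what makes pickling the pair straightforward.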
- Embedding: Pretrained GloVe 100d embeddings
- Architecture: Bidirectional LSTM (128 hidden units) → Dropout(0.5) → FC → Sigmoid
- Class imbalance: Weighted `BCELoss`
- Training: Adam optimizer, early stopping
- Pros: Captures sequential patterns
- Cons: Slower than TF-IDF, weaker than BERT on long text
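The architecture described above (embedding → bidirectional LSTM → dropout → linear → sigmoid) can be sketched in PyTorch as follows; dimensions match the README, while loading the pretrained GloVe vectors into the embedding layer is omitted:

```python
# Minimal Bi-LSTM classifier sketch; GloVe weight loading omitted.
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(2 * hidden_dim, 1)  # 2x: forward + backward states

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)            # (B, T, E)
        _, (h_n, _) = self.lstm(embedded)               # h_n: (2, B, H)
        h = torch.cat([h_n[0], h_n[1]], dim=1)          # concat both directions
        return torch.sigmoid(self.fc(self.dropout(h)))  # (B, 1) in [0, 1]

model = BiLSTMClassifier(vocab_size=10_000)
probs = model(torch.randint(1, 10_000, (4, 32)))  # batch of 4 sequences of 32 tokens
print(probs.shape)  # torch.Size([4, 1])
```

Because the head uses a sigmoid, it pairs with `BCELoss`; passing per-sample weights to the loss is how the class weighting mentioned above would be applied.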
- Model: `bert-base-uncased` from HuggingFace Transformers
- Fine-tuning: Last 4 transformer layers + classification head
- Class imbalance: Weighted cross-entropy loss
- Training: AdamW optimizer, linear warmup schedule, 3 epochs
- Pros: State-of-the-art contextual understanding
- Cons: Computationally expensive (GPU recommended)
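"Last 4 layers + head" fine-tuning amounts to toggling `requires_grad` by parameter name. A sketch of that logic, assuming HuggingFace's `bert.encoder.layer.<i>.` naming — demonstrated on a tiny stand-in module so it runs without downloading the model (in practice you would pass the real `AutoModelForSequenceClassification`):

```python
# Sketch: freeze all but the last 4 encoder layers + classification head.
# Parameter-name layout ("bert.encoder.layer.<i>.", "classifier.") is
# assumed to follow HuggingFace's bert-base-uncased.
import torch.nn as nn

def set_trainable(model: nn.Module, last_n: int = 4, total: int = 12):
    prefixes = tuple(
        f"bert.encoder.layer.{i}." for i in range(total - last_n, total)
    ) + ("classifier.",)
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(prefixes)

class StandIn(nn.Module):
    """Tiny stand-in mimicking the BERT parameter naming."""
    def __init__(self):
        super().__init__()
        self.bert = nn.Module()
        self.bert.encoder = nn.Module()
        self.bert.encoder.layer = nn.ModuleList(nn.Linear(4, 4) for _ in range(12))
        self.classifier = nn.Linear(4, 2)

model = StandIn()
set_trainable(model)
trainable_n = sum(p.requires_grad for p in model.parameters())
frozen_n = sum(not p.requires_grad for p in model.parameters())
print(trainable_n, frozen_n)  # 10 trainable (4 layers + head, weight+bias each), 16 frozen
```

Freezing the lower layers cuts memory and compute substantially while keeping most of BERT's quality on this task.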
| Model | Accuracy | Precision | Recall | F1-Score | ROC-AUC |
|---|---|---|---|---|---|
| TF-IDF + Log. Reg. | 89.4% | 89.2% | 89.4% | 89.3% | 0.964 |
| Bi-LSTM | 91.8% | 91.7% | 91.8% | 91.7% | 0.972 |
| BERT | 94.1% | 94.0% | 94.1% | 94.0% | 0.988 |
📌 Results are on the held-out test set. Full metrics and confusion matrices are in the notebooks.
See `notebooks/05_comparison_report.ipynb` for full visualizations.
```text
Sentiment_Analysis/
│
├── 📁 data/
│   ├── raw/                               # Raw IMDB dataset (.csv)
│   └── processed/                         # Cleaned, split datasets
│
├── 📁 notebooks/
│   ├── 01_EDA_preprocessing.ipynb         # Exploratory Data Analysis
│   ├── 02_tfidf_logistic_regression.ipynb # TF-IDF + LR model
│   ├── 03_lstm_model.ipynb                # LSTM model
│   ├── 04_bert_model.ipynb                # BERT fine-tuning
│   └── 05_comparison_report.ipynb         # Side-by-side comparison
│
├── 📁 src/
│   ├── preprocess.py                      # Text cleaning & preprocessing
│   ├── tfidf_model.py                     # TF-IDF + LR pipeline
│   ├── lstm_model.py                      # LSTM architecture
│   ├── bert_model.py                      # BERT fine-tuning code
│   ├── evaluate.py                        # Metrics, confusion matrix, error analysis
│   └── utils.py                           # Helper functions
│
├── 📁 models/
│   ├── tfidf_vectorizer.pkl               # Saved TF-IDF vectorizer
│   ├── lr_model.pkl                       # Saved Logistic Regression model
│   ├── lstm_model.pth                     # Saved LSTM weights
│   └── bert_finetuned/                    # Saved BERT model (HuggingFace format)
│
├── 📁 results/
│   ├── confusion_matrices/                # PNG outputs
│   ├── metrics_summary.csv                # All model metrics
│   └── error_analysis.csv                 # Misclassified samples
│
├── 📁 app/
│   └── demo.py                            # Gradio web demo
│
├── requirements.txt
├── environment.yml                        # Conda environment
├── setup.py
├── .gitignore
└── README.md
```
```bash
# 1. Clone the repo
git clone https://github.com/najahaja/Sentiment-Analysis.git
cd Sentiment-Analysis

# 2. Create conda environment
conda env create -f environment.yml
conda activate sentiment-env
# OR use pip
pip install -r requirements.txt

# 3. Download dataset (auto via HuggingFace — no Kaggle account needed)
python src/utils.py --download

# 4. Run all notebooks in order, OR run the full pipeline:
python src/train_all.py

# 5. Launch the demo
python app/demo.py
```

- Python 3.9+
- CUDA GPU (for BERT fine-tuning, optional but recommended)
- 8GB+ RAM
```bash
# Clone
git clone https://github.com/najahaja/Sentiment-Analysis.git
cd Sentiment-Analysis

# Option 1: Conda (recommended)
conda env create -f environment.yml
conda activate sentiment-env

# Option 2: pip virtual environment
python -m venv venv
venv\Scripts\activate      # Windows
source venv/bin/activate   # Mac/Linux
pip install -r requirements.txt
```

Run these notebooks in order for the full pipeline:
| # | Notebook | Description |
|---|---|---|
| 01 | `01_EDA_preprocessing.ipynb` | Load IMDB data, clean HTML/special chars, visualize class distribution, word clouds |
| 02 | `02_tfidf_logistic_regression.ipynb` | TF-IDF feature extraction, train LR, evaluate, confusion matrix |
| 03 | `03_lstm_model.ipynb` | Load GloVe, train Bi-LSTM, plot training curves, evaluate |
| 04 | `04_bert_model.ipynb` | Fine-tune bert-base-uncased, evaluate, save model |
| 05 | `05_comparison_report.ipynb` | Side-by-side metrics, error analysis, final conclusions |
The project includes a dedicated error analysis module in `src/evaluate.py` and `notebooks/05_comparison_report.ipynb`:
- False Positives: Reviews predicted as positive but actually negative
- False Negatives: Reviews predicted as negative but actually positive
- Confidence scores for misclassified samples
- Word importance via LIME for BERT predictions
- Common error patterns: Sarcasm, negation, domain-specific vocabulary
Example output:

```text
❌ Misclassified by BERT:
Text: "This film tries SO hard to be profound that it ends up being unintentionally hilarious."
True Label: Negative | Predicted: Positive | Confidence: 0.61
Pattern: Sarcasm / Mixed Sentiment
```
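The misclassified-sample table can be built from any model that exposes class probabilities — a sketch of the core step (the `error_table` helper and its column names are illustrative, not the actual `src/evaluate.py` API):

```python
# Sketch: collect misclassified reviews with the model's confidence.
import numpy as np
import pandas as pd

def error_table(texts, y_true, proba):
    """proba: (N, 2) array of [P(negative), P(positive)] per review."""
    y_pred = proba.argmax(axis=1)
    confidence = proba.max(axis=1)         # probability of the predicted class
    mask = y_pred != np.asarray(y_true)    # keep only the mistakes
    labels = np.array(["Negative", "Positive"])
    return pd.DataFrame({
        "text": np.asarray(texts)[mask],
        "true_label": labels[np.asarray(y_true)[mask]],
        "predicted": labels[y_pred[mask]],
        "confidence": confidence[mask].round(2),
    })

proba = np.array([[0.39, 0.61], [0.90, 0.10]])
errors = error_table(["so profound it's hilarious", "awful"], [0, 0], proba)
print(errors)  # one row: the sarcastic review, predicted Positive at 0.61
```

Sorting the resulting frame by confidence surfaces the most "confidently wrong" predictions first, which is where patterns like sarcasm and negation tend to cluster.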
Although IMDB is balanced (50/50), the project demonstrates techniques for imbalanced datasets:
| Technique | Applied To |
|---|---|
| `class_weight='balanced'` | Logistic Regression |
| Weighted `BCELoss` | LSTM |
| Weighted `CrossEntropyLoss` | BERT |
| SMOTE (oversampling demo) | TF-IDF features |
| Stratified train/val/test split | All models |
Launch the Gradio interactive demo locally:
```bash
python app/demo.py
```

Then open: http://localhost:7860
The demo lets you:
- Type any review text
- See predictions from all 3 models side-by-side
- View confidence scores and sentiment bars
```bash
# 1. Create account at huggingface.co/spaces
# 2. Create a new Space with Gradio SDK
# 3. Push your code
git remote add space https://huggingface.co/spaces/najahaja/Sentiment-Analysis
git push space main
```

Key packages (see requirements.txt for full list):
```text
transformers>=4.35.0
torch>=2.0.0
scikit-learn>=1.3.0
pandas>=2.0.0
numpy>=1.24.0
datasets>=2.14.0
gradio>=4.0.0
matplotlib>=3.7.0
seaborn>=0.12.0
nltk>=3.8.0
imbalanced-learn>=0.11.0
lime>=0.2.0.1
```
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch: `git checkout -b feature/add-roberta`
- Commit your changes: `git commit -m 'Add RoBERTa comparison'`
- Push to the branch: `git push origin feature/add-roberta`
- Open a Pull Request
© 2025 Ahamed Najah — All Rights Reserved.
This project is protected. You may view the code for learning purposes only. Redistribution, modification, or commercial use without explicit permission is prohibited. See the LICENSE file for full details.
Ahamed Najah
sentiment-analysis nlp bert lstm transformers huggingface scikit-learn machine-learning deep-learning python pytorch imdb text-classification