A comparative analysis of modern and traditional deep learning architectures for sentiment classification on the Sentiment140 dataset.
## Table of Contents

- Project Overview
- Dataset Details
- Automatic Dataset Setup
- Workflow
- Model Architectures
- Installation and Setup
- Project Structure
- Usage
- Results & Comparison
- License
## Project Overview

This project focuses on identifying the sentiment (Positive or Negative) of Twitter messages. We evaluate multiple deep learning models and compare their accuracy, F1-score, and training time:
- BERT (Bidirectional Encoder Representations from Transformers) - State-of-the-art transformer-based approach.
- LSTM (Long Short-Term Memory) - Captures long-term dependencies in text.
- GRU (Gated Recurrent Unit) - A faster alternative to LSTM with similar gating mechanisms.
- RNN (Recurrent Neural Network) - The foundational sequence model.
## Dataset Details

The Sentiment140 dataset contains 1,600,000 tweets extracted using the Twitter API.
- Source: Kaggle Sentiment140
- Labels:
  - `0` → Negative
  - `4` → Positive
- Features: Tweet ID, Date, User, and the actual Text.
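As a sketch, loading the raw CSV and mapping the labels might look like this (the column names and `latin-1` encoding are assumptions based on the standard Sentiment140 CSV layout, which ships without a header row):

```python
import pandas as pd

# Conventional column order for the Sentiment140 CSV (assumed; no header row).
COLUMNS = ["target", "ids", "date", "flag", "user", "text"]

def load_and_map(csv_path_or_buffer) -> pd.DataFrame:
    """Load the raw CSV and map labels 0 -> 0 (Negative), 4 -> 1 (Positive)."""
    df = pd.read_csv(csv_path_or_buffer, encoding="latin-1", names=COLUMNS)
    df["label"] = df["target"].map({0: 0, 4: 1})
    return df[["text", "label"]]
```

Mapping `4` down to `1` gives the contiguous `{0, 1}` labels that binary losses such as `binary_crossentropy` expect.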
## Automatic Dataset Setup

Since the dataset is large (~145 MB), this repository includes an automated download and organization system:

- Automatic Retrieval: Uses `kagglehub` to download the latest version directly from Kaggle.
- Local Persistence: The notebook and scripts automatically copy the downloaded CSV to a root-level `Dataset/` folder for easy access and consistency.
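The download-and-persist step could be sketched as follows; the `kazanova/sentiment140` dataset slug and the helper names are assumptions for illustration, not the repository's actual code:

```python
import shutil
from pathlib import Path

def copy_csvs(src_dir: str, dest_dir: str = "Dataset") -> list[Path]:
    """Copy every CSV found under src_dir into dest_dir (created if missing)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for csv in Path(src_dir).rglob("*.csv"):
        target = dest / csv.name
        if not target.exists():  # keep the local copy if it already exists
            shutil.copy2(csv, target)
        copied.append(target)
    return copied

def fetch_sentiment140(dest_dir: str = "Dataset") -> list[Path]:
    """Download via kagglehub, then persist the CSVs to a root-level folder."""
    import kagglehub  # deferred import so copy_csvs stays usable offline
    cache_path = kagglehub.dataset_download("kazanova/sentiment140")
    return copy_csvs(cache_path, dest_dir)
```

`kagglehub.dataset_download` caches under the user's home directory and returns that path; copying into `Dataset/` keeps the project self-contained regardless of where the cache lives.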
## Workflow

The project follows a modular pipeline for data processing and model evaluation:

```mermaid
graph TD
    A[Data Acquisition: Sentiment140 Dataset] --> B[Data Cleaning & Mapping]
    B --> B1[Mapping Labels: 0-Neg, 4-Pos]
    B --> B2[Text Preprocessing: Regex, Stopwords]
    B1 --> C[Exploratory Data Analysis - EDA]
    C --> D[Data Splitting: Train/Val/Test]
    D --> E1[BERT Tokenization]
    D --> E2[Sequence Padding & Tokenization]
    E1 --> F1[Model 1: BERT Fine-Tuning]
    E2 --> F2[Model 2: RNN]
    E2 --> F3[Model 3: LSTM]
    E2 --> F4[Model 4: GRU]
    F1 --> G[Comparative Evaluation]
    F2 --> G
    F3 --> G
    F4 --> G
    G --> H[Final Analysis & Visualization]
```

(Workflow defined in `Flow/sentiment_flow.mmd`)
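The text-preprocessing node above (regex cleaning plus stop-word removal) could be sketched like this; the tiny stop-word set and the exact regexes are illustrative assumptions, not the notebook's precise rules:

```python
import re

# Small illustrative stop-word set; a real pipeline might use NLTK's full list.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "and", "of", "in", "it"}

URL_RE      = re.compile(r"https?://\S+|www\.\S+")  # strip links
MENTION_RE  = re.compile(r"@\w+")                   # strip @mentions
NONALPHA_RE = re.compile(r"[^a-z\s]")               # keep letters and spaces only

def clean_tweet(text: str) -> str:
    """Lowercase, remove URLs/mentions/punctuation, then drop stop-words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NONALPHA_RE.sub(" ", text)
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)
```

For example, `clean_tweet("@user I LOVE this!! http://t.co/x")` reduces the tweet to just its sentiment-bearing words.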
## Project Structure

```
Sentiment_Analysis_Using_BERT-LSTM-GRU-RNN/
├── Dataset/                                    # Dataset folder (auto-created)
├── Flow/
│   └── sentiment_flow.mmd                      # Mermaid workflow diagram
├── Sentiment_Analysis_Comparative_Study.ipynb  # Main research notebook
├── train_models.py                             # Modular training script
├── requirements.txt                            # Python dependencies
├── .gitignore                                  # Files to ignore (Data, Models, etc.)
├── LICENSE                                     # MIT License
└── README.md                                   # Project documentation
```
## Installation and Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/SANJAI-s0/Sentiment_Analysis_Using_BERT-LSTM-GRU-RNN.git
   cd Sentiment_Analysis_Using_BERT-LSTM-GRU-RNN
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Run the analysis:
   - Open `Sentiment_Analysis_Comparative_Study.ipynb` in Jupyter or VS Code, or
   - Run the script:

     ```bash
     python train_models.py
     ```
## Model Architectures

### BERT

- Framework: HuggingFace Transformers (PyTorch).
- Pre-trained model: `bert-base-uncased`.
- Fine-tuned with a linear classifier layer.
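A minimal sketch of loading `bert-base-uncased` with a linear classification head via HuggingFace Transformers (details such as `max_length=128` are assumptions, not the repository's settings):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# num_labels=2 attaches a randomly initialized linear classifier for
# binary sentiment (Negative / Positive); fine-tuning trains that head.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

enc = tokenizer("what a great day!", return_tensors="pt",
                truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**enc).logits  # shape: (1, 2) — one score per class
```

Fine-tuning then proceeds by minimizing cross-entropy over these logits against the mapped `{0, 1}` labels (e.g. with the HF `Trainer` or a plain PyTorch loop).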
### LSTM / GRU / RNN

- Framework: TensorFlow/Keras.
- Hidden layers: bidirectional configurations.
- Dropout for regularization to prevent overfitting.
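A minimal Keras sketch of the bidirectional LSTM variant (vocabulary size, sequence length, and unit counts are illustrative assumptions; swapping `layers.LSTM` for `layers.GRU` or `layers.SimpleRNN` yields the other two recurrent models):

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, EMBED_DIM, MAX_LEN = 20_000, 128, 60  # illustrative hyper-parameters

def build_bilstm() -> tf.keras.Model:
    """Bidirectional LSTM with dropout for binary sentiment classification."""
    model = tf.keras.Sequential([
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),   # token ids -> dense vectors
        layers.Bidirectional(layers.LSTM(64)),     # read the sequence both ways
        layers.Dropout(0.5),                       # regularization
        layers.Dense(1, activation="sigmoid"),     # P(Positive)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

The model consumes integer sequences of length `MAX_LEN` (e.g. produced by a Keras `Tokenizer` plus `pad_sequences`) and outputs a single probability per tweet.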
## Results & Comparison

| Model | Accuracy | F1-Score | Training Time |
|---|---|---|---|
| BERT | ~91% | TBD | High |
| LSTM | ~84% | TBD | Medium |
| GRU | ~83% | TBD | Medium |
| RNN | ~76% | TBD | Low |
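The table's metrics can be computed with scikit-learn; this helper is a sketch for reproducing the comparison, not the repository's actual evaluation code:

```python
from sklearn.metrics import accuracy_score, f1_score

def evaluate(y_true, y_pred) -> dict:
    """Accuracy and (binary) F1 for a model's predictions on the test split."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
```

Running the same `evaluate` call on each model's test-set predictions yields directly comparable numbers for the table above.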
## License

This project is licensed under the MIT License - see the `LICENSE` file for details.
Sanjai - GitHub Profile
Project Link: https://github.com/SANJAI-s0/Sentiment_Analysis_Using_BERT-LSTM-GRU-RNN