- Title: Sentiment Analysis with RNN on IMDB Dataset
- Students:
  - Abhijeet (055002)
  - Jhalki Kulshrestha (055017)
- Group Number: 19
This project builds a deep learning pipeline that classifies movie reviews from the IMDB dataset. The model is trained on binary labels (positive or negative); the neutral category is derived at prediction time from the model's sentiment score. The pipeline uses a Recurrent Neural Network (RNN) architecture with a Bidirectional LSTM layer to capture sentiment in sequential text data.
The dataset used is the IMDB movie review dataset provided by Keras:
- Total Reviews: 50,000
- Training Set: 25,000 reviews
- Testing Set: 25,000 reviews
- Labels: Binary (1 = Positive, 0 = Negative)
- Vocabulary Size: Top 10,000 most frequent words
- Data Format: Each review is encoded as a list of word indices
- Build an end-to-end NLP pipeline for sentiment classification.
- Implement an LSTM-based RNN model for handling sequential text data.
- Apply proper text preprocessing and transformation techniques.
- Evaluate model performance using metrics like accuracy.
- Create an intuitive sentiment score and emoji output for user-friendliness.
- Embedding Layer: Converts word indices into dense vector representations.
- Bidirectional LSTM Layer: Captures contextual dependencies in both forward and backward directions.
- Dense Output Layer: Uses sigmoid activation for binary classification.
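
The three layers above can be sketched in Keras roughly as follows. This is a minimal illustration, not the project's actual script: the embedding size, LSTM units, sequence length, optimizer, and loss are assumed values chosen for the example, and only the vocabulary size of 10,000 comes from the dataset description.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 10_000  # from the dataset description: top 10,000 words
EMBED_DIM = 64       # assumption: embedding vector size
LSTM_UNITS = 64      # assumption: LSTM units per direction
MAX_LEN = 200        # assumption: padded review length


def build_model(vocab_size=VOCAB_SIZE, embed_dim=EMBED_DIM, lstm_units=LSTM_UNITS):
    """Embedding -> Bidirectional LSTM -> sigmoid output, as listed above."""
    model = tf.keras.Sequential([
        # Maps each word index to a dense vector.
        layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
        # Reads the sequence forwards and backwards.
        layers.Bidirectional(layers.LSTM(lstm_units)),
        # One probability-like score per review.
        layers.Dense(1, activation="sigmoid"),
    ])
    # Assumed training setup: Adam with binary cross-entropy.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


model = build_model()
# A dummy batch of two all-padding reviews, just to check the output shape.
dummy_batch = np.zeros((2, MAX_LEN), dtype="int32")
scores = model(dummy_batch).numpy()
print(scores.shape)  # (2, 1)
```

Each row of `scores` is a value between 0 and 1 that can be read as the sentiment score for one review.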
- Load IMDB Dataset: Preprocessed movie reviews with word indices.
- Preprocess Data: Pad sequences to a fixed length for uniform input.
- Build and Train Model: Implement Bidirectional LSTM with embedding layer.
- Evaluate Performance: Test the model with validation data.
- Custom Predictions: Encode new reviews, predict sentiment, and classify into positive, negative, or neutral categories.
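
To illustrate the encoding and padding steps for a custom review, here is a small sketch in plain Python. It assumes the default conventions of the Keras IMDB loader (0 for padding, 1 for sequence start, 2 for out-of-vocabulary words, and word ranks shifted by 3) and mirrors the default left-padding behaviour of `pad_sequences`. The function name `encode_review`, the tokenizer regex, and the toy `word_index` are hypothetical; in practice the mapping would come from `imdb.get_word_index()`.

```python
import re

PAD, START, OOV = 0, 1, 2  # reserved indices in the Keras IMDB encoding
INDEX_FROM = 3             # word ranks are shifted by this offset


def encode_review(text, word_index, num_words=10_000, maxlen=200):
    """Turn raw text into a fixed-length list of word indices."""
    # Assumption: a simple lowercase word tokenizer.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    encoded = [START]
    for token in tokens:
        rank = word_index.get(token)
        idx = rank + INDEX_FROM if rank is not None else OOV
        # Words outside the top `num_words` are treated as unknown.
        encoded.append(idx if idx < num_words else OOV)
    # Keep the last `maxlen` indices, then left-pad with zeros.
    encoded = encoded[-maxlen:]
    return [PAD] * (maxlen - len(encoded)) + encoded


# Toy vocabulary for illustration only.
toy_word_index = {"the": 1, "movie": 17, "great": 84}
example = encode_review("The movie was great!", toy_word_index, maxlen=8)
print(example)  # [0, 0, 0, 1, 4, 20, 2, 87]
```

The word "was" is missing from the toy vocabulary, so it is encoded as the out-of-vocabulary index 2.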
To run the project, install the required dependencies:
pip install tensorflow numpy pandas

- Train the model: Run the Python script to train and save the model.
- Predict Sentiment: Load the saved model and run it on custom reviews.
- Review Output: Sentiment scores and emoji representation.
- Evaluation Metric: Binary accuracy
- Accuracy Achieved: ~85% (varies with training settings)
- Sentiment Categories:
  - Positive (>65% score) 😄
  - Neutral (35%-65% score) 😐
  - Negative (<35% score) 😞
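
The thresholds above could be applied to the model's sigmoid output along these lines. This is a sketch only: the function name `categorize` is hypothetical, and the score is assumed to be a float between 0 and 1.

```python
def categorize(score):
    """Map a sigmoid score in [0, 1] to a (label, emoji) pair."""
    if score > 0.65:
        return ("Positive", "😄")
    if score < 0.35:
        return ("Negative", "😞")
    # Scores from 35% to 65% inclusive are treated as neutral.
    return ("Neutral", "😐")


for s in (0.9, 0.5, 0.1):
    label, emoji = categorize(s)
    print(f"{s:.0%} -> {label} {emoji}")
```

Because the model is trained on binary labels only, the neutral band reflects low confidence in either class rather than a learned third category.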
This project demonstrates sentiment analysis on IMDB reviews using an RNN with a Bidirectional LSTM layer. The model reaches roughly 85% binary accuracy and presents its predictions as sentiment scores with matching emojis.
- Fine-tuning hyperparameters for better accuracy.
- Using a pre-trained word embedding like GloVe.
- Expanding to multi-class sentiment analysis.
- Deploying as a web application using Flask or Streamlit.
- Abhijeet
- Jhalki Kulshrestha