This project analyzes public sentiment towards foldable iPhones using YouTube comments, and extracts user experiences, feature demands, and pain points from Reddit discussions about existing foldable phones. The goal is to provide insights that could guide product development decisions for future foldable devices.
We collected and cleaned reviews highly relevant to foldable phones and used this text data to fine-tune a small pretrained language model (MiniLM). By comparing how strongly feature words (camera, battery, screen, processor) associate with sentiment words (good, bad, etc.) before and after fine-tuning, we aim to surface the best and worst parts of the user experience and provide business decision support.
Foldable_Phone_Analysis/
│
├── data/ # Raw & processed datasets
│ ├── youtube/ # YouTube comments (cleaned and sentiment-labeled)
│ └── reddit/ # Reddit posts and comments (topic-focused)
│
├── models/ # Fine-tuned sentiment analysis models
│
├── outputs/ # Visualizations: confusion matrix, sentiment distribution, word clouds
│
├── scripts/ # Data preprocessing, training, inference scripts
│
├── notebooks/ # Jupyter notebooks (EDA, model evaluation)
│
├── requirements.txt # Python dependencies
└── README.md # Project description and instructions
Sentiment Analysis (BERTweet-based):
- Trained on the IMDb dataset and applied to YouTube comments about foldable iPhones.
- Predicts positive or negative sentiment per comment.
User Experience Extraction (Reddit):
- Analyzes real user feedback on existing foldable phones.
- Extracts common demands (e.g., durability, battery life) and pain points (e.g., screen crease, software issues).
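A minimal sketch of this extraction step using only the Python standard library. The keyword lists and example comments below are invented for illustration, not taken from the project's data:

```python
import re
from collections import Counter

# Hypothetical tracked terms; the real lists are derived from the Reddit data.
DEMANDS = {"durability", "battery", "waterproof", "lighter"}
PAIN_POINTS = {"crease", "hinge", "software", "fragile"}

def count_mentions(comments, vocab):
    """Count how often each tracked term appears across all comments."""
    counts = Counter()
    for text in comments:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in vocab:
                counts[token] += 1
    return counts

comments = [
    "The crease is visible but battery life is great",
    "Hinge feels fragile, software still buggy",
    "I just want better durability and battery",
]
print(count_mentions(comments, DEMANDS | PAIN_POINTS).most_common(3))
```

Ranking the resulting counts gives a first-pass view of which demands and pain points dominate the discussion.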
Visualizations:
- Confusion Matrix (for model evaluation)
- Sentiment Distribution (YouTube comments)
- Word Clouds (highlighting key terms from positive and negative comments)
Data Collection:
- Scraped YouTube comments on foldable iPhone videos.
- Collected Reddit discussions from relevant subreddits (e.g., r/FoldablePhones).
Data Preprocessing:
- Cleaned text (preserved emojis; removed URLs and @mentions).
- Saved cleaned datasets (data_preprocessing.py).
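The cleaning rules can be sketched with two regular expressions; the actual logic lives in data_preprocessing.py, and the function name here is hypothetical:

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")

def clean_comment(text: str) -> str:
    """Remove URLs and @mentions, collapse whitespace; emojis are untouched."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()

print(clean_comment("@apple check this 🔥 https://youtu.be/abc fold!"))
# → check this 🔥 fold!
```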
Model Training:
- Fine-tuned BERTweet sentiment classifier on IMDb movie reviews.
- Saved model and tokenizer (models/saved_models/bertweet_imdb/).
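A minimal outline of the training step with the Hugging Face Trainer API. The config values mirror the hyperparameters listed later in this README; evaluation and column handling are omitted, so this is a sketch rather than the project's exact script:

```python
# Hyperparameters mirror those listed in this README.
TRAIN_CONFIG = {
    "base_model": "vinai/bertweet-base",
    "epochs": 3,
    "batch_size": 32,
    "output_dir": "models/saved_models/bertweet_imdb",
}

def fine_tune(cfg=TRAIN_CONFIG):
    # Heavy imports are deferred so the module loads without downloads.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(cfg["base_model"])
    model = AutoModelForSequenceClassification.from_pretrained(
        cfg["base_model"], num_labels=2)

    imdb = load_dataset("imdb")
    encoded = imdb.map(
        lambda b: tokenizer(b["text"], truncation=True,
                            padding="max_length", max_length=128),
        batched=True)

    args = TrainingArguments(
        output_dir=cfg["output_dir"],
        num_train_epochs=cfg["epochs"],
        per_device_train_batch_size=cfg["batch_size"],
    )
    Trainer(model=model, args=args, train_dataset=encoded["train"]).train()
    model.save_pretrained(cfg["output_dir"])
    tokenizer.save_pretrained(cfg["output_dir"])
```

Calling `fine_tune()` downloads the base model and the IMDb dataset, so it is kept behind a function rather than run at import time.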
Inference:
- Applied sentiment model to YouTube comments.
- Generated labeled dataset (youtube_foldable_apple_comments_with_sentiment.csv).
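Applying the saved classifier can be sketched as follows. The column names and the LABEL_0/LABEL_1 mapping are assumptions about the fine-tuned model's output, and the heavy transformers import is deferred so the pure helper stays testable:

```python
import csv

# Assumed mapping from the fine-tuned model's raw labels to readable ones.
LABEL_MAP = {"LABEL_0": "negative", "LABEL_1": "positive"}

def attach_sentiment(row, pred):
    """Merge one pipeline prediction into a CSV row (pure, easy to test)."""
    out = dict(row)
    out["sentiment"] = LABEL_MAP.get(pred["label"], pred["label"])
    out["score"] = round(pred["score"], 4)
    return out

def label_comments(in_csv, out_csv, text_column="comment"):
    from transformers import pipeline  # deferred: loads the saved model
    clf = pipeline("sentiment-analysis",
                   model="models/saved_models/bertweet_imdb")
    with open(in_csv, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    labeled = [attach_sentiment(r, p)
               for r, p in zip(rows, clf([r[text_column] for r in rows]))]
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=labeled[0].keys())
        writer.writeheader()
        writer.writerows(labeled)
```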
Semantic Transformation Analysis:
- Fine-tuned the MiniLM language model on Reddit comments.
- Computed the similarity between feature words and sentiment words before and after fine-tuning.
- Computed the incremental similarity (Δ) and ranked features by the change in sentiment association.
- Produced comparison tables and visualizations.
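The Δ computation reduces to cosine similarities between word embeddings before and after fine-tuning. The sketch below uses invented 2-D vectors; the real vectors are 384-dimensional MiniLM embeddings of the feature and sentiment words:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def delta_ranking(before, after, sentiment_word="good"):
    """Rank features by the change in similarity to a sentiment word."""
    s_before, s_after = before[sentiment_word], after[sentiment_word]
    deltas = {
        feat: cosine(after[feat], s_after) - cosine(before[feat], s_before)
        for feat in before if feat != sentiment_word
    }
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)

# Toy "embeddings", invented for the sketch.
before = {"good": [1.0, 0.0], "battery": [0.0, 1.0], "screen": [1.0, 1.0]}
after = {"good": [1.0, 0.0], "battery": [1.0, 0.2], "screen": [0.0, 1.0]}
print(delta_ranking(before, after))
```

A feature whose Δ toward "good" is large and positive moved closer to positive sentiment after exposure to the Reddit data; a large negative Δ marks a likely pain point.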
Visualization & Analysis:
- Created confusion matrix, sentiment distribution charts, and word clouds.
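The charts themselves come from standard plotting libraries; the confusion-matrix counts behind them can be sketched in plain Python:

```python
def confusion_matrix(y_true, y_pred, labels=("negative", "positive")):
    """Rows are true labels, columns are predicted labels."""
    idx = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m

# Invented example labels for illustration.
y_true = ["positive", "positive", "negative", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "positive", "positive"]
print(confusion_matrix(y_true, y_pred))
# → [[1, 1], [1, 2]]
```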
View outputs:
- Charts & visualizations in outputs/
- Labeled YouTube comments in data/youtube/youtube_foldable_apple_comments_with_sentiment.csv
- Sentiment distribution
- Top positive and negative keywords
- Semantic shift ranking: see visualizations in the outputs/ folder
- Base model: vinai/bertweet-base
- Training dataset: IMDb movie reviews (binary sentiment)
- Fine-tuning epochs: 3
- Batch size: 32
- Semantic transformation model: MiniLM-L6-H384-uncased
- Fine-tuning dataset: Reddit reviews on foldable phones
- Task: Masked Language Modeling (MLM)
- Custom script for incremental similarity analysis
This project is for educational and research purposes.
- Hugging Face Transformers
- IMDb Dataset
- Reddit and YouTube for user-generated content
