
Foldable Phone Sentiment & Experience Analysis

This project analyzes public sentiment towards foldable iPhones using YouTube comments, and extracts user experiences, feature demands, and pain points from Reddit discussions about existing foldable phones. The goal is to provide insights that could guide product development decisions for future foldable devices.

We collected and cleaned reviews highly relevant to foldable phones and used this text to fine-tune a small language model (MiniLM). By comparing how strongly feature words (camera, battery, screen, processor) associate with sentiment words (good, bad, etc.) before and after fine-tuning, we aim to surface the best and worst parts of the user experience and to support business decisions.


Project Structure

Foldable_Phone_Analysis/
│
├── data/                   # Raw & processed datasets
│   ├── youtube/            # YouTube comments (cleaned and sentiment-labeled)
│   └── reddit/             # Reddit posts and comments (topic-focused)
│
├── models/                 # Fine-tuned sentiment analysis models
│
├── outputs/                # Visualizations: confusion matrix, sentiment distribution, word clouds
│
├── scripts/                # Data preprocessing, training, inference scripts
│
├── notebooks/              # Jupyter notebooks (EDA, model evaluation)
│
├── requirements.txt        # Python dependencies
└── README.md               # Project description and instructions

Features

  • Sentiment Analysis (BERTweet-based):

    • Trained on IMDb dataset, applied to YouTube comments about foldable iPhones.
    • Predicts positive or negative sentiment per comment.
  • User Experience Extraction (Reddit):

    • Analyzes real user feedback on existing foldable phones.
    • Extracts common demands (e.g., durability, battery life) and pain points (e.g., screen crease, software issues).
  • Visualizations:

    • Confusion Matrix (for model evaluation)
    • Sentiment Distribution (YouTube comments)
    • Word Clouds (highlighting key terms from positive and negative comments)
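The demand and pain-point extraction described above can be sketched as a simple keyword count over Reddit comments. The keyword lists below are illustrative stand-ins, not the project's actual lexicon:

```python
from collections import Counter

# Illustrative keyword lexicons (assumptions, not the project's actual lists).
DEMANDS = {"durability", "battery", "price"}
PAIN_POINTS = {"crease", "software", "hinge"}

def count_mentions(comments, vocab):
    """Count how many comments mention each keyword in `vocab`."""
    counts = Counter()
    for text in comments:
        tokens = text.lower().split()
        for word in vocab:
            if word in tokens:
                counts[word] += 1
    return counts

comments = [
    "Battery life is great but the crease is visible",
    "Durability concerns and software bugs",
    "The crease bothers me every day",
]
print(count_mentions(comments, PAIN_POINTS))  # "crease" counted twice
```

A real pipeline would add tokenization that handles punctuation and multi-word phrases, but the ranking idea is the same.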

Workflow

  1. Data Collection:

    • Scraped YouTube comments on foldable iPhone videos.
    • Collected Reddit discussions from relevant subreddits (e.g., r/FoldablePhones).
  2. Data Preprocessing:

    • Cleaned text (preserved emojis, removed URLs, mentions).
    • Saved cleaned datasets (data_preprocessing.py).
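A minimal sketch of the cleaning step (not the exact data_preprocessing.py logic): strip URLs and @mentions with regexes while leaving emojis untouched:

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")

def clean_comment(text: str) -> str:
    """Remove URLs and @mentions; emojis and other Unicode are preserved."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace

print(clean_comment("@apple the fold looks 🔥 see https://example.com/demo"))
# → "the fold looks 🔥 see"
```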
  3. Model Training:

    • Fine-tuned BERTweet sentiment classifier on IMDb movie reviews.
    • Saved model and tokenizer (models/saved_models/bertweet_imdb/).
  4. Inference:

    • Applied sentiment model to YouTube comments.
    • Generated labeled dataset (youtube_foldable_apple_comments_with_sentiment.csv).
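Applying the classifier over the comments CSV might look like the following sketch, with a trivial keyword stub standing in for the fine-tuned BERTweet model (which the project loads from models/saved_models/bertweet_imdb/):

```python
import csv
import io

def predict_sentiment(text: str) -> str:
    """Stub scorer standing in for the fine-tuned BERTweet classifier."""
    lowered = text.lower()
    return "positive" if ("love" in lowered or "great" in lowered) else "negative"

def label_comments(raw_csv: str, text_col: str = "comment") -> list:
    """Read comments from CSV text and attach a sentiment column to each row."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    for row in rows:
        row["sentiment"] = predict_sentiment(row[text_col])
    return rows

raw = "comment\nI love the folding screen\nThe crease ruins it\n"
labeled = label_comments(raw)
print([r["sentiment"] for r in labeled])  # → ['positive', 'negative']
```

The column name `comment` is an assumption; swap in whatever header the scraped CSV actually uses.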
  5. Semantic Transformation Analysis:

    • Fine-tune the MiniLM language model on Reddit comments.
    • Compute the sentiment similarity of features before and after fine-tuning.
    • Compute the incremental similarity (Δ) and rank the features based on the change in sentiment association.
    • Make comparison tables and visualizations.
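Step 5 reduces to cosine similarities between feature-word and sentiment-word embeddings before and after fine-tuning. In this sketch both embedding sets are toy 3-d vectors standing in for MiniLM outputs:

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings standing in for MiniLM vectors before/after fine-tuning.
before = {"battery": [0.2, 0.9, 0.1], "crease": [0.8, 0.1, 0.3]}
after = {"battery": [0.1, 0.95, 0.2], "crease": [0.9, 0.05, 0.1]}
good = [0.1, 1.0, 0.0]  # embedding of the sentiment word "good"

# Δ = similarity(feature, "good") after fine-tuning minus before.
delta = {feat: cosine(after[feat], good) - cosine(before[feat], good) for feat in before}
ranking = sorted(delta, key=delta.get, reverse=True)
print(ranking)  # features whose positive association grew the most come first
```

The real analysis averages over many sentiment words and feature mentions; this shows only the Δ-and-rank mechanics.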
  6. Visualization & Analysis:

    • Created confusion matrix, sentiment distribution charts, and word clouds.
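The confusion matrix can be computed from the labeled data alone; a dependency-free sketch (the project presumably plots it with a library such as matplotlib):

```python
def confusion_matrix(y_true, y_pred, labels=("positive", "negative")):
    """Return a nested list where rows are true labels, columns predictions."""
    idx = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, pred in zip(y_true, y_pred):
        matrix[idx[true]][idx[pred]] += 1
    return matrix

y_true = ["positive", "positive", "negative", "negative"]
y_pred = ["positive", "negative", "negative", "negative"]
print(confusion_matrix(y_true, y_pred))  # → [[1, 1], [0, 2]]
```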
  7. View Outputs:

    • Charts and visualizations in outputs/
    • Labeled YouTube comments in data/youtube/youtube_foldable_apple_comments_with_sentiment.csv

Results

  • Sentiment Distribution
  • Top positive and negative keywords
  • Semantic shift ranking:
    • Identifies features whose positive-sentiment association strengthened, or negative-sentiment association weakened, after fine-tuning.
    • Highlights user priorities and controversial features (see Figure_1 in outputs/).

See visualizations in the outputs/ folder.


Model Details

  • Base model: vinai/bertweet-base
    • Training dataset: IMDb movie reviews (binary sentiment)
    • Fine-tuning epochs: 3
    • Batch size: 32
  • Semantic Transformation Model: MiniLM-L6-H384-uncased
    • Fine-tuning dataset: Reddit reviews on foldable phones
    • Task: Masked Language Modeling (MLM)
    • Custom script for incremental similarity analysis

License

This project is for educational and research purposes.


Acknowledgments

About

This exploratory method, based on representation drift under domain fine-tuning, is suited to surfacing potential clues; it should not be treated as a high-confidence quantitative conclusion about user experience.
