byteephantom/Youtube-trending-analytics-with-ML

YouTube Trending Videos Analytics (500 MB Dataset) – End-to-End Data Analysis + ML

This project analyzes YouTube trending data to uncover what makes videos go viral, using exploratory data analysis (EDA), NLP, visualization, SEO insights, and ML models.


📌 Project Highlights

  • Cleaned and analyzed a 500 MB dataset of Indian YouTube trending videos
  • Removed duplicates and converted Hindi titles to Hinglish
  • Extracted keywords, bigrams, trigrams, and clickbait patterns
  • Engineered features: title_length, tags_count, publish_hour
  • Category-wise & channel-wise insights
  • Viral vs non-viral video comparison
  • Built an ML model to predict video views (test R² ≈ 0.30)
  • Extracted feature importances to understand what drives virality
  • Created multiple visualizations for storytelling
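
The cleaning and feature-engineering steps listed above could look roughly like the sketch below. It is not the project's actual code: the input column names (`title`, `tags`, `publish_time`), the pipe-separated tag format, and the helper name `add_basic_features` are assumptions made for illustration.

```python
import pandas as pd


def add_basic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a de-duplicated copy with title_length, tags_count, publish_hour.

    Assumes (hypothetically) columns named 'title', 'tags' and 'publish_time',
    with tags stored as a single pipe-separated string.
    """
    out = df.drop_duplicates().copy()

    # Number of characters in the title; missing titles count as length 0.
    out["title_length"] = out["title"].fillna("").str.len()

    # Number of non-empty tags in the pipe-separated tag string.
    tags = out["tags"].fillna("")
    out["tags_count"] = tags.apply(
        lambda t: len([tag for tag in t.split("|") if tag.strip()])
    )

    # Hour of day (0-23, UTC) at which the video was published.
    out["publish_hour"] = pd.to_datetime(out["publish_time"], utc=True).dt.hour
    return out


# Tiny made-up sample: the first two rows are exact duplicates.
sample = pd.DataFrame(
    {
        "title": ["Funny cat video", "Funny cat video", "News"],
        "tags": ["cat|funny|pets", "cat|funny|pets", ""],
        "publish_time": [
            "2018-01-01T14:30:00.000Z",
            "2018-01-01T14:30:00.000Z",
            "2018-01-02T03:05:00.000Z",
        ],
    }
)
features = add_basic_features(sample)
```

On a real export, `publish_hour` would probably need converting to local time before drawing conclusions about upload timing.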

📊 Key Questions Answered

  • What type of videos become viral?
  • Which categories dominate trending?
  • Who are the most consistent creators?
  • What upload time gives maximum views?
  • What keywords/titles perform best?
  • What factors (title, tags, time) influence video views?
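
For the keyword question, one simple approach is to count word n-grams across titles. The sketch below uses only the standard library. The tokenizer (lowercase Latin letters, digits, apostrophes) and the function name `title_ngrams` are assumptions, not the project's implementation.

```python
import re
from collections import Counter


def title_ngrams(titles, n=2):
    """Count word n-grams over an iterable of title strings."""
    counts = Counter()
    for title in titles:
        # Assumed tokenizer: lowercase runs of Latin letters, digits, apostrophes.
        tokens = re.findall(r"[a-z0-9']+", title.lower())
        # Slide a window of n tokens across the title.
        counts.update(
            " ".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts


# Made-up titles for illustration only.
titles = ["Official Trailer 2018", "Official Trailer Hindi", "New Song 2018"]
bigrams = title_ngrams(titles, n=2)
trigrams = title_ngrams(titles, n=3)
top_bigram = bigrams.most_common(1)
```

A Latin-only tokenizer only works once titles have been converted to Hinglish. Titles in Devanagari would need a different pattern.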

🔧 Tech Stack

Python, Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, WordCloud, NLP, tqdm


🧠 Machine Learning Model

  • RandomForestRegressor
  • Predicts video views using metadata
  • Performance:
    • Test R²: ~0.30
    • Strong predictors:
      • Description length
      • Title length
      • Tags count
      • Upload hour
      • Category
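
A minimal sketch of this kind of model is shown below, trained on synthetic data so that it runs on its own. The feature column names, the synthetic target, and the hyperparameters are all assumptions. The numbers it produces say nothing about the project's reported R² of about 0.30.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the metadata features (hypothetical column names).
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame(
    {
        "description_length": rng.integers(0, 5000, n),
        "title_length": rng.integers(5, 100, n),
        "tags_count": rng.integers(0, 40, n),
        "publish_hour": rng.integers(0, 24, n),
        "category_id": rng.integers(1, 30, n),
    }
)
# Made-up target: views depend on two features plus noise.
y = (
    1000
    + 2.0 * X["description_length"]
    + 50 * X["tags_count"]
    + rng.normal(0, 2000, n)
)

# Hold out 20% of rows to measure test R².
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

test_r2 = r2_score(y_test, model.predict(X_test))

# Impurity-based importances, sorted from most to least important.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
```

Passing the category as an integer ID treats it as ordinal. One-hot encoding is a common alternative. Impurity-based importances can also favor high-cardinality features, so they are best read as a rough ranking.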

📈 Visualizations

  • Category performance
  • Channel ranking
  • Keyword clouds
  • Title pattern analysis
  • Upload-hour heatmaps
  • Trending duration visualization
  • Feature importance chart
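
As an illustration of how the data behind an upload-hour heatmap might be prepared, the sketch below pivots views by weekday and hour. The column names (`publish_time`, `views`), the choice of the median, and the helper name `upload_hour_table` are assumptions.

```python
import pandas as pd


def upload_hour_table(df: pd.DataFrame) -> pd.DataFrame:
    """Median views per (weekday, hour) cell, ready to plot as a heatmap.

    Assumes hypothetical columns 'publish_time' and 'views'.
    """
    ts = pd.to_datetime(df["publish_time"], utc=True)
    tmp = pd.DataFrame(
        {
            "weekday": ts.dt.day_name(),
            "hour": ts.dt.hour,
            "views": df["views"],
        }
    )
    # Rows are weekdays, columns are hours; empty cells become NaN.
    return tmp.pivot_table(
        index="weekday", columns="hour", values="views", aggfunc="median"
    )


# Made-up sample: two Monday uploads at 14:00 and one Tuesday upload at 03:00.
sample = pd.DataFrame(
    {
        "publish_time": [
            "2018-01-01T14:30:00Z",
            "2018-01-01T14:45:00Z",
            "2018-01-02T03:05:00Z",
        ],
        "views": [100, 300, 50],
    }
)
table = upload_hour_table(sample)
```

The resulting table can be passed to `seaborn.heatmap(table)` to draw the chart. The median is used here because view counts tend to be heavily skewed, though the mean would work the same way.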

About

A comprehensive data analytics project exploring YouTube trending videos using a large multi-country dataset. Includes data cleaning, feature engineering, NLP on titles, category and channel insights, keyword patterns, upload-time analysis, visual storytelling, and ML models to identify factors that drive video virality.
