Overview
This project explores trends in video game genres using data from 30,000+ video games (2016-2024) sourced from the Steam API and Kaggle. We analyze revenue, review scores, ownership trends, and genre evolution to provide actionable insights for developers, marketers, and industry stakeholders.
🎯 Objectives
- Identify popular and emerging game genres.
- Analyze review scores, revenue, and ownership trends.
- Utilize TF-IDF and Cosine Similarity for content-based recommendations.
- Offer data-driven insights for the gaming industry.
Dataset
- Source: Steam API, Kaggle
- Games Analyzed: 30,000+
- Attributes: Revenue, review scores, ownership data, genres, tags, release dates
- Processing: Data cleaning, standardization, genre classification
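The processing steps above could look roughly like the sketch below. It is a dependency-free illustration, not the project's actual pipeline (which lists Pandas and PySpark among its tools). The field names (`app_id`, `title`, `release_date`, `launch_price`), the ISO date format, and the choice to de-duplicate on the standardized title are all assumptions made for the example.

```python
from datetime import date, datetime
from typing import Optional


def standardize_title(title: str) -> str:
    """Strip trademark symbols, collapse whitespace, and casefold for comparison."""
    for symbol in ("™", "®", "©"):
        title = title.replace(symbol, "")
    return " ".join(title.split()).casefold()


def parse_price(raw: Optional[str]) -> Optional[float]:
    """Convert a price string such as '$19.99' to a float; blank values become None."""
    if raw is None or not raw.strip():
        return None
    return float(raw.strip().lstrip("$").replace(",", ""))


def parse_release_date(raw: Optional[str]) -> Optional[date]:
    """Parse an ISO 'YYYY-MM-DD' string (assumed format); blank values become None."""
    if raw is None or not raw.strip():
        return None
    return datetime.strptime(raw.strip(), "%Y-%m-%d").date()


def clean_rows(rows: list[dict]) -> list[dict]:
    """Drop rows without a title, convert types, keep one row per standardized title."""
    seen: set[str] = set()
    cleaned = []
    for row in rows:
        key = standardize_title(row.get("title") or "")
        if not key or key in seen:
            continue  # skip untitled rows and repeated titles
        seen.add(key)
        cleaned.append({
            "app_id": int(row["app_id"]),
            "title": key,
            "release_date": parse_release_date(row.get("release_date")),
            "launch_price": parse_price(row.get("launch_price")),
        })
    return cleaned


# Made-up rows, for illustration only.
sample = [
    {"app_id": "1", "title": "  Example Quest™ ", "release_date": "2020-05-01", "launch_price": "$19.99"},
    {"app_id": "2", "title": "example quest", "release_date": "", "launch_price": None},
    {"app_id": "3", "title": "", "release_date": "2021-01-15", "launch_price": "9.99"},
]
cleaned = clean_rows(sample)
```

Missing dates and prices are kept as `None` rather than filled in, so later steps can decide how to handle them.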
Dataset Overview
- Total Number of Games: 🕹️ 65,112
- Total Number of Distinct Games: 🕹️ 38,471
1️⃣ Genre Distribution (Genre):
- Battle Royale: 1,030 games
- Multiplayer: 318 games
- Role-Playing Games (RPG): 15,758 games
- Racing: 1,754 games
- Strategy: 10,281 games
- Sports: 1,606 games
2️⃣ Game Distribution (Pricing Model):
- Free to Play: 605 games
- Paid: 38,399 games
Dataset Columns & Description
1️⃣ Game Identification
- App ID 🏷️: Unique identifier assigned to each game in the Steam database
- Title 🎮: Name of the game
2️⃣ Reviews & Ratings
- Reviews Total: Total number of reviews submitted by users
- Reviews Score Fancy ⭐: Steam's formatted rating based on user reviews
- Reviews D7: Number of reviews received in the last 7 days
- Reviews D30: Number of reviews received in the last 30 days
- Reviews D90: Number of reviews received in the last 90 days
3️⃣ Game Release & Pricing
- Release Date: Date when the game was launched on Steam
- Launch Price 💰: Initial price of the game at release
Methodology
- Data Cleaning & Transformation: Handled missing values, standardized titles, converted data types.
- Ownership Estimation: Applied the Boxleiter method for estimating game ownership.
- Trend Analysis: Visualized genre trends over time.
- Genre Similarity Search: Implemented TF-IDF + Cosine Similarity to recommend similar games.
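The ownership estimation step can be sketched as follows. The Boxleiter method scales a game's review count by a fixed multiplier to approximate the number of owners. The multiplier of 30 used here is a placeholder, since the project does not state which value it applied. Approximating gross revenue as owners times launch price is also an assumption of this sketch (it ignores discounts, regional pricing, and refunds), and the function names are hypothetical.

```python
def estimate_owners(total_reviews: int, owners_per_review: float = 30.0) -> int:
    """Boxleiter-style estimate: owners are roughly reviews times a multiplier.

    The default multiplier is an illustrative assumption, not a project value.
    """
    if total_reviews < 0:
        raise ValueError("total_reviews must be non-negative")
    return round(total_reviews * owners_per_review)


def estimate_gross_revenue(
    total_reviews: int, launch_price: float, owners_per_review: float = 30.0
) -> float:
    """Rough gross revenue: estimated owners times launch price (assumed formula)."""
    return estimate_owners(total_reviews, owners_per_review) * launch_price


# Made-up inputs: a game with 1,000 reviews launched at 19.99.
owners = estimate_owners(1000)
revenue = estimate_gross_revenue(1000, 19.99)
```

Because the result depends linearly on the multiplier, it is worth reporting estimates as a range across several plausible multipliers rather than as a single figure.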
Key Insights
- RPG & Multiplayer games remain dominant in ownership and revenue.
- Paid games have higher review scores than Free-to-Play games.
- Strategy & Sports games show steady, high ratings over time.
- Battle Royale & Multiplayer genres are growing rapidly.
- Revenue is highly influenced by top-performing titles.
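One way to quantify how much revenue depends on top-performing titles is to compute the share of total revenue earned by the top fraction of games. The sketch below is an illustration only: the function name, the choice of fraction, and the sample figures are assumptions, not project results.

```python
import math


def top_revenue_share(revenues: list[float], top_fraction: float = 0.01) -> float:
    """Return the share of total revenue earned by the top `top_fraction` of games."""
    if not revenues:
        raise ValueError("revenues must not be empty")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    ordered = sorted(revenues, reverse=True)
    # Always include at least one title, even for very small fractions.
    top_count = max(1, math.ceil(len(ordered) * top_fraction))
    total = sum(ordered)
    return sum(ordered[:top_count]) / total if total else 0.0


# Made-up revenues for four games; the top quarter is the single best seller.
share = top_revenue_share([100.0, 50.0, 25.0, 25.0], top_fraction=0.25)
```

A share close to 1.0 for a small fraction would indicate that a handful of titles account for most of the revenue.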
Technologies Used
- Python, PySpark, AWS, Spark-SQL, Docker
- Jupyter Notebook, Pandas, Matplotlib, Seaborn
- TF-IDF, Cosine Similarity for content-based recommendations
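The content-based recommendation idea can be sketched without any third-party library. The example below builds TF-IDF vectors from each game's tag text and ranks other games by cosine similarity. Using tags as the text input, the whitespace tokenizer, the smoothed IDF formula, and all titles and function names are assumptions for illustration; the project's own implementation may differ.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace, treating commas as separators."""
    return text.lower().replace(",", " ").split()


def tfidf_vectors(documents: list[str]) -> list[dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    doc_count = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF so that terms present in every document keep a non-zero weight.
    idf = {
        term: math.log((1 + doc_count) / (1 + freq)) + 1
        for term, freq in doc_freq.items()
    }
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(titles, tag_texts, query_title, top_n=3):
    """Return up to `top_n` (title, similarity) pairs most similar to `query_title`."""
    vectors = tfidf_vectors(tag_texts)
    query_index = titles.index(query_title)
    scored = [
        (title, cosine_similarity(vectors[query_index], vector))
        for index, (title, vector) in enumerate(zip(titles, vectors))
        if index != query_index
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical catalogue, for illustration only.
titles = ["Alpha Tactics", "Beta Commander", "Gamma Racer"]
tags = [
    "strategy turn-based tactics",
    "strategy tactics real-time",
    "racing driving arcade",
]
results = recommend(titles, tags, "Alpha Tactics")
```

Here `results` lists "Beta Commander" first because it shares the `strategy` and `tactics` tags with the query, while "Gamma Racer" shares none and scores zero.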
Future Enhancements
- Cross-platform analysis (PC, console, mobile).
- Impact of emerging technologies (AR/VR, cloud gaming).
- Sentiment analysis of user reviews.
- Influencer impact tracking on game success.