Daniel-ASG/PopForecast

🎛️ PopForecast: AI-Powered A&R Simulator

License: MIT

A Machine Learning end-to-end platform that leverages a Mixture of Experts (MoE) architecture and Explainable AI (XAI) to forecast the organic market potential of music tracks, acting as a digital compass for A&R (Artists and Repertoire) executives.

👉 Try the Live App on Streamlit 👈

1. The Business Problem: Why This Exists

In the modern music industry, major labels receive thousands of tracks daily. Historically, A&R executives and music producers have relied heavily on "gut feeling" to predict whether a song's acoustic profile fits current market demand.

However, predicting music popularity is a highly non-linear problem. The "rules of success" change drastically depending on an artist's current market traction. The acoustic traits of an underground hit are entirely different from those of a mainstream pop anthem.

PopForecast was built to address this by providing a statistically grounded, data-driven baseline instead of individual intuition.

1.1. The Value Proposition & Disclaimer

This tool is not a crystal ball to predict viral TikTok trends or marketing spikes. Instead, it is an acoustic sandbox and risk-mitigation tool:

  1. Organic Baseline: It calculates the intrinsic popularity "floor" of a track based purely on its acoustic DNA and the artist's historical fanbase.
  2. What-If Studio Simulations: Producers can tweak parameters (e.g., Tempo, Energy, Acousticness) in real-time to see how the market forecast shifts before finalizing a mix.
  3. Explainable Diagnostics: When a track is flagged with low potential, the embedded XAI dashboard shows which features pushed the score down, allowing executives to make data-driven decisions.
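A What-If simulation boils down to re-scoring a copy of the track's features with a few values overridden and reporting the difference. The sketch below assumes a generic `predict(features) -> float` callable; `toy_predict` is a hypothetical hand-written stand-in, not the PopForecast expert models, and the feature names are only illustrative.

```python
from typing import Callable, Dict, Tuple

Features = Dict[str, float]


def toy_predict(features: Features) -> float:
    """Hypothetical stand-in for a trained expert (NOT the PopForecast model).

    A hand-written linear score clipped to Spotify's 0-100 popularity range.
    """
    score = (
        20.0
        + 40.0 * features["energy"]
        + 0.1 * features["tempo"]
        - 15.0 * features["acousticness"]
    )
    return max(0.0, min(100.0, score))


def simulate_what_if(
    predict: Callable[[Features], float],
    features: Features,
    overrides: Features,
) -> Tuple[float, float, float]:
    """Return (baseline, simulated, delta) after applying slider overrides."""
    unknown = set(overrides) - set(features)
    if unknown:
        raise KeyError(f"Unknown features: {sorted(unknown)}")
    baseline = predict(features)
    # Merge into a new dict so the original track features are left untouched.
    simulated = predict({**features, **overrides})
    return baseline, simulated, simulated - baseline


track = {"energy": 0.55, "tempo": 100.0, "acousticness": 0.40}
baseline, simulated, delta = simulate_what_if(toy_predict, track, {"energy": 0.80})
print(f"baseline={baseline:.1f} simulated={simulated:.1f} delta={delta:+.1f}")
```

Keeping the simulation as a pure function of (features, overrides) makes every slider position reproducible, which also makes the state easy to export for review.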

⚠️ Disclaimer on Discrepancies: The "Organic Floor" vs. the "Ceiling"

When the model's prediction diverges from the actual Spotify popularity, the gap points to human and market factors:

  • Under-predictions: Reveal the delta generated by unquantifiable cultural phenomena such as iconic music videos, million-dollar marketing campaigns, or Hollywood syncs. The model confirms the track has a solid baseline; marketing does the rest.
  • Over-predictions: Highlight hidden opportunities. The model flags niche/underground tracks that possess the acoustic engineering of a mainstream hit but lack the marketing budget to achieve their true potential.
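The floor/ceiling reading above can be expressed as a sign test on the residual (actual minus predicted). This is a minimal sketch; the `tolerance` band of 10 popularity points is a hypothetical choice, not a threshold documented for PopForecast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrepancy:
    predicted: float
    actual: float
    label: str
    gap: float  # actual - predicted, in popularity points


def classify_discrepancy(predicted: float, actual: float, tolerance: float = 10.0) -> Discrepancy:
    """Label a forecast against observed Spotify popularity (both on a 0-100 scale).

    `tolerance` is a hypothetical band inside which the forecast counts as aligned.
    """
    for value in (predicted, actual):
        if not 0.0 <= value <= 100.0:
            raise ValueError("popularity values must lie in [0, 100]")
    gap = actual - predicted
    if gap > tolerance:
        label = "under-prediction"  # market/cultural lift above the organic floor
    elif gap < -tolerance:
        label = "over-prediction"   # acoustic potential the market has not rewarded yet
    else:
        label = "aligned"
    return Discrepancy(predicted, actual, label, gap)
```

For example, a hypothetical forecast of 55 against an actual popularity of 78 yields an `under-prediction` with a gap of 23 points.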

2. Key Features & Showcase

PopForecast is designed to bridge the gap between complex data science and intuitive music production.

2.1. Live Dashboard & Simulation

The application provides an interactive playground for A&Rs:

  • Real-time Ingestion: Fetches metadata directly from ReccoBeats and Last.fm APIs.
  • What-If Controls: Interactive sliders for acoustic features allowing producers to simulate mix adjustments before release.
  • The "Anti-Sabotage" Toggle: A unique data-cleaning feature that dynamically filters noisy crowd-sourced tags to reveal the track's true pop potential.
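The sketch below shows one plausible way to fetch and clean crowd-sourced tags: build a Last.fm `track.gettoptags` request and keep only tags with enough weight that are not personal "noise" tags. The payload shape, the `min_count` threshold, and the `NOISE_TAGS` blocklist are assumptions for illustration; the app's actual Anti-Sabotage rules are not documented here. No network call is made; `payload` is whatever JSON the request returns.

```python
from typing import Any, Dict, FrozenSet, List
from urllib.parse import urlencode

LASTFM_ROOT = "https://ws.audioscrobbler.com/2.0/"

# Hypothetical blocklist of personal/noise tags (assumption, not the app's real list).
NOISE_TAGS: FrozenSet[str] = frozenset({"seen live", "favorites", "favourite", "my playlist"})


def build_top_tags_url(artist: str, track: str, api_key: str) -> str:
    """Build a Last.fm track.gettoptags request URL that asks for JSON."""
    params = {
        "method": "track.gettoptags",
        "artist": artist,
        "track": track,
        "api_key": api_key,
        "format": "json",
    }
    return f"{LASTFM_ROOT}?{urlencode(params)}"


def filter_tags(
    payload: Dict[str, Any],
    min_count: int = 10,
    noise: FrozenSet[str] = NOISE_TAGS,
) -> List[str]:
    """Keep lower-cased tags with weight >= min_count that are not known noise.

    Assumes a payload shaped like {"toptags": {"tag": [{"name": ..., "count": ...}]}}.
    """
    tags = payload.get("toptags", {}).get("tag", [])
    kept: List[str] = []
    for tag in tags:
        name = str(tag.get("name", "")).strip().lower()
        count = int(tag.get("count", 0))  # counts may arrive as strings in JSON
        if name and count >= min_count and name not in noise:
            kept.append(name)
    return kept
```

Filtering before routing matters because a single mislabeled tag (for example, a joke genre tag) could otherwise push a track to the wrong expert.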

Don't just take my word for it. Test how the "Authoritarian Wall" affects underground vs mainstream artists in real-time: 🚀 Run your own simulation here

2.2. Dual-Layer Explainability (XAI)

Transparency is at the core of PopForecast. Every prediction is accompanied by:

  • Local Active Drivers: Visualization of which specific features are pushing the current track's score up or down.
  • Global Rules: A high-level view of the active Expert's decision-making weights.
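The local/global split follows the usual SHAP reading: per-track attributions explain one prediction, and mean absolute attributions over a background set summarize the model. PopForecast uses SHAP explainers on XGBoost (e.g. `shap.TreeExplainer`); the dependency-free sketch below instead uses a hypothetical linear model, for which SHAP values with independent features have the exact closed form `w_i * (x_i - E[x_i])`, to show the same additive decomposition.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]


def predict(bias: float, weights: Dict[str, float], x: Row) -> float:
    """Hypothetical linear scorer used only to illustrate the decomposition."""
    return bias + sum(w * x[f] for f, w in weights.items())


def linear_shap(weights: Dict[str, float], background: Sequence[Row], x: Row) -> Dict[str, float]:
    """Exact SHAP values for a linear model with independent features."""
    means = {f: mean(row[f] for row in background) for f in weights}
    return {f: w * (x[f] - means[f]) for f, w in weights.items()}


def local_drivers(phi: Dict[str, float]) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Split attributions into features pushing the score up vs. down, strongest first."""
    up = sorted(((f, v) for f, v in phi.items() if v > 0), key=lambda kv: -kv[1])
    down = sorted(((f, v) for f, v in phi.items() if v < 0), key=lambda kv: kv[1])
    return up, down


def global_importance(weights: Dict[str, float], background: Sequence[Row]) -> Dict[str, float]:
    """Mean |SHAP| per feature over the background set (a common global summary)."""
    per_row = [linear_shap(weights, background, row) for row in background]
    return {f: mean(abs(phi[f]) for phi in per_row) for f in weights}


BIAS = 20.0
WEIGHTS = {"energy": 30.0, "acousticness": -10.0}
BACKGROUND = [{"energy": 0.2, "acousticness": 0.8}, {"energy": 0.6, "acousticness": 0.4}]
track = {"energy": 0.9, "acousticness": 0.9}

phi = linear_shap(WEIGHTS, BACKGROUND, track)
up, down = local_drivers(phi)
```

The key property shown is additivity: the expected score plus the sum of the attributions equals the track's score, which is what lets a dashboard present each feature's push as a signed bar.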

2.3. Audit Receipts

  • JSON Report Exporter: One-click download of the entire simulation state, including raw API data, simulated features, and model interpretability metrics for internal label review.
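A receipt like this can be produced with the standard `json` module. The field names below are a hypothetical schema, not the app's actual export format.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def build_audit_report(
    raw_api_data: Dict[str, Any],
    simulated_features: Dict[str, float],
    prediction: float,
    expert: str,
    shap_values: Dict[str, float],
) -> str:
    """Serialize one simulation into a JSON 'receipt' (hypothetical schema)."""
    report = {
        "schema_version": 1,
        "generated_at_utc": datetime.now(timezone.utc).isoformat(),
        "raw_api_data": raw_api_data,
        "simulated_features": simulated_features,
        "prediction": {"expert": expert, "popularity": round(prediction, 2)},
        "interpretability": {"shap_values": shap_values},
    }
    # sort_keys keeps receipts diff-friendly; ensure_ascii=False keeps artist names readable.
    return json.dumps(report, indent=2, sort_keys=True, ensure_ascii=False)
```

In a Streamlit app, the returned string could be handed to `st.download_button` with `mime="application/json"` to get the one-click download.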

3. Machine Learning Architecture (MoE)

PopForecast moves away from a "one-size-fits-all" model approach. Since the determinants of success for an indie artist are mathematically different from those of a global superstar, we implemented a Mixture of Experts (MoE) architecture.

  • The Gating Network: Routes tracks to specialized XGBoost models based on the artist's Cultural Authority (Listener volume).
  • Strict Temporal Split: All models were trained with a temporal cutoff (training data before 2021), so the system never learns from data that postdates the period it is evaluated on, preventing data leakage.
  • Topological Context: Utilizes K-Means Clustering to grant the experts a latent "Acoustic Neighborhood" context, normalizing features across distinct genres.
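The routing and temporal split can be sketched as a hard gate over listener-count tiers plus a year-based train/test partition. Only the "Tipping Point" expert is named in this README (see the case studies); the other tier names, every threshold, and the `year` key are hypothetical. The real router may also use other signals, since the ANGRA case study mentions genre tags.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Dict[str, float]
Expert = Callable[[Features], float]

# Hypothetical tiers, ordered from highest threshold to lowest.
TIERS: Tuple[Tuple[str, int], ...] = (
    ("mainstream", 1_000_000),
    ("tipping_point", 100_000),
    ("underground", 0),
)


def route(listeners: int, tiers: Sequence[Tuple[str, int]] = TIERS) -> str:
    """Hard gating: pick the first tier whose minimum listener count is met."""
    if listeners < 0:
        raise ValueError("listener count must be non-negative")
    for name, min_listeners in tiers:
        if listeners >= min_listeners:
            return name
    raise ValueError("no tier matched; check the tier thresholds")


def predict_moe(features: Features, listeners: int, experts: Dict[str, Expert]) -> Tuple[str, float]:
    """Send the track to exactly one expert and return (expert_name, score)."""
    name = route(listeners)
    return name, experts[name](features)


def temporal_split(
    rows: Sequence[Dict], cutoff_year: int = 2021, year_key: str = "year"
) -> Tuple[List[Dict], List[Dict]]:
    """Train on tracks released before cutoff_year; evaluate on the rest."""
    train = [r for r in rows if r[year_key] < cutoff_year]
    test = [r for r in rows if r[year_key] >= cutoff_year]
    return train, test
```

This is hard routing (one expert per track), matching the "routes tracks" description; a classic soft MoE gate would instead blend all experts' outputs with learned weights.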

4. Technical Stack & Data Pipeline

  • Language: Python 3.10.12
  • Environment: Poetry (Strict dependency versioning)
  • Core Engine: XGBoost (gradient-boosted tree ensembles)
  • Interpretability: SHAP (Kernel & Tree explainers)
  • Interface: Streamlit (Reactive web framework)

5. Project Structure

PopForecast/
├── .streamlit/           # Streamlit configuration and secrets
├── data/                 # Local data storage (ignored by git)
├── models/               # Trained MoE Experts and metadata
├── notebooks/            # Research, EDA, and Model Auditing
├── src/                  # Source code
│   ├── api/              # API Clients (Spotify, Last.fm)
│   ├── data/             # Raw data preprocessing logic
│   ├── features/         # Feature engineering and data contracts
│   ├── models/           # Model training and evaluation routines
│   └── ui/               # Streamlit Web Application
├── pyproject.toml        # Poetry dependency management
└── IMPLEMENTATION_LOG.md # Technical decision history


6. Interactive Case Studies (Try It Yourself)

To see how PopForecast separates acoustic DNA from marketing and metadata noise, try these searches in the Live App:

  • The Catalog Paradox (Lady Gaga [2009] The Fame Monster): Open the dropdown for Bad Romance. You will see dozens of duplicate IDs with popularity scores of 0 or 10. The app's UI immediately surfaces the true "Master Track" [Pop: 81]. Notice how the AI predicts a solid baseline (the Floor) without any knowledge of the iconic music video.
  • The "Hollywood Sync" Effect (Kate Bush [1985] Hounds Of Love): Select Running Up That Hill. The model predicts a high score for a 40-year-old track, recognizing its immortal synth-pop DNA. The gap between that prediction and the actual popularity of 78 reflects the unquantifiable effect of the Stranger Things sync.
  • The Vehicle/Genre Friction (ANGRA [1993] Angels Cry): Select Wuthering Heights. Even though it is a Kate Bush cover with similar tempo and energy, the MoE router detects the Metal tag and routes it to the Tipping Point expert, which predicts a lower score because of the algorithmic friction the heavy metal genre faces in global pop playlists.

7. Next Steps

To further enhance the models and the overall platform, I am considering the following steps:

  • Feature Engineering (Lyrics & Social Media): Integrate NLP models to quantify the emotional valence of lyrics, and include external features such as TikTok hashtag trend frequency to predict the "Marketing Delta" (the difference between the Organic Floor and the Viral Ceiling).
  • Advanced Audio Processing: Experiment with deep learning techniques (e.g., CNNs on Mel-spectrograms) to extract features directly from raw audio files, reducing reliance on external API acoustic attributes.
  • Time-Series Decay Modeling: Enhance the prediction logic to better map the natural popularity decay curve of catalog tracks over decades.
  • Deployment and Containerization: Containerize the application using Docker and deploy it to a cloud environment (e.g., AWS, GCP) to ensure high availability and continuous monitoring for concurrent A&R users.

8. How to Use this Project

This section guides you through running the PopForecast Live A&R Simulator on your local machine.

Prerequisites:

  • This project was developed using Python version 3.10.12.
  • You need to have Poetry installed for strict dependency management.

Core Libraries:

  • streamlit
  • xgboost
  • shap
  • requests
  • pandas

Instructions:

Don't want to install dependencies? Play with the live version on Streamlit Cloud instead!

If you prefer to run the environment locally, follow these steps:

  1. Clone this repository to your local machine:
git clone https://github.com/Daniel-ASG/PopForecast.git
cd PopForecast
  2. Install the required dependencies via Poetry:
poetry install
  3. Set up your API credentials. Create the .streamlit folder (if it does not already exist) and a secrets.toml file:
mkdir -p .streamlit
touch .streamlit/secrets.toml

Inside the secrets.toml file, add your Last.fm API key:

LASTFM_API_KEY = "your_lastfm_key_here"
  4. Run the Streamlit application:
poetry run streamlit run src/ui/app.py
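Inside the app, the key stored in .streamlit/secrets.toml is available through `st.secrets`. The helper below is a hypothetical sketch (not code from this repository) that also accepts a `LASTFM_API_KEY` environment variable, which can be convenient for scripts or CI runs outside Streamlit.

```python
import os


def get_lastfm_api_key() -> str:
    """Return the Last.fm API key, preferring the LASTFM_API_KEY environment variable.

    The environment-variable fallback is a hypothetical convenience; inside the
    Streamlit app the value normally comes from .streamlit/secrets.toml.
    """
    key = os.environ.get("LASTFM_API_KEY")
    if key:
        return key
    try:
        import streamlit as st  # only needed when no environment variable is set

        return st.secrets["LASTFM_API_KEY"]
    except Exception as exc:  # error types for missing secrets vary across Streamlit versions
        raise RuntimeError(
            "LASTFM_API_KEY not found in the environment or .streamlit/secrets.toml"
        ) from exc
```

Never commit secrets.toml; the project structure already notes that .streamlit/ holds secrets.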

9. Author

Made by Daniel Gomes. Feel free to reach out to discuss music data, machine learning architecture, or A&R analytics!


About

Spotify track popularity forecasting with a reproducible ML pipeline.