# Brentford Data Analytics

A complete, end-to-end football analytics pipeline designed to analyze player and match performance for Brentford FC and other Championship clubs during the 2024–2025 season, using a modern data stack and cloud architecture.
- 📌 Project Overview
- 🚀 Tech Stack
- 🗂️ Folder Structure
- 🔄 Architecture Overview
- 📈 Dashboards
- 🧰 How to Run the Project
## 📌 Project Overview

This project scrapes detailed football statistics from FBref, stores them in a PostgreSQL landing database, and then transforms them into meaningful insights through a modern ELT pipeline using:
- Selenium to scrape data
- PostgreSQL as a landing database
- dlt for ingestion into Snowflake (Bronze layer)
- dbt for transformations (Bronze → Silver)
- Apache Airflow for orchestration
- MS Power BI for dashboards
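To make the scraping stage concrete, here is a minimal sketch of how an FBref table can be fetched with headless Chrome and parsed with pandas. It is illustrative only, not the repository's actual `scripts/functions/scrape_data.py`; the URL and table id are hypothetical examples:

```python
# Hypothetical sketch of the scraping stage; the real logic lives in
# scripts/functions/scrape_data.py. URL and table id are examples only.
from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def scrape_fbref_table(url: str, table_id: str) -> pd.DataFrame:
    """Render an FBref page with headless Chrome and parse one stats table."""
    options = Options()
    options.add_argument("--headless=new")  # run without a visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # FBref stats tables are plain HTML, so pandas can parse them
        # straight out of the rendered page source.
        tables = pd.read_html(StringIO(driver.page_source), attrs={"id": table_id})
        return tables[0]
    finally:
        driver.quit()


if __name__ == "__main__":
    df = scrape_fbref_table(
        "https://fbref.com/en/comps/10/stats/Championship-Stats",  # example URL
        table_id="stats_standard",  # example table id
    )
    print(df.head())
```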
Football clubs, especially those in England's competitive leagues, face increasing pressure to make data-driven decisions about recruitment, performance analysis, and match strategy. However, many clubs lack accessible, centralized platforms that collect, transform, and visualize comprehensive player and team data consistently and reliably. This often leads to:
- Fragmented data across multiple sources.
- Manual reporting and analysis.
- Missed insights in scouting and performance tracking.
- Inefficiencies in the data pipeline and lack of automation.
The Brentford Data Analytics project solves these challenges by building a complete end-to-end modern data stack using open-source tools and cloud infrastructure.
With this architecture, Brentford FC (or any other club) can:
- Compare players across the league using consistent metrics.
- Evaluate team and player performance over time.
- Automate workflows and focus on tactical and strategic decision-making.
## 🚀 Tech Stack

| Layer | Tool/Technology |
|---|---|
| Scraping | Python, Selenium |
| Infrastructure | Docker |
| Landing DB | PostgreSQL |
| Ingestion | dlt |
| Data Warehouse | Snowflake |
| Modeling | dbt |
| Orchestration | Apache Airflow |
| BI Tool | MS Power BI |
## 🗂️ Folder Structure

```
Brentford-Data-Analytics/
│
├── scripts/                  # Scraping & data loading
│   └── functions/
│       ├── __init__.py
│       ├── clean_data.py
│       ├── scrape_data.py
│       └── scraper.py
│
├── etl_pipeline/             # dlt pipeline to Snowflake
│   ├── .dlt/
│   └── etl_pipeline.py
│
├── airflow/                  # Airflow DAGs and config
│   ├── dags/
│   │   └── full_pipeline_dag.py
│   └── .env                  # DAG-specific env variables
│
├── snowflake_dbt/            # dbt models
│   ├── models/
│   │   ├── dim/
│   │   └── fact/
│   └── dbt_project.yml
│
├── BI/                       # Power BI files
│   └── brentford_dashboard.pbix
│
├── images/                   # Screenshots
│
├── docker-compose.yml        # Infrastructure setup
├── .gitignore
├── README.md
├── data_dictionary.md
└── architecture_diagram.png
```
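For orientation, the ingestion step in `etl_pipeline/etl_pipeline.py` follows this general shape. The sketch below uses only dlt's core API with placeholder table and connection names; the real pipeline reads its Snowflake credentials from `.dlt/secrets.toml`:

```python
# Hedged sketch of the dlt ingestion step (placeholder names; the real
# implementation lives in etl_pipeline/etl_pipeline.py and reads Snowflake
# credentials from .dlt/secrets.toml).
from datetime import datetime, timezone

import dlt
import psycopg2


@dlt.resource(table_name="player_stats", write_disposition="replace")
def player_stats(pg_dsn: str):
    """Yield rows from the PostgreSQL landing table, stamping load time."""
    conn = psycopg2.connect(pg_dsn)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT * FROM player_stats")  # placeholder landing table
            columns = [col.name for col in cur.description]
            for row in cur:
                record = dict(zip(columns, row))
                # Every table carries loaded_at so freshness can be tracked.
                record["loaded_at"] = datetime.now(timezone.utc)
                yield record
    finally:
        conn.close()


if __name__ == "__main__":
    pipeline = dlt.pipeline(
        pipeline_name="fbref_to_snowflake",
        destination="snowflake",
        dataset_name="bronze",  # Bronze layer of the Medallion architecture
    )
    # Placeholder DSN; in practice it comes from environment configuration.
    info = pipeline.run(player_stats("postgresql://user:pass@localhost:5432/landing"))
    print(info)
```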
## 📈 Dashboards

Dashboards are created using MS Power BI and connected to the Silver layer in Snowflake.

Key insights include:
- 🧍 Player Performance Overview
- 🧤 Goalkeeper Metrics
🔗 View Live Dashboard on NovyPro
## 🧰 How to Run the Project

Clone the repository:

```bash
git clone https://github.com/your-username/Brentford-Data-Analytics.git
cd Brentford-Data-Analytics
```

Create your environment file and fill in the required variables:

```bash
cp .env.example .env
# Fill in PostgreSQL, Snowflake, and other variables
```

Start the infrastructure:

```bash
docker-compose up -d
```

Then open the Airflow UI at http://localhost:8080.
Default credentials:

- Username: `airflow`
- Password: `airflow`
From the Airflow UI:

- Toggle the DAG switch "on"
- Click ▶️ to trigger the DAG named `full_pipeline_dag`

Or from the CLI inside the container:

```bash
docker exec -it airflow-webserver bash
airflow dags trigger full_pipeline_dag
```

Available DAGs:

- `full_pipeline_dag`: Orchestrates scraping → PostgreSQL → dlt → Snowflake → dbt
- `dlt_load_dag`: Isolated dlt run (optional trigger)
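As a rough illustration of how `full_pipeline_dag` chains the stages, a DAG along these lines would work. This is a hedged sketch, not the repository's actual `airflow/dags/full_pipeline_dag.py`; the bash commands and paths are placeholders:

```python
# Hedged sketch of full_pipeline_dag; the actual DAG lives in
# airflow/dags/full_pipeline_dag.py and its commands/paths differ.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="full_pipeline_dag",
    start_date=datetime(2024, 8, 1),  # placeholder: start of the 2024-25 season
    schedule=None,                    # triggered manually from the UI or CLI
    catchup=False,
) as dag:
    scrape = BashOperator(
        task_id="scrape_fbref",
        bash_command="python /opt/project/scripts/functions/scraper.py",  # placeholder path
    )
    ingest = BashOperator(
        task_id="dlt_postgres_to_snowflake",
        bash_command="python /opt/project/etl_pipeline/etl_pipeline.py",  # placeholder path
    )
    transform = BashOperator(
        task_id="dbt_bronze_to_silver",
        bash_command="cd /opt/project/snowflake_dbt && dbt run",  # placeholder path
    )

    # Scraping → PostgreSQL/Snowflake ingestion → dbt transformations
    scrape >> ingest >> transform
```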
## 📝 Notes

- All models follow the Medallion architecture (Bronze → Silver).
- Airflow runs each part of the pipeline in sequence.
- Power BI visuals use Brentford’s brand theme & color palette.
- All tables include a `loaded_at` column to track data freshness.
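Because every table carries `loaded_at`, freshness is easy to spot-check. A minimal sketch using the Snowflake Python connector is shown below; the connection parameters, database, schema, and table name are placeholders, not the project's actual configuration:

```python
# Spot-check data freshness via the loaded_at column (placeholder credentials
# and table name; real values come from your .env / Snowflake account).
import os

import snowflake.connector

conn = snowflake.connector.connect(
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    database="ANALYTICS",  # placeholder database
    schema="SILVER",       # placeholder schema for the Silver layer
)
try:
    cur = conn.cursor()
    cur.execute("SELECT MAX(loaded_at) FROM fact_player_stats")  # placeholder table
    print("Last load:", cur.fetchone()[0])
finally:
    conn.close()
```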
🏁 For detailed flows and table info, see `data_dictionary.md` and `architecture_diagram.png`.
This project is currently maintained for learning and demonstration purposes. If you'd like to contribute, feel free to fork and open a pull request!