ts-7bd/EuropeanSoccer

Soccer matches in Europe

Analysis of the European Soccer Database, which contains more than 25,000 matches in 11 leagues from the 2008/2009 season to the 2015/2016 season.

In this repository I plan to carry out several projects: one data engineering project and two data analysis projects.
The first builds a data warehouse on an AWS server. The analysis projects cover game results, home-team advantage, and the performance of Hamburger SV.

Getting database from Kaggle

The database is available on Kaggle. Here is a short overview of the Kaggle CLI commands needed to search for it and download it.

  • Search for the European Soccer Database by title
    kaggle datasets list -s 'European Soccer Database'
  • List the files of the dataset with the reference hugomathien/soccer
    kaggle datasets files hugomathien/soccer
  • Download database.sqlite from the dataset hugomathien/soccer into a local folder
    kaggle datasets download hugomathien/soccer -f database.sqlite -p 'Z:/IT-Projekte/FIFA soccer analysis/'

Initialization of a data warehouse

The data warehouse (DWH) follows the Data Vault model. It is built with two Python scripts on my AWS account.
Data are extracted from the SQLite database and loaded into the data vault.

fifa_dwh_lib.py

  • About: Module for fifa_data_vault.py. It contains subroutines for creating hubs, satellites, and links, for inserting into existing hubs, and for creating hash keys.
  • Code: Python 3.6
  • Dependencies: PostgreSQL and a connection to the AWS server
  • Packages: numpy, pandas, datetime, string, sqlalchemy, psycopg2, hashlib, sqlite3
  • Notes: Refactoring option: the query strings can be replaced by object-based queries
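The hash key step can be illustrated with a minimal sketch. It is not the code from fifa_dwh_lib.py: the function name make_hash_key, the "|" delimiter, the upper-case normalization, and the choice of MD5 are all assumptions made for illustration. The idea is that the same business key always yields the same key, so hubs and links can be joined on it.

```python
import hashlib


def make_hash_key(*business_keys, delimiter="|"):
    """Build a deterministic hash key from one or more business keys.

    Hypothetical helper: name, delimiter, and MD5 are illustrative choices.
    """
    # Normalize each part (string cast, trimmed, upper-cased) so that
    # formatting differences do not produce different keys.
    normalized = delimiter.join(str(key).strip().upper() for key in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# A hub key is built from one business key, a link key from several.
hub_key = make_hash_key("Hamburger SV")
link_key = make_hash_key("Hamburger SV", "Germany 1. Bundesliga")
```

MD5 is used here only as a compact, stable fingerprint, not for security; another hashlib algorithm such as sha256 could be swapped in without changing the structure.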

fifa_data_vault.py

  • About: This program extracts European soccer data from the database database.sqlite. The aim is to set up a DWH of the Data Vault type on AWS.
  • Code: Python 3.6
  • Dependencies: PostgreSQL and a connection to the AWS server
  • Data: database.sqlite
  • Packages: numpy, pandas, datetime, string, sqlalchemy, psycopg2, hashlib, sqlite3, fifa_dwh_lib
  • Notes: Should be refactored to a more functional and object-oriented style in line with clean coding principles
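The extraction side can be sketched with the standard library alone. This is not the code from fifa_data_vault.py: the helpers list_tables and extract_business_keys are hypothetical, and the in-memory database with a two-row Team table is a toy stand-in for database.sqlite (the column names are assumed). Loading into PostgreSQL is left out.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def extract_business_keys(conn, table, column):
    """Return the distinct values of one column, e.g. as input for a hub."""
    # Identifiers cannot be bound as SQL parameters, so the table name is
    # checked against the tables that actually exist before it is used.
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table}")
    cursor = conn.execute(f'SELECT DISTINCT "{column}" FROM "{table}" ORDER BY 1')
    return [value for (value,) in cursor]


# Toy stand-in for database.sqlite; ids and schema are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Team (team_api_id INTEGER, team_long_name TEXT)")
conn.executemany(
    "INSERT INTO Team VALUES (?, ?)",
    [(1, "Hamburger SV"), (2, "FC Bayern Munich"), (1, "Hamburger SV")],
)

tables = list_tables(conn)
team_keys = extract_business_keys(conn, "Team", "team_long_name")
conn.close()
```

With the real file, sqlite3.connect would receive the path to database.sqlite instead of ":memory:".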

Analysis of game results

Analysis of 25979 soccer games in 11 European leagues between the seasons 2008/2009 and 2015/2016. Four tables from the European Soccer Database are needed: league, country, match, and team. The analysis consists of three major parts.

  • investigation of game results: most likely results, relation between home-team victories and away-team victories, number of goals
  • investigation of home-team advantage for each league and season, including statistical evaluation with a t-test
  • performance of Hamburger SV in comparison to FC Bayern Munich: points and goals
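The first two parts can be sketched as follows. This is not taken from the notebook: the rows are made-up toy data, the functions outcome_shares and paired_t_statistic are hypothetical, and a paired t-test on home versus away goals per match is an assumption, since the t-test variant is not specified here. The column names home_team_goal and away_team_goal are assumed to match the match table.

```python
import math
import sqlite3
import statistics

# Toy stand-in for the match table; the six rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Match (home_team_goal INTEGER, away_team_goal INTEGER)")
conn.executemany(
    "INSERT INTO Match VALUES (?, ?)",
    [(2, 1), (1, 1), (0, 2), (3, 0), (1, 0), (2, 2)],
)
rows = conn.execute("SELECT home_team_goal, away_team_goal FROM Match").fetchall()
conn.close()


def outcome_shares(matches):
    """Share of home wins, draws, and away wins."""
    n = len(matches)
    home = sum(h > a for h, a in matches)
    away = sum(h < a for h, a in matches)
    return {"home": home / n, "draw": (n - home - away) / n, "away": away / n}


def paired_t_statistic(matches):
    """t statistic for H0: mean(home goals - away goals) == 0."""
    diffs = [h - a for h, a in matches]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


shares = outcome_shares(rows)
t_stat = paired_t_statistic(rows)
```

A positive t statistic points towards a home-team advantage. The p-value requires the t distribution with n - 1 degrees of freedom, which scipy.stats.ttest_rel computes directly from the two goal columns.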

soccer_game_results.ipynb

  • About: Analysis of soccer game results, home-team advantage, and performance of Hamburger SV in comparison to FC Bayern Munich
  • Code: Jupyter Notebook 6.1
  • Data: database.sqlite.zip
  • Packages: os, sys, zipfile, numpy, pandas, datetime, time, scipy, statsmodels, sqlite3, string, seaborn, matplotlib
