Make sure you are in the root directory, with the docker-compose.yaml file
Create an ingestion/settings.yaml file with the following values (see ingestion/settings.yaml.example)
```yaml
# You need this to access the Steam Web API, which is used to fetch basic match data.
# You can safely use your main account to obtain the API key.
# You can request an API key here: https://steamcommunity.com/dev/apikey
api_key: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

# Steam Web API endpoint. You should not modify this unless you know what you are doing
api_endpoint: http://api.steampowered.com/IDOTA2Match_570/GetMatchHistoryBySequenceNum/V001/?key={}&start_at_match_seq_num={}

# Kafka topic the producer will send the data to. The Kafka Streams consumer expects this topic
topic: dota_raw

# Interval between each data fetch by the Python script
interval: 10

# 3 possible values can be placed here:
# - The sequential match id of the first match you want to fetch, as a string
# - 'cassandra': fetch the last sequential match id stored in the Cassandra database
# - 'steam': fetch the most recent sequential match id from the "history_endpoint"
match_seq_num: 4976549000 | 'steam' | 'cassandra'

# Steam Web API endpoint used when 'steam' is set in "match_seq_num"
history_endpoint: https://api.steampowered.com/IDOTA2Match_570/GetMatchHistory/V001/?key={}&matches_requested=1
```
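The two `{}` placeholders in `api_endpoint` are filled with the API key and the starting sequential match id. A minimal sketch of how the ingestion script might build the request URL (the function and variable names here are hypothetical, not taken from the actual code):

```python
# Hypothetical sketch: fill the placeholders in api_endpoint with the API key
# and the starting sequential match id. Names are illustrative only.
API_ENDPOINT = (
    "http://api.steampowered.com/IDOTA2Match_570/"
    "GetMatchHistoryBySequenceNum/V001/?key={}&start_at_match_seq_num={}"
)

def build_fetch_url(api_key: str, match_seq_num: int) -> str:
    """Build the full Steam Web API URL for one fetch."""
    return API_ENDPOINT.format(api_key, match_seq_num)

url = build_fetch_url("XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX", 4976549000)
# url ends with ...?key=XXXX...&start_at_match_seq_num=4976549000
```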
All the values in the settings file can be overridden by an environment variable with the same name in all caps
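As a sketch of that override rule (a hypothetical helper, not the project's actual code), each setting is looked up in the environment under its upper-cased name, falling back to the file value:

```python
import os

def apply_env_overrides(settings: dict) -> dict:
    """Replace each setting with the environment variable of the same
    name in all caps, when that variable is set."""
    return {key: os.environ.get(key.upper(), value)
            for key, value in settings.items()}

settings = {"topic": "dota_raw", "interval": 10}
os.environ["TOPIC"] = "dota_test"     # e.g. set by docker-compose or Kubernetes
print(apply_env_overrides(settings))  # {'topic': 'dota_test', 'interval': 10}
```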
To run the Elasticsearch container you may need to raise the `vm.max_map_count` kernel parameter; see the Elasticsearch documentation for details
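Per the Elasticsearch documentation, `vm.max_map_count` must be at least 262144 on the Docker host. A typical way to set it (these commands run on the host, not inside the container):

```shell
# Raise the limit until the next reboot
sudo sysctl -w vm.max_map_count=262144

# Make the change persistent across reboots
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
```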
Make sure you are in the root directory, with the all-in-one-deploy.yaml file
Make sure to edit the kubernetes/kafkaproducer-key.yaml file to add your Steam Web API key. All the settings shown above are set through environment variables with the same name in all caps
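As an illustration of that mapping (a hypothetical sketch only; the actual kubernetes/kafkaproducer-key.yaml in the repo may be structured differently), each setting becomes an upper-cased environment variable, for example via a ConfigMap:

```yaml
# Hypothetical sketch; the real kubernetes/kafkaproducer-key.yaml may differ.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kafkaproducer-key
data:
  API_KEY: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"  # your Steam Web API key
  TOPIC: "dota_raw"
  INTERVAL: "10"
```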
Start:

```shell
kubectl apply -f all-in-one-deploy.yaml
```

Stop:

```shell
kubectl delete -f all-in-one-deploy.yaml
```
Useful commands
- `docker exec -it <container-name> bash`: get a terminal into the running container
- `docker system prune`: clean your system of any stopped containers, unused images, and volumes
- `docker-compose build`: rebuild your containers (e.g. for database schema updates)
- `kubectl -n default rollout restart deploy`: restart all Kubernetes pods
TODO list
- Add the much-needed replay parsing to gather much more information about each match.
- Build a usable user interface to fetch the data.
- Use clusters with more than one node for each of the distributed services.
- Improve performance.
- Use Kubernetes to its fullest.
- Add the recommended security layers, such as passwords and encryption.
This project was created for the TAP course at the University of Catania. The idea is to showcase a simple ETL pipeline using some of the most widely known technologies in the big data field