This repository contains the code used for the empirical analysis conducted in the master's thesis. It includes the framework for data extraction as well as the code for calculating the indicators used in the thesis. The project investigates the development dynamics of open-source software (OSS) projects, focusing on Terraform and OpenTofu.
The goal of this project is to analyze how development dynamics evolve over time and how they are affected by the project fork that followed a relicensing event in the Terraform project. The analysis is based on three analytical dimensions:
- Participation – contributor activity and engagement
- Coordination – interaction and integration of contributions
- Innovation – introduction and delivery of new functionality
Each dimension is operationalized using multiple indicators derived from GitHub repository data.
The analysis is based on repository-level data from the GitHub REST API, including:
- Commits
- Issues
- Pull Requests
- Releases
The data is processed and aggregated on a monthly basis to enable time-series analysis and comparison between:
- Pre-fork and post-fork phases of Terraform
- Terraform and OpenTofu after the fork
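The monthly aggregation and the split into phases can be sketched as follows. This is a minimal illustration and not the repository's implementation: the function names are hypothetical, and the fork date in the usage example is a made-up placeholder rather than the real fork date.

```python
from collections import Counter
from datetime import datetime, timezone


def parse_timestamp(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2019-11-03T10:00:00Z' into UTC."""
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc)


def month_key(timestamp: str) -> str:
    """Map a timestamp to its 'YYYY-MM' bucket."""
    return parse_timestamp(timestamp).strftime("%Y-%m")


def phase(timestamp: str, fork_date: datetime) -> str:
    """Label a timestamp as pre-fork or post-fork relative to fork_date."""
    return "pre-fork" if parse_timestamp(timestamp) < fork_date else "post-fork"


def monthly_counts(timestamps) -> dict:
    """Count events per month, sorted chronologically."""
    return dict(sorted(Counter(month_key(t) for t in timestamps).items()))


# Placeholder only: NOT the actual fork date used in the thesis.
example_fork_date = datetime(2020, 1, 1, tzinfo=timezone.utc)
example_events = [
    "2019-11-03T10:00:00Z",
    "2019-11-20T08:30:00Z",
    "2020-02-01T00:00:00Z",
]
print(monthly_counts(example_events))
```

The same bucketing applies to any artifact with a timestamp (commits, issues, pull requests, releases), which is what makes the series comparable across repositories and phases.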
The following indicators are calculated:
- Active Contributors
- New Contributors
- Number of Commits
- Pull Requests Opened
- Issues Opened
- Pull Request Review Time
- Feature-Related Commits
- Releases
- Feature-Related Issues
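To make the first two indicators concrete, here is a small sketch under assumed definitions: a contributor counts as active in a month if they authored at least one commit in it, and as new in the month of their first commit. These definitions, the function name, and the sample data are assumptions for illustration; the authoritative definitions are in the thesis and in the SQL files.

```python
from collections import defaultdict


def contributor_indicators(commits):
    """Compute active and new contributors per month.

    commits: iterable of (author, 'YYYY-MM') pairs.
    Returns {month: {"active": int, "new": int}} in chronological order.
    """
    authors_by_month = defaultdict(set)
    for author, month in commits:
        authors_by_month[month].add(author)

    seen = set()  # authors observed in any earlier month
    result = {}
    for month in sorted(authors_by_month):
        authors = authors_by_month[month]
        result[month] = {"active": len(authors), "new": len(authors - seen)}
        seen |= authors
    return result


# Made-up sample data for illustration only.
sample_commits = [
    ("alice", "2023-01"),
    ("bob", "2023-01"),
    ("alice", "2023-02"),
    ("carol", "2023-02"),
]
print(contributor_indicators(sample_commits))
```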
The following commands run the data extraction for both repositories and load the data into a DuckDB database.
To calculate the indicators, execute the SQL files in the folder sql_analysis against this database after the data extraction has finished.
The project uses uv, a fast Python package and environment manager, to install dependencies and run commands.
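Executing the SQL files one by one can be automated. The helper below is a hypothetical sketch, not part of this repository: it assumes the files can be run in file-name order and that each file can be passed as a whole to the database's execute function.

```python
from pathlib import Path


def run_sql_files(execute, folder):
    """Run every .sql file in `folder`, in file-name order.

    execute: callable that takes the SQL text of one file.
    Returns the names of the executed files.
    """
    executed = []
    for path in sorted(Path(folder).glob("*.sql")):
        execute(path.read_text(encoding="utf-8"))
        executed.append(path.name)
    return executed


# Possible usage with the duckdb Python package (the database path is a
# placeholder, since the actual file name is defined by the extraction code):
#
#   import duckdb
#   con = duckdb.connect("<path to the DuckDB database file>")
#   run_sql_files(con.execute, "sql_analysis")
```

Passing the execute function in as a parameter keeps the helper independent of the database driver, so it can be tried out without a database.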
Make sure to put a GitHub access token into the environment variable GITHUB_TOKEN before running the following commands.
Otherwise, you will run into rate limit errors during the data extraction.
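For illustration, the sketch below shows how a token from GITHUB_TOKEN can be sent to the GitHub REST API and how the remaining quota can be read from the rate_limit endpoint. It is a standalone example using only the standard library and is not the code of the check_rate_limit module; the function names are hypothetical.

```python
import json
import os
import urllib.request

RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def build_headers(token):
    """Build request headers; the token is optional but raises the rate limit."""
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def core_remaining(payload):
    """Return (remaining, limit) for the core REST API from a rate_limit response."""
    core = payload["resources"]["core"]
    return core["remaining"], core["limit"]


def fetch_rate_limit(token):
    """Query the rate_limit endpoint (requires network access)."""
    request = urllib.request.Request(RATE_LIMIT_URL, headers=build_headers(token))
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Possible usage:
#
#   token = os.environ.get("GITHUB_TOKEN")
#   remaining, limit = core_remaining(fetch_rate_limit(token))
#   print(f"{remaining} of {limit} core requests remaining")
```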
Run pre-commit hooks:

    uv run pre-commit run --all-files

Run the full data extraction:

    uv run python -m src.master_thesis.run_all

Run an extraction for a single artifact (issues, for example):

    uv run python -m src.master_thesis.github.api.extract_issues

Check the remaining rate limit of the GitHub API:

    uv run python -m src.master_thesis.check_rate_limit

Get the fork date from OpenTofu:

    uv run python -m src.master_thesis.get_fork_date

Get the repository creation dates:

    uv run python -m src.master_thesis.get_repo_dates

The folder data/indicators_results contains the results of the analysis in CSV format, which are presented and explained in the thesis.
Jens Scholten