JSCHO99/master_thesis

Master Thesis

This repository contains the code used for the empirical analysis conducted in the master thesis. It includes the framework for data extraction as well as the code for calculating the indicators used in the thesis. The project investigates the development dynamics of open-source software (OSS) projects, focusing on Terraform and OpenTofu.

Objective

The goal of this project is to analyze how development dynamics evolve over time and how they are affected by the project fork that followed a relicensing event in the Terraform project. The analysis is based on three analytical dimensions:

  • Participation – contributor activity and engagement
  • Coordination – interaction and integration of contributions
  • Innovation – introduction and delivery of new functionality

Each dimension is operationalized using multiple indicators derived from GitHub repository data.


Data Source

The analysis is based on repository-level data from the GitHub REST API, including:

  • Commits
  • Issues
  • Pull Requests
  • Releases
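
GitHub list endpoints for these artifacts return results in pages, and the URL of the following page is advertised in the HTTP `Link` header. The sketch below shows one way to walk such an endpoint with the standard library only. It is a minimal illustration, not the extraction code of this repository. The helper names `next_page_url` and `iter_items` are hypothetical, and so is the example URL built from the repository `hashicorp/terraform`.

```python
import json
import re
import urllib.request
from typing import Iterator, Optional

# GitHub lists further pages in the "Link" header, e.g.
# <https://api.github.com/...&page=2>; rel="next", <...&page=9>; rel="last"
_LINK_RE = re.compile(r'<([^>]+)>;\s*rel="([^"]+)"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the URL tagged rel="next" in a Link header, or None on the last page."""
    if not link_header:
        return None
    for url, rel in _LINK_RE.findall(link_header):
        if rel == "next":
            return url
    return None


def iter_items(url: str, token: str) -> Iterator[dict]:
    """Yield every JSON object from a paginated GitHub list endpoint (hypothetical helper)."""
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }
    next_url: Optional[str] = url
    while next_url:
        request = urllib.request.Request(next_url, headers=headers)
        with urllib.request.urlopen(request) as response:
            yield from json.load(response)
            next_url = next_page_url(response.headers.get("Link"))
```

For example, `iter_items("https://api.github.com/repos/hashicorp/terraform/releases?per_page=100", token)` would yield all releases. Requesting `per_page=100` keeps the number of requests, and therefore the rate-limit usage, low.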

The data is processed and aggregated on a monthly basis to enable time-series analysis and comparison between:

  • Pre-fork and post-fork phases of Terraform
  • Terraform and OpenTofu after the fork
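
The monthly aggregation and the pre/post-fork split can be sketched as follows. This is an assumption-laden illustration, not the project's implementation. Timestamps are bucketed into `YYYY-MM` months, and each month is labeled by comparing it with a fork date. Counting the fork month itself as post-fork is a choice made only for this sketch, and the fork date used below is a placeholder. In the project, the date would presumably come from `get_fork_date`.

```python
from collections import Counter
from datetime import date, datetime
from typing import Dict, Iterable, Tuple


def to_month(timestamp: str) -> str:
    """Convert a GitHub ISO-8601 timestamp such as '2023-08-25T10:00:00Z' to 'YYYY-MM'."""
    # datetime.fromisoformat() accepts a trailing 'Z' only from Python 3.11 on, so normalise it.
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    return parsed.strftime("%Y-%m")


def phase(month: str, fork_date: date) -> str:
    """Label a 'YYYY-MM' month; the fork month itself counts as post-fork (assumption)."""
    # 'YYYY-MM' strings sort chronologically, so plain string comparison is enough.
    return "pre_fork" if month < fork_date.strftime("%Y-%m") else "post_fork"


def monthly_counts(timestamps: Iterable[str], fork_date: date) -> Dict[Tuple[str, str], int]:
    """Count events (commits, issues, ...) per (phase, month)."""
    counts = Counter(to_month(ts) for ts in timestamps)
    return {(phase(m, fork_date), m): n for m, n in sorted(counts.items())}


PLACEHOLDER_FORK_DATE = date(2023, 9, 1)  # placeholder, not the date used in the thesis
```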

Indicators

The following indicators are calculated:

Participation

  • Active Contributors
  • New Contributors
  • Number of Commits

Coordination

  • Pull Requests Opened
  • Issues Opened
  • Pull Request Review Time

Innovation

  • Feature-Related Commits
  • Releases
  • Feature-Related Issues

Usage of the data extraction framework

The following commands run the data extraction for both repositories and load the data into a DuckDB database. To calculate the indicators, the SQL files in the folder sql_analysis must then be executed against this database, after the data extraction framework has finished. The project uses uv as a fast Python package and environment manager to install dependencies and run commands.
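
The SQL files can be run from any DuckDB client. One possible sketch uses the `duckdb` Python package. The database path is hypothetical, and executing the files in alphabetical order is an assumption, since the repository does not state an order.

```python
from pathlib import Path
from typing import List


def sql_files(folder: str = "sql_analysis") -> List[Path]:
    """Return the .sql files in the folder, sorted by name (assumed execution order)."""
    return sorted(Path(folder).glob("*.sql"))


def run_indicator_queries(db_path: str, folder: str = "sql_analysis") -> None:
    """Execute every SQL file against the DuckDB database produced by the extraction."""
    import duckdb  # third-party; imported lazily so sql_files() works without it

    con = duckdb.connect(db_path)
    try:
        for path in sql_files(folder):
            # A file may hold several statements; DuckDB's execute() runs them in sequence.
            con.execute(path.read_text(encoding="utf-8"))
    finally:
        con.close()


# Hypothetical usage; the real database file name may differ:
# run_indicator_queries("data/master_thesis.duckdb")
```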

Make sure to store a GitHub personal access token in the environment variable GITHUB_TOKEN before running the following commands. Otherwise, the data extraction will run into rate limit errors.
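
A script can fail early when the token is missing instead of running into rate limit errors halfway through an extraction. The sketch below reads `GITHUB_TOKEN` and builds the request headers. The function name `github_headers` is hypothetical, and the repository may handle authentication differently. The token can be set beforehand, for example with `export GITHUB_TOKEN=<your token>` in a POSIX shell.

```python
import os
from typing import Dict


def github_headers() -> Dict[str, str]:
    """Build GitHub REST API headers from GITHUB_TOKEN, failing early if it is missing."""
    token = os.environ.get("GITHUB_TOKEN", "").strip()
    if not token:
        # Unauthenticated requests are limited to 60 per hour, far too few for a full extraction.
        raise RuntimeError("GITHUB_TOKEN is not set; the extraction would hit the rate limit.")
    return {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    }
```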

Run pre-commit hooks

uv run pre-commit run --all-files

Run full data extraction

uv run python -m src.master_thesis.run_all

Run the extraction for a single artifact (issues in this example)

uv run python -m src.master_thesis.github.api.extract_issues
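
A detail worth keeping in mind for the issue indicators: GitHub's REST issues endpoint also returns pull requests, which can be recognized by their `pull_request` key. The sketch below separates the two. The helper name is hypothetical, and whether `extract_issues` already filters pull requests is not stated here.

```python
from typing import List, Tuple


def split_issues_and_pulls(items: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Separate plain issues from pull requests returned by the issues endpoint."""
    # Entries that are pull requests carry a "pull_request" key; plain issues do not.
    issues = [item for item in items if "pull_request" not in item]
    pulls = [item for item in items if "pull_request" in item]
    return issues, pulls
```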

Check the remaining rate limit of the GitHub API

uv run python -m src.master_thesis.check_rate_limit
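
GitHub exposes the current limits at `https://api.github.com/rate_limit`. The response contains `resources.core` with `limit`, `remaining` and `reset`, where `reset` is a Unix timestamp. The sketch below shows one way to query and summarize it. It is an assumption about what `check_rate_limit` does, not its actual code.

```python
import json
import urllib.request
from datetime import datetime, timezone


def fetch_rate_limit(token: str) -> dict:
    """Query GitHub's rate limit endpoint with the given token."""
    request = urllib.request.Request(
        "https://api.github.com/rate_limit",
        headers={"Accept": "application/vnd.github+json", "Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def core_rate_limit_summary(payload: dict) -> str:
    """Summarize the core REST API limit as a human-readable string."""
    core = payload["resources"]["core"]
    reset = datetime.fromtimestamp(core["reset"], tz=timezone.utc)
    return f"{core['remaining']}/{core['limit']} requests left, resets at {reset:%Y-%m-%d %H:%M:%S} UTC"
```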

Get the fork date of OpenTofu

uv run python -m src.master_thesis.get_fork_date

Get the repository creation dates

uv run python -m src.master_thesis.get_repo_dates
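
The creation date of a repository is part of the `GET /repos/{owner}/{repo}` response as the `created_at` field. A minimal sketch of reading it follows. The helper names are hypothetical. The repository names `hashicorp/terraform` and `opentofu/opentofu` in the usage comment are assumed to be the two repositories analyzed.

```python
import json
import urllib.request
from datetime import datetime


def fetch_repo(full_name: str, token: str) -> dict:
    """Fetch repository metadata, e.g. for 'hashicorp/terraform'."""
    request = urllib.request.Request(
        f"https://api.github.com/repos/{full_name}",
        headers={"Accept": "application/vnd.github+json", "Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def repo_created_at(repo_json: dict) -> datetime:
    """Parse the 'created_at' field (ISO 8601 with a trailing 'Z') into an aware datetime."""
    return datetime.fromisoformat(repo_json["created_at"].replace("Z", "+00:00"))


# Hypothetical usage:
# for name in ("hashicorp/terraform", "opentofu/opentofu"):
#     print(name, repo_created_at(fetch_repo(name, token)).date())
```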

Results

The folder data/indicators_results contains the results of the analysis in CSV format. These results are presented and discussed in the thesis.


Author

Jens Scholten
