Contributing to getML-IO

Thank you for considering contributing to getML-IO! This document outlines the process for contributing to this project and helps ensure a smooth collaboration experience.

Development Setup

If mise is installed and active in your shell, it sets up the environment automatically when you enter the project folder. If it does not, you can set it up manually.

Either load mise into your shell:

$ eval "$(mise activate)"

Or start a new shell with mise:

$ mise exec -- bash

Then install the dependencies including the development dependencies:

$ uv sync \
    --python 3.10 \
    --upgrade \
    --group development \
    --extra-index-url https://europe-west1-python.pkg.dev/getml-infra/wheel/simple

We are using git-lfs to manage large test data files for integration tests. mise will automatically install the git-lfs client for you.

After cloning the repository, you need to initialize git-lfs and pull the large files manually:

# Initializes git-lfs for your user account (only needed once per account)
$ git lfs install

# Downloads the large files for the current repository
$ git lfs pull

GitFlow Workflow

This project follows the GitFlow branching model. Here's how it works:

Main Branches

- main: Production-ready code that has been released
- develop: Integration branch for features, contains code for the next release

Supporting Branches

Feature branches
- Branch from: develop
- Merge back into: develop
- Naming: feature/GH-123-short-description (with GitHub issue number)

Release branches
- Branch from: develop
- Merge back into: develop and main
- Naming: release/vX.Y.Z (using semantic versioning)

Hotfix branches
- Branch from: main
- Merge back into: main and develop
- Naming: hotfix/vX.Y.Z (when preparing a specific version) or hotfix/GH-123-short-description (when addressing a specific issue)
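
The naming rules above can be checked mechanically. The following is a minimal sketch, not part of the project: `classify_branch` and `BRANCH_PATTERNS` are hypothetical names, and the patterns assume that the short description is lowercase kebab-case and that X, Y, and Z are plain integers.

```python
import re
from typing import Optional

# Assumption: descriptions are lowercase words joined by hyphens.
_DESCRIPTION = r"[a-z0-9]+(?:-[a-z0-9]+)*"
# Assumption: versions are three dot-separated integers prefixed with "v".
_VERSION = r"v\d+\.\d+\.\d+"

# Hypothetical patterns derived from the naming conventions listed above.
BRANCH_PATTERNS = {
    "feature": re.compile(rf"feature/GH-\d+-{_DESCRIPTION}"),
    "release": re.compile(rf"release/{_VERSION}"),
    "hotfix": re.compile(rf"hotfix/(?:{_VERSION}|GH-\d+-{_DESCRIPTION})"),
}


def classify_branch(name: str) -> Optional[str]:
    """Return the branch kind if the name follows a convention, else None."""
    for kind, pattern in BRANCH_PATTERNS.items():
        if pattern.fullmatch(name):
            return kind
    return None
```

For example, `classify_branch("hotfix/GH-57-fix-paths")` returns `"hotfix"`, while a name without an issue number such as `feature/short-description` returns `None`.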

Workflow Steps

Create a feature branch from develop:

git switch develop
git pull
git switch -c feature/GH-123-short-description

Work on your feature, committing changes with meaningful commit messages.
When ready, push your branch and create a pull request to develop.
After code review and approval, your feature will be merged into develop.
Releases are created from develop when enough features are ready:
```
git switch develop
git pull
git switch -c release/vX.Y.Z
```
After testing on the release branch, it gets merged into both main and develop.

Commit Message Guidelines

Follow these guidelines for commit messages:

- Start with the GitHub issue number in the format GH-{number}:
- Use the present tense ("Add feature" not "Added feature")
- Use the imperative mood ("Fix bug" not "Fixes bug")
- First line should be under 50 characters (including the issue reference)
- Consider using conventional commits format after the issue number: GH-123: type(scope): message

Examples

Good commit messages:

GH-42: feature(serializer): Add support for complex DataFrame types

Implements serialization for DataFrames with nested objects.

GH-57: bugfix(input-output): Correct file path handling on Windows systems

Replaces backslashes with forward slashes to ensure consistent
path handling across platforms.

GH-25: documentation: Update installation instructions

GH-33: testing: Add unit tests for Pipeline serialization
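
To make the structure of these subject lines explicit, here is a small sketch that splits one into its parts. It is illustrative only: `parse_subject`, `Subject`, and `SUBJECT_RE` are hypothetical names, and the regular expression assumes lowercase types and scopes as in the examples above. It does not enforce the length or mood guidelines.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: "GH-<number>: [type[(scope)]: ]message"
SUBJECT_RE = re.compile(
    r"GH-(?P<issue>\d+): "
    r"(?:(?P<type>[a-z]+)(?:\((?P<scope>[a-z0-9-]+)\))?: )?"
    r"(?P<message>.+)"
)


class Subject(NamedTuple):
    issue: int
    type: Optional[str]
    scope: Optional[str]
    message: str


def parse_subject(line: str) -> Optional[Subject]:
    """Split a commit subject line into its parts, or return None if it
    does not start with an issue reference."""
    match = SUBJECT_RE.fullmatch(line)
    if match is None:
        return None
    return Subject(
        issue=int(match["issue"]),
        type=match["type"],
        scope=match["scope"],
        message=match["message"],
    )
```

A check like this could run in a local commit hook, but whether the project uses one is not stated here.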

Pull Request Process

- Ensure your code follows the project's coding standards
- Update documentation as needed
- Include tests for new functionality
- The Pull Request should target the appropriate branch (usually develop)
- Request review from project maintainers
- Address review comments promptly

Code Style

Follow PEP 8 style guidelines
Use type hints for function arguments and return values
Documentation should use Google docstring format
Run linting tools before committing:
```
ruff check .
basedpyright
```
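
As an illustration of the style points above, the following sketch shows a function with type hints and a Google-style docstring. The function `normalize_path` is a hypothetical example and is not taken from the getML-IO code base.

```python
def normalize_path(path: str) -> str:
    """Convert Windows-style separators to forward slashes.

    Args:
        path: A file path that may contain backslashes.

    Returns:
        The same path with every backslash replaced by a forward slash.
    """
    return path.replace("\\", "/")
```

The `Args:` and `Returns:` sections are what the Google docstring format prescribes. The type hints in the signature are what a type checker such as basedpyright inspects.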

Testing

Write unit tests for all new functionality
Ensure all tests pass before submitting a pull request:
```
pytest
```
To run the full test suite against all supported Python versions, use the following command:
```
mise run test-all
```
To run tests against a single, specific Python version:
```
mise run test 3.12
```
Aim for high test coverage for new code
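
A unit test in the sense described above can be as small as the following sketch. Both the helper `parse_release_branch` and the test names are hypothetical. In a real contribution the helper would live in the package and only the `test_` functions would go into the test suite, where pytest discovers them by their prefix.

```python
import re
from typing import Optional, Tuple


def parse_release_branch(name: str) -> Optional[Tuple[int, int, int]]:
    """Hypothetical helper: extract (X, Y, Z) from 'release/vX.Y.Z'."""
    match = re.fullmatch(r"release/v(\d+)\.(\d+)\.(\d+)", name)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def test_parse_release_branch_returns_version_tuple() -> None:
    assert parse_release_branch("release/v1.2.3") == (1, 2, 3)


def test_parse_release_branch_rejects_other_branches() -> None:
    assert parse_release_branch("feature/GH-123-short-description") is None
```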

Integration Test Caching

To speed up execution, our integration tests use a caching mechanism. Fitted getML pipelines and dataframes are saved as bundles in a local cache directory on your machine. This significantly reduces the time it takes to run the test suite on subsequent runs.

The cache is generally reliable, but it can sometimes become stale or corrupted, leading to unexpected test failures. If you encounter persistent or strange errors when running the integration tests, clearing this cache is a good first troubleshooting step.

The cache is located in the standard user cache directory for your operating system (e.g., ~/.cache/getml-io-test on Linux). You can clear it by deleting this directory.
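
Deleting the directory can also be scripted. The sketch below is an assumption-laden example, not a project tool: `clear_cache` and `DEFAULT_CACHE_DIR` are hypothetical names, and the default path is only the Linux location quoted above. On other operating systems the cache directory will differ, so pass the correct path explicitly.

```python
import shutil
from pathlib import Path

# Assumption: the Linux location mentioned above; adjust for other systems.
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "getml-io-test"


def clear_cache(cache_dir: Path = DEFAULT_CACHE_DIR) -> bool:
    """Delete the cache directory and its contents.

    Returns True if a directory was removed and False if none existed.
    """
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)
    return True
```

The next integration test run then rebuilds the cached bundles from scratch, so expect that run to take longer than usual.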

Issue Reporting

When reporting issues, please include:

- A clear and descriptive title
- Steps to reproduce the issue
- Expected and actual behavior
- Your environment (OS, Python version, etc.)
- Any relevant logs or screenshots

License

By contributing to this project, you agree that your contributions will be licensed under the project's license.
