Thank you for considering contributing to getML-IO! This document outlines the process for contributing to this project and helps ensure a smooth collaboration experience.
If you installed mise correctly, it should automatically set up the environment when you enter the project folder. If it does not, you can set it up manually.
Either load mise into your shell:
$ eval "$(mise activate)"
Or start a new shell with mise:
$ mise exec -- bash
Then install the dependencies, including the development dependencies:
$ uv sync \
--python 3.10 \
--upgrade \
--group development \
--extra-index-url https://europe-west1-python.pkg.dev/getml-infra/wheel/simple
We are using git-lfs to manage large test data files for integration tests. mise will automatically install the git-lfs client for you.
You need to initialize git-lfs and pull files manually after cloning the repository.
# Initializes git-lfs for your user account (once per machine)
$ git lfs install
# Downloads the large files for the current repository
$ git lfs pull
This project follows the GitFlow branching model. Here's how it works:
- main: Production-ready code that has been released
- develop: Integration branch for features; contains code for the next release
- Feature branches
  - Branch from: develop
  - Merge back into: develop
  - Naming: feature/GH-123-short-description (with GitHub issue number)
- Release branches
  - Branch from: develop
  - Merge back into: develop and main
  - Naming: release/vX.Y.Z (using semantic versioning)
- Hotfix branches
  - Branch from: main
  - Merge back into: main and develop
  - Naming: hotfix/vX.Y.Z (when preparing a specific version) or hotfix/GH-123-short-description (when addressing a specific issue)
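The branch naming conventions can be checked locally before pushing. The following is a minimal sketch and not part of the project's tooling: the function name `branch_kind` and the exact patterns are assumptions. In particular, it assumes that descriptions are lowercase words joined by hyphens and that versions are three dot-separated integers, so adjust the patterns to whatever the maintainers actually accept.

```python
import re
from typing import Optional

# Hypothetical patterns derived from the naming examples in this guide.
# Assumption: descriptions are lowercase alphanumerics separated by hyphens,
# and versions consist of three dot-separated integers (vX.Y.Z).
_DESCRIPTION = r"GH-\d+-[a-z0-9]+(?:-[a-z0-9]+)*"
_VERSION = r"v\d+\.\d+\.\d+"

BRANCH_PATTERNS = {
    "feature": re.compile(rf"feature/{_DESCRIPTION}"),
    "release": re.compile(rf"release/{_VERSION}"),
    "hotfix": re.compile(rf"hotfix/(?:{_VERSION}|{_DESCRIPTION})"),
}


def branch_kind(name: str) -> Optional[str]:
    """Return the branch type that ``name`` matches, or None if none matches."""
    for kind, pattern in BRANCH_PATTERNS.items():
        # fullmatch rejects names with extra leading or trailing text.
        if pattern.fullmatch(name):
            return kind
    return None
```

For example, `branch_kind("feature/GH-123-short-description")` returns `"feature"`, while a name such as `my-branch` yields `None`.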
- Create a feature branch from develop:
  $ git switch develop
  $ git pull
  $ git switch -c feature/GH-123-short-description
- Work on your feature, committing changes with meaningful commit messages.
- When ready, push your branch and create a pull request to develop.
- After code review and approval, your feature will be merged into develop.
- Releases are created from develop when enough features are ready:
  $ git switch develop
  $ git pull
  $ git switch -c release/vX.Y.Z
- After testing on the release branch, it gets merged into both main and develop.
Follow these guidelines for commit messages:
- Start with the GitHub issue number in the format GH-{number}:
- Use the present tense ("Add feature" not "Added feature")
- Use the imperative mood ("Fix bug" not "Fixes bug")
- First line should be under 50 characters (including the issue reference)
- Consider using conventional commits format after the issue number: GH-123: type(scope): message
Good commit messages:
GH-42: feature(serializer): Add support for complex DataFrame types

Implements serialization for DataFrames with nested objects.

GH-57: bugfix(input-output): Correct file path handling on Windows systems

Replaces backslashes with forward slashes to ensure consistent
path handling across platforms.

GH-25: documentation: Update installation instructions

GH-33: testing: Add unit tests for Pipeline serialization
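The two mechanical commit message rules (the issue prefix and the length of the first line) can be checked automatically, for instance from a local commit-msg hook. The sketch below is illustrative only and not something the project ships: `check_commit_message` is a hypothetical helper, and it reads "under 50 characters" as strictly fewer than 50. Tense and mood cannot be verified this way and still need a human eye.

```python
import re
from typing import List

# Assumption: the first line must begin with "GH-<number>: " followed by text.
ISSUE_PREFIX = re.compile(r"GH-\d+: \S")
# "Under 50 characters", counted including the issue reference.
MAX_SUBJECT_LENGTH = 50


def check_commit_message(message: str) -> List[str]:
    """Return a list of guideline violations; an empty list means none found."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    problems = []
    if not ISSUE_PREFIX.match(subject):
        problems.append("first line does not start with 'GH-<number>: '")
    if len(subject) >= MAX_SUBJECT_LENGTH:
        problems.append(
            f"first line has {len(subject)} characters, "
            f"expected fewer than {MAX_SUBJECT_LENGTH}"
        )
    return problems
```

Returning a list of problems instead of a boolean lets a hook print every violation at once, so a contributor can fix the message in a single pass.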
- Ensure your code follows the project's coding standards
- Update documentation as needed
- Include tests for new functionality
- The Pull Request should target the appropriate branch (usually develop)
- Request review from project maintainers
- Address review comments promptly
- Follow PEP 8 style guidelines
- Use type hints for function arguments and return values
- Documentation should use Google docstring format
- Run linting tools before committing:
  $ ruff check .
  $ basedpyright
- Write unit tests for all new functionality
- Ensure all tests pass before submitting a pull request:
  $ pytest
- To run the full test suite against all supported Python versions, use the following command:
  $ mise run test-all
- To run tests against a single, specific Python version:
  $ mise run test 3.12
- Aim for high test coverage for new code
To speed up execution, our integration tests use a caching mechanism. Fitted getML pipelines and dataframes are saved as bundles in a local cache directory on your machine. This significantly reduces the time it takes to run the test suite on subsequent runs.
The cache is generally reliable, but it can sometimes become stale or corrupted, leading to unexpected test failures. If you encounter persistent or strange errors when running the integration tests, clearing this cache is a good first troubleshooting step.
The cache is located in the standard user cache directory for your operating system (e.g., ~/.cache/getml-io-test on Linux). You can clear it by deleting this directory.
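Clearing the cache can also be scripted. This is a minimal sketch under stated assumptions: `clear_test_cache` is a hypothetical helper, not a project utility, and its default path is only the Linux location mentioned above. On other operating systems, pass the actual cache directory explicitly.

```python
import shutil
from pathlib import Path

# Linux location named in this guide; other systems use a different
# directory, so pass the correct path explicitly there.
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "getml-io-test"


def clear_test_cache(cache_dir: Path = DEFAULT_CACHE_DIR) -> bool:
    """Delete the cache directory; return True if a directory was removed."""
    if not cache_dir.is_dir():
        # Nothing to clear, for example on a fresh checkout.
        return False
    shutil.rmtree(cache_dir)
    return True
```

After the directory is removed, the next integration test run should repopulate it, so expect that run to take longer than usual.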
When reporting issues, please include:
- A clear and descriptive title
- Steps to reproduce the issue
- Expected and actual behavior
- Your environment (OS, Python version, etc.)
- Any relevant logs or screenshots
By contributing to this project, you agree that your contributions will be licensed under the project's license.