Contributing to getML-IO

Thank you for considering contributing to getML-IO! This document outlines the process for contributing to this project and helps ensure a smooth collaboration experience.

Development Setup

If mise is installed and activated in your shell, it sets up the environment automatically when you enter the project folder. If it does not, set the environment up manually.

Either load mise into your shell:

$ eval "$(mise activate)"

Or start a new shell with mise:

$ mise exec -- bash

Then install the dependencies including the development dependencies:

$ uv sync \
    --python 3.10 \
    --upgrade \
    --group development \
    --extra-index-url https://europe-west1-python.pkg.dev/getml-infra/wheel/simple

We are using git-lfs to manage large test data files for integration tests. mise will automatically install the git-lfs client for you.

You need to initialize git-lfs and pull files manually after cloning the repository.

# Initializes git-lfs for your user account (run once per user account)
$ git lfs install

# Downloads the large files for the current repository
$ git lfs pull
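Until `git lfs pull` has run, files tracked by git-lfs exist in the working tree only as small text pointers. The following is a minimal sketch for spotting such a pointer; it relies on the Git LFS pointer format, whose first line is `version https://git-lfs.github.com/spec/v1`. The function name `is_lfs_pointer` is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of this repository): succeeds if the given
# file is still a git-lfs pointer, i.e. its real content was not pulled yet.
is_lfs_pointer() {
  # Every git-lfs pointer file starts with this exact version line.
  head -n 1 "$1" 2>/dev/null |
    grep -qxF 'version https://git-lfs.github.com/spec/v1'
}
```

For example, `is_lfs_pointer path/to/file && echo "run: git lfs pull"`, where `path/to/file` is a placeholder for one of the test data files.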

GitFlow Workflow

This project follows the GitFlow branching model. Here's how it works:

Main Branches

  • main: Production-ready code that has been released
  • develop: Integration branch for features, contains code for the next release

Supporting Branches

  • Feature branches

    • Branch from: develop
    • Merge back into: develop
    • Naming: feature/GH-123-short-description (with GitHub issue number)
  • Release branches

    • Branch from: develop
    • Merge back into: develop and main
    • Naming: release/vX.Y.Z (using semantic versioning)
  • Hotfix branches

    • Branch from: main
    • Merge back into: main and develop
    • Naming: hotfix/vX.Y.Z (when preparing a specific version) or hotfix/GH-123-short-description (when addressing a specific issue)
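The naming and branching rules above can be condensed into a small check. This is only a sketch under assumptions: the function name `base_branch_for` is hypothetical, versions are limited to plain `vX.Y.Z` without pre-release suffixes, and the description part of a branch name is not validated beyond being non-empty.

```shell
# Hypothetical helper (not part of this repository): prints the branch a
# supporting branch should be created from, based on its name.
base_branch_for() {
  local feature='^feature/GH-[0-9]+-.+$'
  local release='^release/v[0-9]+\.[0-9]+\.[0-9]+$'
  local hotfix='^hotfix/(v[0-9]+\.[0-9]+\.[0-9]+|GH-[0-9]+-.+)$'

  if [[ $1 =~ $feature || $1 =~ $release ]]; then
    echo develop   # feature and release branches start from develop
  elif [[ $1 =~ $hotfix ]]; then
    echo main      # hotfix branches start from main
  else
    echo "unrecognized branch name: $1" >&2
    return 1
  fi
}
```

A call such as `base_branch_for hotfix/GH-123-short-description` prints the base branch, and a name that matches none of the patterns returns a non-zero status.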

Workflow Steps

  1. Create a feature branch from develop:

    git switch develop
    git pull
    git switch -c feature/GH-123-short-description

  2. Work on your feature, committing changes with meaningful commit messages.

  3. When ready, push your branch and create a pull request to develop.

  4. After code review and approval, your feature will be merged into develop.

  5. Releases are created from develop when enough features are ready:

    git switch develop
    git pull
    git switch -c release/vX.Y.Z

  6. After testing on the release branch, it gets merged into both main and develop.
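The final step is described only in prose, so here is a minimal sketch of what it could look like when a maintainer performs the merges locally. The function name `finish_release`, the `--no-ff` merges, and the annotated tag are assumptions borrowed from common GitFlow practice, not documented project policy; the project may well handle this step through pull requests instead.

```shell
# Hypothetical helper (not part of this repository): merges a tested release
# branch into main and develop. Assumes both branches are up to date locally.
finish_release() {
  local version=$1                 # for example v1.2.0
  local branch="release/$version"

  # Merge the release into main and tag the released commit (assumption).
  git switch main || return 1
  git merge --no-ff --no-edit "$branch" || return 1
  git tag -a "$version" -m "Release $version" || return 1

  # Bring the release changes back into develop.
  git switch develop || return 1
  git merge --no-ff --no-edit "$branch"
}
```

After `finish_release vX.Y.Z` succeeds, main, develop, and the new tag still have to be pushed to the remote.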

Commit Message Guidelines

Follow these guidelines for commit messages:

  • Start with the GitHub issue number in the format GH-{number}:
  • Use the present tense ("Add feature" not "Added feature")
  • Use the imperative mood ("Fix bug" not "Fixes bug")
  • First line should be under 50 characters (including the issue reference)
  • Consider using conventional commits format after the issue number: GH-123: type(scope): message

Examples

Good commit messages:

GH-42: feature(serializer): Add complex types

Implements serialization for DataFrames with nested objects.

GH-57: bugfix(input-output): Fix Windows paths

Replaces backslashes with forward slashes to ensure consistent
path handling across platforms.

GH-25: documentation: Update install steps

GH-33: testing: Add Pipeline serialization tests
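The two mechanical rules, the `GH-{number}:` prefix and a first line under 50 characters, can be checked automatically. The following is a minimal sketch; the function name `check_commit_subject` is hypothetical, and tense, mood, and the optional `type(scope):` part are deliberately not checked.

```shell
# Hypothetical helper (not part of this repository): succeeds if a commit
# subject starts with "GH-<number>: " and is shorter than 50 characters.
check_commit_subject() {
  local subject=$1
  local prefix='^GH-[0-9]+: .+'

  if [[ ! $subject =~ $prefix ]]; then
    echo "subject must start with 'GH-<number>: '" >&2
    return 1
  fi
  # The limit applies to the whole first line, issue reference included.
  if (( ${#subject} >= 50 )); then
    echo "subject has ${#subject} characters, expected fewer than 50" >&2
    return 1
  fi
}
```

One way to use such a check is from a `commit-msg` hook: git passes the path of the message file as the first argument, so the subject can be read with `head -n 1 "$1"`.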

Pull Request Process

  1. Ensure your code follows the project's coding standards
  2. Update documentation as needed
  3. Include tests for new functionality
  4. Target the appropriate branch with your pull request (usually develop; main for hotfixes)
  5. Request review from project maintainers
  6. Address review comments promptly

Code Style

  • Follow PEP 8 style guidelines
  • Use type hints for function arguments and return values
  • Write docstrings in Google docstring format
  • Run the linter and type checker before committing:
    ruff check .
    basedpyright

Testing

  • Write unit tests for all new functionality

  • Ensure all tests pass before submitting a pull request:

    pytest

  • To run the full test suite against all supported Python versions, use the following command:

    mise run test-all

  • To run tests against a single, specific Python version:

    mise run test 3.12

  • Aim for high test coverage for new code

Integration Test Caching

To speed up execution, the integration tests cache fitted getML pipelines and dataframes as bundles in a local directory on your machine. Subsequent runs reuse these bundles instead of rebuilding them, which significantly shortens the test suite's run time.

The cache is generally reliable, but it can sometimes become stale or corrupted, leading to unexpected test failures. If you encounter persistent or strange errors when running the integration tests, clearing this cache is a good first troubleshooting step.

The cache is located in the standard user cache directory for your operating system (e.g., ~/.cache/getml-io-test on Linux). You can clear it by deleting this directory.
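On Linux, clearing the cache could look like the sketch below. It assumes the cache follows the XDG convention, that is, `$XDG_CACHE_HOME/getml-io-test` with `~/.cache/getml-io-test` as the fallback; only the fallback path is stated above, and the function name `clear_test_cache` is hypothetical. Paths on other operating systems are not covered.

```shell
# Hypothetical helper (not part of this repository): deletes the integration
# test cache on Linux. Honoring XDG_CACHE_HOME is an assumption.
clear_test_cache() {
  local cache_dir="${XDG_CACHE_HOME:-$HOME/.cache}/getml-io-test"

  if [ -d "$cache_dir" ]; then
    rm -rf -- "$cache_dir"
    echo "removed $cache_dir"
  else
    echo "no cache found at $cache_dir"
  fi
}
```

The next integration test run then rebuilds the bundles from scratch, so expect it to take longer than usual.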

Issue Reporting

When reporting issues, please include:

  • A clear and descriptive title
  • Steps to reproduce the issue
  • Expected and actual behavior
  • Your environment (OS, Python version, etc.)
  • Any relevant logs or screenshots

License

By contributing to this project, you agree that your contributions will be licensed under the project's license.