Thank you for your interest in contributing to vowl! We welcome contributions from the community and are grateful for any help you can provide.
- Branching & Workflow Strategy
- Code of Conduct
- Getting Started
- Development Setup
- How to Contribute
- Coding Standards
- Testing
- Release Workflow
- Submitting Changes
- Reporting Issues
To maintain quality and consistency, this repository follows an issue-first, fork-and-PR workflow.
- Open an issue first: describe the bug, feature, or improvement before writing code. This allows maintainers to triage, discuss scope, and avoid duplicate work.
- Fork & branch: fork the repository, then create a feature branch (e.g. feature/your-feature-name or fix/issue-description) from main.
- Submit a Pull Request: open a PR against main that references the issue (e.g. Closes #42). CI must pass and at least one maintainer must approve before merge.
- Keep your fork in sync: pull from upstream main regularly to stay current with the latest changes and security patches.
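As a rough illustration of the naming and linking conventions above, the following sketch checks a branch name and extracts issue references from a PR description. The helper names are hypothetical and not part of vowl; the branch prefixes come from the examples above, and the keyword list follows GitHub's standard closing keywords (close, fix, resolve and their variants).

```python
import re
from typing import List

# Hypothetical helpers for illustration only; vowl does not ship these.
# Prefixes mirror the examples in this guide: feature/... and fix/...
BRANCH_PATTERN = re.compile(r"^(feature|fix)/[a-z0-9][a-z0-9._-]*$")

# GitHub closing keywords followed by an issue number, e.g. "Closes #42".
ISSUE_REF_PATTERN = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)",
    re.IGNORECASE,
)


def is_valid_branch_name(name: str) -> bool:
    """Return True if the branch name uses one of the suggested prefixes."""
    return BRANCH_PATTERN.match(name) is not None


def referenced_issues(pr_body: str) -> List[int]:
    """Return the issue numbers a PR description would close."""
    return [int(number) for number in ISSUE_REF_PATTERN.findall(pr_body)]


if __name__ == "__main__":
    print(is_valid_branch_name("feature/your-feature-name"))
    print(referenced_issues("Adds a Polars executor. Closes #42"))
```

A check like this could run locally before pushing; maintainers may of course accept other branch names at their discretion.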
By participating in this project, you agree to maintain a respectful and inclusive environment. Please:
- Be respectful and considerate in all interactions
- Welcome newcomers and help them get started
- Focus on constructive feedback
- Accept responsibility for your mistakes and learn from them
Before contributing, please:
- Fork the repository and clone your fork
- Set up the development environment using the instructions in this document
- Create a new branch for your changes
The Makefile is the canonical source for local development commands. If a README example and a Make target ever diverge, follow the Make target.
- Install uv
- Install Python 3.10 or newer
- Fork and clone this repository
git clone https://github.com/<your-username>/vowl.git
cd vowl
git remote add upstream https://github.com/govtech-data-practice/vowl.git
For standard contributor setup:
make install-dev
This uses the Makefile target that runs:
uv sync --group dev
If you need all optional dependencies as well:
make install-all
Run tests:
make test
Format code:
make format
Run lint checks:
make lint
Run type checking:
make typecheck
Run all code quality checks (format + lint + typecheck):
make check
Run security scan:
make security-scan
Run dependency vulnerability audit:
make security-audit
Run all checks and tests:
make verify
Clean build artifacts:
make clean
We welcome several types of contributions:
- Bug fixes: Found a bug? Submit a fix!
- New features: Have an idea? Implement it!
- Documentation: Improve docs, add examples, fix typos
- New executors: Add support for new DataFrame types
- New integrations: Add platform-specific utilities
- Test improvements: Add test coverage or improve existing tests
- Follow PEP 8 style guidelines
- Use meaningful variable and function names
- Add docstrings to all public functions and classes
- Keep functions focused and single-purpose
Use Google-style docstrings:
from typing import Optional

def validate_data(df, contract_path: str, table_name: Optional[str] = None):
    """Validates a DataFrame against a data contract.

    Args:
        df: The DataFrame to validate (pandas or Spark).
        contract_path: Path to the YAML contract file.
        table_name: Optional override for the table name in SQL queries.

    Returns:
        ValidationResult: An object containing validation results and methods.

    Raises:
        ValueError: If the DataFrame type is not supported.
    """
Use type hints for function signatures:
from typing import Optional, List, Dict

def process_rules(rules: List[Dict], table_name: str) -> Optional[str]:
    ...
Write clear, descriptive commit messages:
- Use the imperative mood ("Add feature" not "Added feature")
- Keep the first line under 72 characters
- Reference issues when applicable
Examples:
Add Polars DataFrame executor support
Fix null handling in resale_price validation
Update README with new API examples (#42)
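The commit message rules above can be checked mechanically. The following is a minimal sketch, assuming a small hand-picked set of past-tense openers; the function name and word list are illustrative and not part of the project's tooling.

```python
from typing import List

# "Keep the first line under 72 characters" means at most 71.
MAX_SUBJECT_LENGTH = 72

# A small, non-exhaustive set of past-tense openers; purely illustrative.
PAST_TENSE_OPENERS = {"added", "fixed", "updated", "removed", "changed"}


def check_commit_subject(message: str) -> List[str]:
    """Return a list of problems found in the first line of a commit message."""
    lines = message.splitlines()
    subject = lines[0].strip() if lines else ""
    if not subject:
        return ["subject line is empty"]

    problems = []
    if len(subject) >= MAX_SUBJECT_LENGTH:
        problems.append(
            f"subject is {len(subject)} characters; keep it under {MAX_SUBJECT_LENGTH}"
        )
    first_word = subject.split()[0].lower()
    if first_word in PAST_TENSE_OPENERS:
        problems.append(f"use the imperative mood instead of '{first_word}'")
    return problems


if __name__ == "__main__":
    print(check_commit_subject("Add Polars DataFrame executor support"))
    print(check_commit_subject("Added feature"))
```

An empty list means the subject line passed both checks. A word list cannot detect every non-imperative phrasing, so treat the result as a hint rather than a verdict.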
Before submitting changes, ensure the automated test suite passes:
make test
The underlying command is:
uv run pytest tests/
The GitHub Actions CI workflow uses the lean-ci-test dependency group instead of the full dev environment:
uv sync --group lean-ci-test
This is intentional. CI is meant to catch core regressions quickly, but it does not represent the full backend matrix. In particular:
- Optional Ibis backends such as MySQL, MSSQL, and Oracle are not installed in the default CI job
- Backend integration tests that require missing connectors are skipped rather than provisioned in CI
- Some integrations also need host-level tools or drivers beyond Python packages (for example Docker, database client libraries, or ODBC drivers)
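To illustrate how a test can be skipped rather than fail when a connector is missing, here is a minimal sketch using only the standard library. It is not necessarily how vowl's own test suite implements skipping, and the driver module name `pymysql` is an assumption chosen for illustration; the real connector names depend on the backend.

```python
import importlib.util
import unittest
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed driver module, for illustration only.
MYSQL_DRIVERS = ["pymysql"]
_MISSING = missing_modules(MYSQL_DRIVERS)


@unittest.skipIf(bool(_MISSING), f"missing optional connectors: {_MISSING}")
class MySQLBackendTest(unittest.TestCase):
    def test_driver_is_importable(self):
        # A real test would open a connection and run a query against the backend.
        self.assertEqual(missing_modules(MYSQL_DRIVERS), [])


if __name__ == "__main__":
    unittest.main()
```

With pytest, `pytest.importorskip("pymysql")` at the top of a test module gives a similar effect in one line.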
If your change affects backend-specific behavior, connector-specific SQL generation, or cross-database execution paths, validate it locally with the fuller dependency set:
make install-dev
make test
Use make install-all if your change also depends on optional extras outside the default development setup.
You can still run targeted scripts or tests manually when needed:
# Run the basic usage example
python examples/basic_usage.py
# Run a specific test file
uv run pytest tests/test_usage_patterns.py
When adding new features:
- Add test cases that cover the new functionality
- Include edge cases (null values, empty DataFrames, etc.)
- Test both pandas and Spark implementations where applicable
- Use the existing tests/hdb_resale/HDBResaleWithErrors.csv for testing when possible
- For new test data, keep files small and representative
- Document any new test data files
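The guidelines above on edge cases and small test data might look like the following in practice. This is only a sketch: `count_missing` is a stand-in check rather than vowl's API, and the inline CSV rows are invented sample values, not taken from the HDB dataset.

```python
import csv
import io
import unittest
from typing import Dict, List

# Tiny invented dataset: one complete row and one row with a missing price.
SAMPLE_CSV = "town,resale_price\ntown_a,250000\ntown_b,\n"


def load_rows(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def count_missing(rows: List[Dict[str, str]], column: str) -> int:
    """Stand-in check, not vowl's API: count rows where the column is empty."""
    return sum(1 for row in rows if not (row.get(column) or "").strip())


class EdgeCaseTests(unittest.TestCase):
    def test_null_value_is_counted(self):
        rows = load_rows(SAMPLE_CSV)
        self.assertEqual(count_missing(rows, "resale_price"), 1)

    def test_empty_input_has_no_missing_values(self):
        # Header only: the equivalent of an empty DataFrame.
        rows = load_rows("town,resale_price\n")
        self.assertEqual(count_missing(rows, "resale_price"), 0)


if __name__ == "__main__":
    unittest.main()
```

Keeping the data inline makes the edge case visible next to the assertion; larger fixtures belong in a documented file under tests/.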
This section is intended for maintainers publishing vowl to PyPI.
make release-check
This target installs packaging tools, builds the distribution, and runs Twine validation.
The GitHub Actions workflow publishes package artifacts in two cases:
- A push to main after a pull request is merged. Because the commit is untagged, setuptools-scm produces a snapshot version such as 1.2.4.dev3+gabcdef.
- A tag such as v1.2.3 whose commit is reachable from main. In that case the published package version is the clean release version 1.2.3.
Publishing uses a GitHub Actions trusted publisher workflow; no manual API tokens are required.
Package versions are derived from Git tags via setuptools-scm. For a clean release version:
- Update CHANGELOG.md: rename the [Unreleased] section to [X.Y.Z] - YYYY-MM-DD, add a fresh empty [Unreleased] section above it, and update the comparison links at the bottom of the file.
- Commit the changelog:
git add CHANGELOG.md && git commit -m "Release X.Y.Z"
- Tag and push:
make release-tag VERSION=1.2.3
git push origin v1.2.3
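The changelog step can be sketched in a few lines. This example assumes section headings are written as `## [Unreleased]` (the Keep a Changelog convention), which may not match the actual file, and it deliberately leaves the comparison links at the bottom for manual editing. The function name is hypothetical.

```python
import datetime
import re

# Assumes second-level markdown headings such as "## [Unreleased]".
UNRELEASED_HEADING = re.compile(r"^## \[Unreleased\][ \t]*$", re.MULTILINE)


def cut_release(changelog: str, version: str, today: datetime.date) -> str:
    """Rename the [Unreleased] heading and add a fresh empty one above it."""
    if UNRELEASED_HEADING.search(changelog) is None:
        raise ValueError("no '## [Unreleased]' heading found")
    replacement = f"## [Unreleased]\n\n## [{version}] - {today.isoformat()}"
    # A callable replacement avoids any backslash or group interpretation.
    return UNRELEASED_HEADING.sub(lambda _match: replacement, changelog, count=1)


if __name__ == "__main__":
    sample = "# Changelog\n\n## [Unreleased]\n\n- Add thing\n"
    print(cut_release(sample, "1.2.3", datetime.date.today()))
```

Review the resulting diff before committing, since the script does not validate the entries themselves.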
Consumers should install clean releases by pinning an exact version such as vowl==1.2.3. Snapshot builds from main remain available for internal validation, but they should be treated as pre-release artifacts rather than the default install target.
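To make the distinction between clean releases and snapshot builds concrete, the sketch below classifies version strings of the two shapes shown above. The helper names are hypothetical, and the pattern only covers plain X.Y.Z versions; anything else (including .dev snapshots) is treated as a pre-release.

```python
import re

# Matches clean release versions such as "1.2.3" and nothing else.
CLEAN_RELEASE = re.compile(r"\d+\.\d+\.\d+")


def version_from_tag(tag: str) -> str:
    """Strip the leading 'v' from a tag such as 'v1.2.3'."""
    return tag[1:] if tag.startswith("v") else tag


def is_clean_release(version: str) -> bool:
    """Return True for X.Y.Z, False for snapshots like 1.2.4.dev3+gabcdef."""
    return CLEAN_RELEASE.fullmatch(version) is not None


def pin_for(version: str) -> str:
    """Build an exact pin, refusing snapshot versions."""
    if not is_clean_release(version):
        raise ValueError(f"{version!r} is not a clean release version")
    return f"vowl=={version}"


if __name__ == "__main__":
    print(pin_for(version_from_tag("v1.2.3")))
    print(is_clean_release("1.2.4.dev3+gabcdef"))
```

For full PEP 440 handling, the third-party `packaging` library offers a proper version parser; the regular expression here is a simplification.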
- Sync your fork with the latest upstream changes:
git fetch upstream
git rebase upstream/main
- Create a feature branch:
git checkout -b feature/your-feature-name
- Make your changes and commit them with clear messages
- Push your branch to your fork:
git push origin feature/your-feature-name
- Open a Pull Request against the main branch, referencing the related issue (e.g. Closes #42)
- Title: Use a clear, descriptive title
- Description: Explain what changes you made and why, and link the related issue
- Testing: Describe how you tested your changes
- Screenshots: Include output examples if applicable
- Breaking Changes: Clearly note any breaking changes
Before submitting, ensure:
- Code follows the project's style guidelines
- Documentation is updated (if applicable)
- All CI checks pass
- Commit messages are clear and descriptive
- PR description links to the related issue
When reporting bugs, please include:
- Environment details: Python version, OS, package versions
- Steps to reproduce: Minimal code example to reproduce the issue
- Expected behavior: What you expected to happen
- Actual behavior: What actually happened
- Error messages: Full traceback if applicable
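The environment details requested above can be gathered with a short script. This is a sketch using only the standard library; the package names passed in the example (`vowl`, `pandas`, `pyspark`) are assumptions about what is relevant to a given report and should be adjusted to the backends in use.

```python
import platform
import sys
from importlib import metadata
from typing import Dict, Iterable


def environment_report(packages: Iterable[str]) -> Dict[str, str]:
    """Collect Python version, OS, and installed versions of the given packages."""
    report = {
        "python": sys.version.split()[0],
        "os": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Package names are illustrative; list whichever ones your issue involves.
    for key, value in environment_report(["vowl", "pandas", "pyspark"]).items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue covers the "Environment details" item in one step.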
For feature requests, please describe:
- Use case: Why do you need this feature?
- Proposed solution: How do you envision it working?
- Alternatives: Any workarounds you've considered?
If you have questions about contributing:
- Check existing issues and discussions
- Open a new issue with the "question" label
- Reach out to the maintainers
Thank you for contributing to vowl! Your efforts help make data quality validation better for everyone.