Commit 4e56ae0

Merge remote-tracking branch 'origin/main' into python3.14

2 parents 5345e53 + b951973


62 files changed: +567 / -498 lines

.github/workflows/main.yml (5 additions, 1 deletion)

@@ -12,6 +12,7 @@ on:
   pull_request:
     branches:
       - '*'
+  merge_group:
 
 jobs:

@@ -27,8 +28,11 @@ jobs:
           enable-cache: true
       - name: Install just
         uses: extractions/setup-just@v3
+      - name: Install graphviz
+        run: |
+          sudo apt-get update
+          sudo apt-get install graphviz graphviz-dev
       - run: just typing
-      - run: just typing-nb
 
   run-tests:

.pre-commit-config.yaml (3 additions, 3 deletions)

@@ -25,12 +25,12 @@ repos:
       - id: python-no-log-warn
       - id: text-unicode-replacement-char
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.14.9
+    rev: v0.14.10
     hooks:
       - id: ruff-format
       - id: ruff-check
   - repo: https://github.com/astral-sh/uv-pre-commit
-    rev: 0.9.17
+    rev: 0.9.18
    hooks:
       - id: uv-lock
   - repo: https://github.com/executablebooks/mdformat

@@ -59,7 +59,7 @@ repos:
       - id: nbstripout
         exclude: (docs)
   - repo: https://github.com/crate-ci/typos
-    rev: v1
+    rev: typos-dict-v0.13.13
     hooks:
       - id: typos
         exclude: (\.ipynb)

CHANGELOG.md (7 additions, 1 deletion)

@@ -7,10 +7,16 @@ releases are available on [PyPI](https://pypi.org/project/pytask) and
 
 ## Unreleased
 
+- {pull}`739` closes file descriptors for the capture manager between CLI runs and
+  disposes stale database engines to prevent hitting OS file descriptor limits in
+  large test runs.
 - {pull}`725` fixes the pickle node hash test by accounting for Python 3.14's
   default pickle protocol.
-- {pull}`???` adapts the interactive debugger integration to Python 3.14's
+- {pull}`726` adapts the interactive debugger integration to Python 3.14's
   updated `pdb` behaviour and keeps pytest-style capturing intact.
+- {pull}`734` migrates from mypy to ty for type checking.
+- {pull}`736` updates the comparison to other tools documentation and adds a section on
+  the Common Workflow Language (CWL) and WorkflowHub.
 
 ## 0.5.7 - 2025-11-22
 
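The changelog entry for PR 739 describes a common resource-management problem: handles cached between CLI runs in the same process (capture streams, database engines) hold file descriptors, and without explicit disposal, repeated runs eventually hit the OS limit. A minimal sketch of the disposal idea, using the stdlib's `sqlite3` and a hypothetical `EngineCache` class (not pytask's actual implementation):

```python
import sqlite3
import tempfile
from pathlib import Path


class EngineCache:
    """Hypothetical stand-in for a database engine cached between CLI runs."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._conn: sqlite3.Connection | None = None

    def connect(self) -> sqlite3.Connection:
        # Reuse the cached connection within a run.
        if self._conn is None:
            self._conn = sqlite3.connect(self.path)
        return self._conn

    def dispose(self) -> None:
        # Closing the stale connection releases its file descriptor, so many
        # successive runs in one process do not accumulate open descriptors.
        if self._conn is not None:
            self._conn.close()
            self._conn = None


with tempfile.TemporaryDirectory() as tmp:
    cache = EngineCache(Path(tmp) / "cache.db")
    for _run in range(100):  # simulate many CLI runs in one process
        conn = cache.connect()
        conn.execute("CREATE TABLE IF NOT EXISTS t (x)")
        cache.dispose()  # without this, 100 descriptors would stay open
```

The same principle applies to the capture manager's duplicated stdout/stderr descriptors: release on teardown, recreate on the next run.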

docs/source/explanations/comparison_to_other_tools.md (73 additions, 86 deletions)

@@ -10,124 +10,111 @@ in other WMFs.
 
 ## [snakemake](https://github.com/snakemake/snakemake)
 
-Pros
-
-- Very mature library and probably the most adapted library in the realm of scientific
-  workflow software.
-- Can scale to clusters and use Docker images.
-- Supports Python and R.
-- Automatic test case generation.
-
-Cons
-
-- Need to learn snakemake's syntax which is a mixture of Make and Python.
-- No debug mode.
-- Seems to have no plugin system.
+Snakemake is one of the most widely adopted workflow systems in scientific computing. It
+scales from local execution to clusters and cloud environments, with built-in support
+for containers and conda environments. Workflows are defined using a DSL that combines
+Make-style rules with Python, and can be exported to CWL for portability.
 
 ## [ploomber](https://github.com/ploomber/ploomber)
 
-General
-
-- Strong focus on machine learning pipelines, training, and deployment.
-- Integration with tools such as MLflow, Docker, AWS Batch.
-- Tasks can be defined in yaml, python files, Jupyter notebooks or SQL.
-
-Pros
-
-- Conversion from Jupyter notebooks to tasks via
-  [soorgeon](https://github.com/ploomber/soorgeon).
-
-Cons
-
-- Programming in Jupyter notebooks increases the risk of coding errors (e.g.
-  side-effects).
-- Supports parametrizations in form of cartesian products in `yaml` files, but not more
-  powerful parametrizations.
+Ploomber focuses on machine learning pipelines with strong integration into MLflow,
+Docker, and AWS Batch. Tasks can be defined in YAML, Python files, Jupyter notebooks, or
+SQL, and it can convert notebooks into pipeline tasks.
 
 ## [Waf](https://waf.io)
 
-Pros
-
-- Mature library.
-- Can be extended.
-
-Cons
-
-- Focus on compiling binaries, not research projects.
-- Bus factor of 1.
+Waf is a mature build system primarily designed for compiling software projects. It
+handles complex build dependencies and can be extended with Python.
 
 ## [nextflow](https://github.com/nextflow-io/nextflow)
 
-- Tasks are scripted using Groovy which is a superset of Java.
-- Supports AWS, Google, Azure.
-- Supports Docker, Shifter, Podman, etc.
+Nextflow is a workflow system popular in bioinformatics that runs on AWS, Google Cloud,
+and Azure. It uses Groovy (a JVM language) for scripting and has strong support for
+containers including Docker, Singularity, and Podman.
 
 ## [Kedro](https://github.com/kedro-org/kedro)
 
-Pros
-
-- Mature library, used by some institutions and companies. Created inside McKinsey.
-- Provides the full package: templates, pipelines, deployment
+Kedro is a mature workflow framework developed at McKinsey that provides project
+templates, data catalogs, and deployment tooling. It is designed for production machine
+learning pipelines with a focus on software engineering best practices.
 
 ## [pydoit](https://github.com/pydoit/doit)
 
-General
-
-- A general task runner which focuses on command line tools.
-- You can think of it as an replacement for make.
-- Powers Nikola, a static site generator.
+pydoit is a general-purpose task runner that serves as a Python replacement for Make. It
+focuses on executing command-line tools and powers projects like Nikola, a static site
+generator.
 
 ## [Luigi](https://github.com/spotify/luigi)
 
-General
-
-- A build system written by Spotify.
-- Designed for any kind of long-running batch processes.
-- Integrates with many other tools like databases, Hadoop, Spark, etc..
-
-Cons
-
-- Very complex interface and a lot of stuff you probably don't need.
-- [Development](https://github.com/spotify/luigi/graphs/contributors) seems to stall.
+Luigi is a workflow system built by Spotify for long-running batch processes. It
+integrates with Hadoop, Spark, and various databases for large-scale data pipelines.
+Development has slowed in recent years.
 
 ## [sciluigi](https://github.com/pharmbio/sciluigi)
 
-sciluigi aims to be a lightweight wrapper around luigi.
-
-Cons
-
-- [Development](https://github.com/pharmbio/sciluigi/graphs/contributors) has basically
-  stalled since 2018.
-- Not very popular compared to its lifetime.
+sciluigi is a lightweight wrapper around Luigi aimed at simplifying scientific workflow
+development. It reduces some of Luigi's boilerplate for research use cases. Development
+has stalled since 2018.
 
 ## [scipipe](https://github.com/scipipe/scipipe)
 
-Cons
+SciPipe is a workflow library written in Go for building robust, flexible pipelines
+using Flow-Based Programming principles. It compiles workflows to fast binaries and is
+designed for bioinformatics and cheminformatics applications involving command-line
+tools.
 
-- [Development](https://github.com/scipipe/scipipe/graphs/contributors) slowed down.
-- Written in Go.
+## [SCons](https://github.com/SCons/scons)
 
-## [Scons](https://github.com/SCons/scons)
-
-Pros
-
-- Mature library.
-
-Cons
-
-- Seems to have no plugin system.
+SCons is a mature, cross-platform software construction tool that serves as an improved
+substitute for Make. It uses Python scripts for configuration and has built-in support
+for C, C++, Java, Fortran, and automatic dependency analysis.
 
 ## [pypyr](https://github.com/pypyr/pypyr)
 
-General
+pypyr is a task-runner for automation pipelines defined in YAML. It provides built-in
+steps for common operations like loops, conditionals, retries, and error handling
+without requiring custom code, and is often used for CI/CD and DevOps automation.
+
+## [ZenML](https://github.com/zenml-io/zenml)
 
-- A general task-runner with task defined in yaml files.
+ZenML is an MLOps framework for building portable ML pipelines that can run on various
+orchestrators including Kubernetes, AWS SageMaker, GCP Vertex AI, Kubeflow, and Airflow.
+It focuses on productionizing ML workflows with features like automatic
+containerization, artifact tracking, and native caching.
 
-## [zenml](https://github.com/zenml-io/zenml)
+## [Flyte](https://github.com/flyteorg/flyte)
 
-## [flyte](https://github.com/flyteorg/flyte)
+Flyte is a Kubernetes-native workflow orchestration platform for building
+production-grade data and ML pipelines. It provides automatic retries, checkpointing,
+failure recovery, and scales dynamically across cloud providers including AWS, GCP, and
+Azure.
 
 ## [pipefunc](https://github.com/pipefunc/pipefunc)
 
-A tool for executing graphs made out of functions. More focused on computational
-compared to workflow graphs.
+pipefunc is a lightweight library for creating function pipelines as directed acyclic
+graphs (DAGs) in pure Python. It automatically handles execution order, supports
+map-reduce operations, parallel execution, and provides resource profiling.
+
+## [Common Workflow Language (CWL)](https://www.commonwl.org/)
+
+CWL is an open standard for describing data analysis workflows in a portable,
+language-agnostic format. Its primary goal is to enable workflows to be written once and
+executed across different computing environments—from local workstations to clusters,
+cloud, and HPC systems—without modification. Workflows described in CWL can be
+registered on [WorkflowHub](https://workflowhub.eu/) for sharing and discovery following
+FAIR (Findable, Accessible, Interoperable, Reusable) principles.
+
+CWL is particularly prevalent in bioinformatics and life sciences where reproducibility
+across institutions is critical. Tools that support CWL include
+[cwltool](https://github.com/common-workflow-language/cwltool) (the reference
+implementation), [Toil](https://github.com/DataBiosphere/toil),
+[Arvados](https://arvados.org/), and [REANA](https://reanahub.io/). Some workflow
+systems like Snakemake and Nextflow can export workflows to CWL format.
+
+pytask is not a CWL-compliant tool because it operates on a fundamentally different
+model. CWL describes workflows as graphs of command-line tool invocations where data
+flows between tools via files. pytask, in contrast, orchestrates Python functions that
+can execute arbitrary code, manipulate data in memory, call APIs, or perform any
+operation available in Python. This Python-native approach enables features like
+interactive debugging but means pytask workflows cannot be represented in CWL's
+command-line-centric specification.

justfile (1 addition, 9 deletions)

@@ -10,17 +10,9 @@ test *FLAGS:
 test-cov *FLAGS:
     uv run --group test pytest --nbmake --cov=src --cov=tests --cov-report=xml -n auto {{FLAGS}}
 
-# Run tests with notebook validation
-test-nb:
-    uv run --group test pytest --nbmake -n auto
-
 # Run type checking
 typing:
-    uv run --group typing --no-dev --isolated mypy
-
-# Run type checking on notebooks
-typing-nb:
-    uv run --group typing --no-dev --isolated nbqa mypy --ignore-missing-imports .
+    uv run --group typing --group test ty check src/ tests/
 
 # Run linting
 lint:

pyproject.toml (12 additions, 31 deletions)

@@ -47,7 +47,6 @@ name = "Tobias Raabe"
 email = "raabe@posteo.de"
 
 [dependency-groups]
-dev = ["pygraphviz>=1.12;platform_system=='Linux'"]
 docs = [
     "furo>=2024.8.6",
     "ipython>=8.13.2",

@@ -65,6 +64,7 @@ docs = [
 ]
 plugin-list = ["httpx>=0.27.0", "tabulate[widechars]>=0.9.0", "tqdm>=4.66.3"]
 test = [
+    "cloudpickle>=3.0.0",
     "deepdiff>=7.0.0",
     # nbmake requires pywin32 on Windows, which has no wheels for Python 3.14 yet
     "nbmake>=1.5.5; platform_system != 'Windows' or python_version < '3.14'",

@@ -74,11 +74,11 @@ test = [
     "pytest-cov>=5.0.0",
     "pytest-xdist>=3.6.1",
     "syrupy>=4.5.0",
-    "aiohttp>=3.11.0", # For HTTPPath tests.
+    "aiohttp>=3.11.0",  # For HTTPPath tests.
     "coiled>=1.42.0",
-    "cloudpickle>=3.0.0",
+    "pygraphviz>=1.12;platform_system=='Linux'",
 ]
-typing = ["mypy>=1.11.0", "nbqa>=1.8.5"]
+typing = ["ty>=0.0.7"]
 
 [project.urls]
 Changelog = "https://pytask-dev.readthedocs.io/en/stable/changes.html"

@@ -170,33 +170,14 @@ filterwarnings = [
     "ignore:'asyncio\\..*' is deprecated:DeprecationWarning",
 ]
 
-[tool.mypy]
-files = ["src", "tests"]
-check_untyped_defs = true
-disallow_any_generics = true
-disallow_incomplete_defs = true
-disallow_untyped_defs = true
-no_implicit_optional = true
-warn_redundant_casts = true
-warn_unused_ignores = true
-disable_error_code = ["import-untyped"]
-
-[[tool.mypy.overrides]]
-module = "tests.*"
-disallow_untyped_defs = false
-ignore_errors = true
-
-[[tool.mypy.overrides]]
-module = ["click_default_group", "networkx"]
-ignore_missing_imports = true
-
-[[tool.mypy.overrides]]
-module = ["_pytask.coiled_utils"]
-disable_error_code = ["import-not-found"]
-
-[[tool.mypy.overrides]]
-module = ["_pytask.hookspecs"]
-disable_error_code = ["empty-body"]
+[tool.ty.rules]
+unused-ignore-comment = "error"
+
+[tool.ty.src]
+exclude = ["src/_pytask/_hashlib.py"]
+
+[tool.ty.terminal]
+error-on-warning = true
 
 [tool.coverage.report]
 exclude_also = [

src/_pytask/build.py (16 additions, 3 deletions)

@@ -9,6 +9,7 @@
 from typing import TYPE_CHECKING
 from typing import Any
 from typing import Literal
+from typing import cast
 
 import click
 

@@ -65,7 +66,7 @@ def pytask_unconfigure(session: Session) -> None:
         path.write_text(json.dumps(HashPathCache._cache))
 
 
-def build(  # noqa: C901, PLR0912, PLR0913
+def build(  # noqa: C901, PLR0912, PLR0913, PLR0915
     *,
     capture: Literal["fd", "no", "sys", "tee-sys"] | CaptureMethod = CaptureMethod.FD,
     check_casing_of_paths: bool = True,

@@ -230,10 +231,22 @@ def build(  # noqa: C901, PLR0912, PLR0913
 
     raw_config = {**DEFAULTS_FROM_CLI, **raw_config}
 
-    raw_config["paths"] = parse_paths(raw_config["paths"])
+    paths_value = raw_config["paths"]
+    # Convert tuple to list since parse_paths expects Path | list[Path]
+    if isinstance(paths_value, tuple):
+        paths_value = list(paths_value)
+    if not isinstance(paths_value, (Path, list)):
+        msg = f"paths must be Path or list, got {type(paths_value)}"
+        raise TypeError(msg)  # noqa: TRY301
+    # Cast is justified - we validated at runtime
+    raw_config["paths"] = parse_paths(cast("Path | list[Path]", paths_value))
 
     if raw_config["config"] is not None:
-        raw_config["config"] = Path(raw_config["config"]).resolve()
+        config_value = raw_config["config"]
+        if not isinstance(config_value, (str, Path)):
+            msg = f"config must be str or Path, got {type(config_value)}"
+            raise TypeError(msg)  # noqa: TRY301
+        raw_config["config"] = Path(config_value).resolve()
         raw_config["root"] = raw_config["config"].parent
     else:
         (
