
Commit b88604f

Implement a portable lockfile.
1 parent 17e6a34 commit b88604f

18 files changed

Lines changed: 660 additions & 22 deletions

CHANGELOG.md

Lines changed: 3 additions & 1 deletion
@@ -7,7 +7,9 @@ releases are available on [PyPI](https://pypi.org/project/pytask) and

 ## Unreleased

-- Nothing yet.
+- {issue}`735` adds the `pytask.lock` lockfile as the primary state backend with a
+  portable format, documentation, and a one-run SQLite fallback when no lockfile
+  exists.

 ## 0.5.8 - 2025-12-30

docs/source/how_to_guides/index.md

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ maxdepth: 1
 ---
 migrating_from_scripts_to_pytask
 interfaces_for_dependencies_products
+portability
 remote_files
 functional_interface
 capture_warnings
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+# Portability
+
+This guide explains how to keep pytask state portable across machines.
+
+## Two Portability Concerns
+
+1. **Portable IDs**
+
+   - The lockfile stores task and node IDs.
+   - IDs must be project‑relative and stable across machines.
+   - pytask builds these IDs from the project root; no action required for most users.
+
+1. **Portable State Values**
+
+   - `state.value` is opaque and comes from `PNode.state()` / `PTask.state()`.
+   - Content hashes are portable; timestamps or absolute paths are not.
+   - Custom nodes should avoid machine‑specific paths in `state()`.
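
To illustrate the second concern, here is a minimal sketch that is not part of this commit. `content_state` and `mtime_state` are hypothetical helpers, not pytask APIs. The sketch compares a content hash with a modification time for two byte-identical files, which is the situation after copying a project to another machine.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def content_state(path: Path) -> str:
    """Hypothetical helper: SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def mtime_state(path: Path) -> str:
    """Hypothetical helper: modification time in nanoseconds, as a string."""
    return str(path.stat().st_mtime_ns)


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp, "input.csv")
    copy = Path(tmp, "copy.csv")
    original.write_bytes(b"a,b\n1,2\n")
    copy.write_bytes(original.read_bytes())

    # Simulate a transfer to another machine: same bytes, different timestamp.
    os.utime(copy, ns=(10**18, 10**18))

    same_content = content_state(original) == content_state(copy)
    same_mtime = mtime_state(original) == mtime_state(copy)

print(same_content, same_mtime)
```

A custom node whose `state()` returns something like `content_state(path)` would still match after the copy, whereas one based on `mtime_state(path)` would look changed and trigger a rerun.
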
+
+## Tips
+
+- Commit `pytask.lock` to your repository. If you ship the repository together with the
+  build artifacts (for example, a zipped project folder including `pytask.lock` and the
+  produced files), you can move it to another machine and runs will skip recomputation.
+- Prefer file content hashes over timestamps for custom nodes.
+- For `PythonNode` values that are not natively stable, provide a custom hash function.
+- If inputs live outside the project root, IDs will include `..` segments to remain
+  relative; this is expected.
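
The last tip can be illustrated with a small sketch. It assumes IDs behave like paths relative to the project root; `portable_id` is a hypothetical helper, and the way pytask actually builds IDs is not shown in this diff. `posixpath` is used so the result is the same on every platform.

```python
import posixpath


def portable_id(path: str, root: str) -> str:
    """Hypothetical helper: `path` relative to `root`, with forward slashes."""
    return posixpath.relpath(path, root)


# The same file, checked out under two different absolute roots.
id_on_laptop = portable_id(
    "/home/alice/project/data/raw/input.csv", "/home/alice/project"
)
id_on_ci = portable_id("/srv/ci/project/data/raw/input.csv", "/srv/ci/project")

# An input that lives next to the project rather than inside it.
id_outside = portable_id("/home/alice/shared/input.csv", "/home/alice/project")

print(id_on_laptop, id_on_ci, id_outside)
```

Under this assumption, an ID with `..` segments only matches on another machine if the external input sits at the same location relative to the project root there.
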
+
+## Legacy SQLite
+
+SQLite is the old state format. It is used only when no lockfile exists, and the
+lockfile is written during that run. Subsequent runs rely on the lockfile.

docs/source/reference_guides/configuration.md

Lines changed: 6 additions & 5 deletions
@@ -44,11 +44,12 @@ are welcome to also support macOS.

 ````{confval} database_url

-pytask uses a database to keep track of tasks, products, and dependencies over runs. By
-default, it will create an SQLite database in the project's root directory called
-`.pytask/pytask.sqlite3`. If you want to use a different name or a different dialect
-[supported by sqlalchemy](https://docs.sqlalchemy.org/en/latest/core/engines.html#backend-specific-urls),
-use either {option}`pytask build --database-url` or `database_url` in the config.
+SQLite is the legacy state format. pytask now uses `pytask.lock` as the primary state
+backend and only consults the database when no lockfile exists. During that first run,
+the lockfile is written and subsequent runs use the lockfile only.
+
+The `database_url` option remains for backwards compatibility and controls the legacy
+database location and dialect ([supported by sqlalchemy](https://docs.sqlalchemy.org/en/latest/core/engines.html#backend-specific-urls)).

 ```toml
 database_url = "sqlite:///.pytask/pytask.sqlite3"

docs/source/reference_guides/index.md

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ maxdepth: 1
 ---
 command_line_interface
 configuration
+lockfile
 hookspecs
 api
 ```
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
+# The Lock File
+
+`pytask.lock` is the default state backend. It stores task state in a portable,
+git-friendly format so runs can be resumed or shared across machines.
+
+```{note}
+SQLite is the legacy format. It is still read when no lockfile exists, and a lockfile
+is written during that first run. Subsequent runs use the lockfile only.
+```
+
+## Example
+
+```toml
+# This file is automatically @generated by pytask.
+# It is not intended for manual editing.
+
+lock-version = "1.0"
+
+[[task]]
+id = "src/tasks/data.py::task_clean_data"
+
+[task.state]
+value = "f9e8d7c6..."
+
+[[task.depends_on]]
+id = "data/raw/input.csv"
+
+[task.depends_on.state]
+value = "e5f6g7h8..."
+
+[[task.produces]]
+id = "data/processed/clean.parquet"
+
+[task.produces.state]
+value = "m3n4o5p6..."
+```
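
To show how the nested tables in the example resolve, the following sketch parses the same text with the standard-library `tomllib` module and flattens it into an ID-to-state mapping. This is an illustration only: the commit adds `msgspec[toml]` as a dependency, and the flattening into `stored` is an assumption made for this example, not pytask's internal representation.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # older interpreters need the tomli backport
    import tomli as tomllib

LOCK_TEXT = """
lock-version = "1.0"

[[task]]
id = "src/tasks/data.py::task_clean_data"

[task.state]
value = "f9e8d7c6..."

[[task.depends_on]]
id = "data/raw/input.csv"

[task.depends_on.state]
value = "e5f6g7h8..."

[[task.produces]]
id = "data/processed/clean.parquet"

[task.produces.state]
value = "m3n4o5p6..."
"""

lock = tomllib.loads(LOCK_TEXT)
task = lock["task"][0]  # [[task]] is an array of tables

# Flatten the task and its nodes into one mapping: id -> opaque state value.
stored = {task["id"]: task["state"]["value"]}
for entry in task["depends_on"] + task["produces"]:
    stored[entry["id"]] = entry["state"]["value"]

for node_id, value in stored.items():
    print(node_id, value)
```

Each `[task.depends_on.state]` table attaches to the most recent `[[task.depends_on]]` entry, which is why every dependency and product carries its own `state.value`.
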
+
+## Behavior
+
+On each run, pytask:
+
+1. Reads `pytask.lock` (if present).
+1. Compares current dependency/product/task `state()` to stored `state.value`.
+1. Skips tasks whose states match; runs the rest.
+1. Updates `pytask.lock` after each completed task (atomic write).
+
+`pytask-parallel` uses a single coordinator to write the lock file, so writes are
+serialized even when tasks execute in parallel.
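
The compare and atomic-write steps above can be sketched as follows. The diff does not show how pytask implements them, so this is one common pattern under stated assumptions: `changed_ids` and `atomic_write_text` are hypothetical helpers, and the atomic write uses a temporary file in the same directory followed by `os.replace`.

```python
import os
import tempfile
from pathlib import Path


def changed_ids(stored: dict[str, str], current: dict[str, str]) -> set[str]:
    """IDs whose current state is missing from or differs from the stored state."""
    return {id_ for id_, value in current.items() if stored.get(id_) != value}


def atomic_write_text(path: Path, text: str) -> None:
    """Write to a temporary file next to `path`, then rename it over `path`."""
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        os.replace(tmp_name, path)  # readers see the old or the new file, never half
    except BaseException:
        os.unlink(tmp_name)  # do not leave a stray temporary file behind
        raise


stored = {"data/raw/input.csv": "aaa", "src/tasks/data.py::task_clean_data": "bbb"}
current = {"data/raw/input.csv": "aaa", "src/tasks/data.py::task_clean_data": "ccc"}
changed = changed_ids(stored, current)

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp, "pytask.lock")
    atomic_write_text(target, 'lock-version = "1.0"\n')
    written = target.read_text(encoding="utf-8")
    leftovers = [p.name for p in Path(tmp).iterdir() if p.name != "pytask.lock"]

print(changed, leftovers)
```

Writing the temporary file in the same directory matters because a rename is only atomic within one filesystem.
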
+
+## Portability
+
+There are two portability concerns:
+
+1. **IDs**: Lockfile IDs must be project‑relative and stable across machines.
+1. **State values**: `state.value` is opaque; portability depends on each node’s
+   `state()` implementation. Content hashes are portable; timestamps are not.
+
+## File Format Reference
+
+### Top-Level
+
+| Field          | Required | Description                        |
+| -------------- | -------- | ---------------------------------- |
+| `lock-version` | Yes      | Schema version (currently `"1.0"`) |
+
+### Task Entry
+
+| Field   | Required | Description                                  |
+| ------- | -------- | -------------------------------------------- |
+| `id`    | Yes      | Portable task identifier                     |
+| `state` | Yes      | State dictionary with a single `value` field |
+
+### Dependency/Product Entry
+
+| Field   | Required | Description                                  |
+| ------- | -------- | -------------------------------------------- |
+| `id`    | Yes      | Node identifier                              |
+| `state` | Yes      | State dictionary with a single `value` field |
+
+### State Dictionary
+
+| Field   | Required | Description         |
+| ------- | -------- | ------------------- |
+| `value` | Yes      | Opaque state string |
+
+## Version Compatibility
+
+- **Upgrade**: newer pytask upgrades old lock files in memory and writes the new format
+  on the next update.
+- **Downgrade**: older pytask errors with a clear upgrade message.
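
The two compatibility rules can be expressed as a small version gate. This is a sketch under assumptions: `SUPPORTED_VERSION`, `LockfileVersionError`, and `needs_upgrade` are hypothetical names, and the comparison of dotted versions as integer tuples is a guess at the intended semantics, not code from this commit.

```python
SUPPORTED_VERSION = (1, 0)  # assumption: mirrors lock-version "1.0" above


class LockfileVersionError(Exception):
    """Hypothetical error for lock files newer than this reader understands."""


def parse_version(raw: str) -> tuple[int, ...]:
    """Turn a dotted version string such as "1.0" into a comparable tuple."""
    return tuple(int(part) for part in raw.split("."))


def needs_upgrade(raw: str, supported: tuple[int, ...] = SUPPORTED_VERSION) -> bool:
    """Return True for older files; raise for files newer than `supported`."""
    version = parse_version(raw)
    if version > supported:
        msg = (
            f"Lock file version {raw} is newer than this reader supports. "
            "Please upgrade pytask."
        )
        raise LockfileVersionError(msg)
    return version < supported


print(needs_upgrade("1.0"), needs_upgrade("0.9"))
```

An older file is upgraded in memory and rewritten later, so the gate only reports it; a newer file stops the run before any state is compared.
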

docs/source/tutorials/making_tasks_persist.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ In this case, you can apply the {func}`@pytask.mark.persist <pytask.mark.persist
 decorator to the task, which will skip its execution as long as all products exist.

 Internally, the state of the dependencies, the source file, and the products are updated
-in the database such that the subsequent execution will skip the task successfully.
+in the lockfile such that the subsequent execution will skip the task successfully.

 ## When is this useful?

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ dependencies = [
     "pluggy>=1.3.0",
     "rich>=13.8.0",
     "sqlalchemy>=2.0.31",
+    "msgspec[toml]>=0.18.6",
     'tomli>=1; python_version < "3.11"',
     'typing-extensions>=4.8.0; python_version < "3.11"',
     "universal-pathlib>=0.2.2",

src/_pytask/console.py

Lines changed: 18 additions & 2 deletions
@@ -111,10 +111,26 @@ def render_to_string(
     example, render warnings with colors or text in exceptions.

     """
-    buffer = console.render(renderable)
+    render_console = console
+    if not strip_styles and console.no_color and console.color_system is not None:
+        theme: Theme | None
+        try:
+            theme = Theme(console._theme_stack._entries[-1])  # type: ignore[attr-defined]
+        except (AttributeError, IndexError, TypeError):
+            theme = None
+        render_console = Console(
+            color_system=console.color_system,
+            force_terminal=True,
+            width=console.width,
+            no_color=False,
+            markup=getattr(console, "_markup", True),
+            theme=theme,
+        )
+
+    buffer = render_console.render(renderable)
     if strip_styles:
         buffer = Segment.strip_styles(buffer)
-    return console._render_buffer(buffer)
+    return render_console._render_buffer(buffer)


 def format_task_name(task: PTask, editor_url_scheme: str) -> Text:

src/_pytask/execute.py

Lines changed: 10 additions & 6 deletions
@@ -20,9 +20,6 @@
 from _pytask.dag_utils import TopologicalSorter
 from _pytask.dag_utils import descending_tasks
 from _pytask.dag_utils import node_and_neighbors
-from _pytask.database_utils import get_node_change_info
-from _pytask.database_utils import has_node_changed
-from _pytask.database_utils import update_states_in_database
 from _pytask.exceptions import ExecutionError
 from _pytask.exceptions import NodeLoadError
 from _pytask.exceptions import NodeNotFoundError
@@ -46,6 +43,9 @@
 from _pytask.pluginmanager import hookimpl
 from _pytask.provisional_utils import collect_provisional_products
 from _pytask.reports import ExecutionReport
+from _pytask.state import get_node_change_info
+from _pytask.state import has_node_changed
+from _pytask.state import update_states
 from _pytask.traceback import remove_traceback_from_exc_info
 from _pytask.tree_util import tree_leaves
 from _pytask.tree_util import tree_map
@@ -196,7 +196,7 @@ def pytask_execute_task_setup(session: Session, task: PTask) -> None: # noqa: C
             # Check if node changed and collect detailed info if in explain mode
             if session.config["explain"]:
                 has_changed, reason, details = get_node_change_info(
-                    task=task, node=node, state=node_state
+                    session=session, task=task, node=node, state=node_state
                 )
                 if has_changed:
                     needs_to_be_executed = True
@@ -222,7 +222,9 @@ def pytask_execute_task_setup(session: Session, task: PTask) -> None: # noqa: C
                         )
                     )
             else:
-                has_changed = has_node_changed(task=task, node=node, state=node_state)
+                has_changed = has_node_changed(
+                    session=session, task=task, node=node, state=node_state
+                )
                 if has_changed:
                     needs_to_be_executed = True

@@ -232,6 +234,8 @@ def pytask_execute_task_setup(session: Session, task: PTask) -> None: # noqa: C

     if not needs_to_be_executed:
         collect_provisional_products(session, task)
+        if not session.config["dry_run"] and not session.config["explain"]:
+            update_states(session, task)
         raise SkippedUnchanged

     # Create directory for product if it does not exist. Maybe this should be a `setup`
@@ -326,7 +330,7 @@ def pytask_execute_task_process_report(
     task = report.task

     if report.outcome == TaskOutcome.SUCCESS:
-        update_states_in_database(session, task.signature)
+        update_states(session, task)
     elif report.exc_info and isinstance(report.exc_info[1], WouldBeExecuted):
         report.outcome = TaskOutcome.WOULD_BE_EXECUTED
