|
1 | 1 | # Portability |
2 | 2 |
|
3 | | -This guide explains how to keep pytask state portable across machines. |
| 3 | +This guide explains what you need to do to move a pytask project between machines and |
| 4 | +why the lockfile is central to that process. |
4 | 5 |
|
5 | | -## Two Portability Concerns |
| 6 | +```{seealso} |
| 7 | +The lockfile format and behavior are documented in the |
| 8 | +[reference guide](../reference_guides/lockfile.md). |
| 9 | +``` |
| 10 | + |
| 11 | +## What makes a project portable |
| 12 | + |
| 13 | +There are two things that must stay stable across machines: |
| 14 | + |
| 15 | +First, task and node IDs must be stable. An ID is the unique identifier that ties a task |
| 16 | +or node to an entry in `pytask.lock`. pytask builds these IDs from project-relative |
| 17 | +paths anchored at the project root, so most users do not need to do anything. If you |
| 18 | +implement custom nodes, make sure their IDs remain project-relative and stable across |
| 19 | +machines. |
| 20 | + |
| 21 | +Second, state values must be portable. The lockfile stores opaque state strings from |
| 22 | +`PNode.state()` and `PTask.state()`, and pytask uses them to decide whether a task is up |
| 23 | +to date. Content hashes are portable; timestamps or absolute paths are not. This mostly |
| 24 | +matters when you define custom nodes or custom hash functions. |
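
As a rough illustration of why content hashes travel well, the sketch below derives a state string from file bytes only. `content_state` is a hypothetical helper written for this guide, not a pytask API; it only assumes that a state value is a string (or `None` when the file is missing).

```python
import hashlib
from pathlib import Path
from typing import Optional


def content_state(path: Path) -> Optional[str]:
    """Return a state string derived only from the file's bytes.

    Hypothetical helper for illustration; not part of pytask.
    Returns None when the file does not exist.
    """
    if not path.exists():
        return None
    # The digest depends on the content alone, so the modification time and
    # the absolute location of the file do not influence the result.
    return hashlib.sha256(path.read_bytes()).hexdigest()
```

Two copies of the same file on different machines yield the same digest, whereas a state built from `path.stat().st_mtime` or from `str(path.resolve())` would usually differ.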
| 25 | + |
| 26 | +## How to port a project |
| 27 | + |
| 28 | +Use this checklist when you move a project to another machine or environment. |
| 29 | + |
| 30 | +1. **Update state once on the source machine.** |
| 31 | + |
| 32 | + Run a normal build so `pytask.lock` is up to date: |
| 33 | + |
| 34 | + ```console |
| 35 | + $ pytask build |
| 36 | + ``` |
| 37 | + |
| 38 | + If you already have a recent lockfile and up-to-date outputs, you can skip this step. |
| 39 | + If the lockfile has stale entries, add `--clean-lockfile` so they are dropped:
6 | 40 |
|
7 | | -1. **Portable IDs** |
| 41 | + ```console |
| 42 | + $ pytask build --clean-lockfile |
| 43 | + ``` |
8 | 44 |
|
9 | | - - The lockfile stores task and node IDs. |
10 | | - - IDs must be project‑relative and stable across machines. |
11 | | - - pytask builds these IDs from the project root; no action required for most users. |
| 45 | +1. **Ship the right files.** |
12 | 46 |
|
13 | | -1. **Portable State Values** |
| 47 | + Commit `pytask.lock` to your repository and move it with the project. In practice, |
| 48 | + you should move: |
14 | 49 |
|
15 | | - - `state` is opaque and comes from `PNode.state()` / `PTask.state()`. |
16 | | - - Content hashes are portable; timestamps or absolute paths are not. |
17 | | - - Custom nodes should avoid machine‑specific paths in `state()`. |
| 50 | + - the project files tracked in version control (source, configuration, data inputs) |
| 51 | + - `pytask.lock` |
| 52 | + - the build artifacts you want to reuse (often in `bld/` if you follow the tutorial |
| 53 | + layout) |
18 | 54 |
|
19 | | -## Tips |
| 55 | + The lockfile does not contain the artifacts themselves. If you move only the lockfile |
| 56 | + but not the outputs, pytask will re-run tasks because output states will not match. |
| 57 | + |
| 58 | +1. **Preserve relative paths.** |
| 59 | + |
| 60 | + IDs are project-relative. If your dependencies or products live outside the project |
| 61 | + root, their IDs include `..` segments. Make sure the same relative layout exists on |
| 62 | + the target machine, or update the paths and run `pytask build` once to refresh |
| 63 | + `pytask.lock`. |
| 64 | + |
| 65 | +1. **Run pytask on the target machine.** |
| 66 | + |
| 67 | + When states match, tasks are skipped. When they differ, tasks run and the lockfile is |
| 68 | + updated. |
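
The point about preserving relative paths can be sketched with the standard library alone. The function below is not how pytask builds IDs internally; `project_relative_id` and the example paths are assumptions used to show why the same relative layout gives the same identifier on two machines.

```python
import posixpath


def project_relative_id(project_root: str, target: str) -> str:
    """Express `target` relative to `project_root` using POSIX separators.

    Hypothetical helper for illustration; pytask's own ID construction
    may differ in detail.
    """
    return posixpath.relpath(target, project_root)


# Same layout relative to the project root, different absolute prefixes.
source_id = project_relative_id("/home/alice/project", "/home/alice/data/raw.csv")
target_id = project_relative_id("/srv/ci/project", "/srv/ci/data/raw.csv")
```

Both calls produce `../data/raw.csv`. If the data directory sat somewhere else relative to the project root on the target machine, the two results would differ and the lockfile entry would no longer match.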
| 69 | + |
| 70 | +## Tips for stable state values |
20 | 71 |
|
21 | | -- Commit `pytask.lock` to your repository. If you ship the repository together with the |
22 | | - build artifacts (for example, a zipped project folder including `pytask.lock` and the |
23 | | - produced files), you can move it to another machine and runs will skip recomputation. |
24 | 72 | - Prefer file content hashes over timestamps for custom nodes. |
25 | 73 | - For `PythonNode` values that are not natively stable, provide a custom hash function. |
26 | | -- If inputs live outside the project root, IDs will include `..` segments to remain |
27 | | - relative; this is expected. |
| 74 | +- Avoid machine-specific paths or timestamps in custom `state()` implementations. |
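
A minimal sketch of a custom hash function for values that are not natively stable, assuming the value is JSON-serializable. `stable_hash` is a hypothetical name; how such a function is attached to a `PythonNode` is covered in the hashing guide, not here.

```python
import hashlib
import json
from typing import Any


def stable_hash(value: Any) -> str:
    """Hash a JSON-serializable value so the result is reproducible.

    Hypothetical helper for illustration. Python's built-in hash() is
    salted per process for strings, so it is not suitable for state that
    must match across runs or machines.
    """
    # Sorting keys and fixing separators gives one canonical text form,
    # independent of dictionary insertion order.
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this approach, `{"alpha": 0.1, "seed": 42}` and `{"seed": 42, "alpha": 0.1}` produce the same digest, while any change to a value changes it.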
28 | 75 |
|
29 | | -## Cleaning Up the Lockfile |
| 76 | +```{seealso} |
| 77 | +For custom nodes, see [Writing custom nodes](writing_custom_nodes.md). |
| 78 | +For hashing guidance, see |
| 79 | +[Hashing inputs of tasks](hashing_inputs_of_tasks.md). |
| 80 | +``` |
| 81 | + |
| 82 | +## Cleaning up the lockfile |
30 | 83 |
|
31 | 84 | `pytask.lock` is updated incrementally. Entries are only replaced when the corresponding |
32 | 85 | tasks run. If tasks are removed or renamed, their old entries remain as stale data and |
33 | 86 | are ignored. |
34 | 87 |
|
35 | 88 | To clean up stale entries without deleting the file, run: |
36 | 89 |
|
37 | | -``` |
38 | | -pytask build --clean-lockfile |
| 90 | +```console |
| 91 | +$ pytask build --clean-lockfile |
39 | 92 | ``` |
40 | 93 |
|
41 | 94 | This rewrites the lockfile after a successful build with only the currently collected |
42 | 95 | tasks and their current state values. |
43 | | - |
44 | | -## Legacy SQLite |
45 | | - |
46 | | -SQLite is the old state format. It is used only when no lockfile exists, and the |
47 | | -lockfile is written during that run. Subsequent runs rely on the lockfile and do not |
48 | | -update database state. |