Summary
JOB_TEMPLATE in src/hpc/job.py currently sets the job's working
directory via a shell-level cd $job_workdir line in the rendered
script. Replacing this with a scheduler directive — Slurm's
#SBATCH --chdir= (a.k.a. -D), PJM's -d / --directory — would
move the working-directory choice from the script body into the
scheduler API.
This is a deferred design question, not a known bug. Filing for
visibility so it can be revisited together with related directory-
model decisions.
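To make the proposed change concrete, here is a minimal sketch of the two template shapes. It assumes the template is rendered with Python's string.Template (suggested by the $job_workdir placeholder, but not confirmed). The template bodies, the $job_name and $command placeholders, and the render helper are hypothetical and are not the contents of src/hpc/job.py.

```python
from string import Template

# Hypothetical stand-in for the current shape: working directory set in the script body.
TEMPLATE_WITH_CD = Template(
    "#!/bin/bash\n"
    "#SBATCH --job-name=$job_name\n"
    "cd $job_workdir\n"
    "$command\n"
)

# Hypothetical variant: working directory passed to Slurm as a directive.
TEMPLATE_WITH_CHDIR = Template(
    "#!/bin/bash\n"
    "#SBATCH --job-name=$job_name\n"
    "#SBATCH --chdir=$job_workdir\n"
    "$command\n"
)


def render(template: Template, **fields: str) -> str:
    """Fill in the placeholders; raises KeyError if one is missing."""
    return template.substitute(**fields)


# Illustrative values only.
fields = {
    "job_name": "demo",
    "job_workdir": "/scratch/demo",
    "command": "python train.py",
}
print(render(TEMPLATE_WITH_CHDIR, **fields))
```

The only difference between the two shapes is where the path appears: as a command in the body, or as a directive the scheduler reads before the body runs.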
Robustness motivation
A shell-level cd has known failure modes that a scheduler
directive avoids:
- Silent wrong-CWD execution. If $job_workdir does not exist
or is not accessible at runtime, cd fails without exiting
the script (default bash semantics when set -e is not in effect).
Subsequent commands then run in $HOME or $SLURM_SUBMIT_DIR, the
job completes successfully from the scheduler's perspective, and
the user discovers the wrong directory only by output inspection.
--chdir hands the path to the scheduler, which applies it at job
start before any script line runs, so an invalid path is the
scheduler's to report rather than something the script silently
skips past. How a missing directory is handled (rejection, job
failure, or a fallback directory) should be confirmed on the
target Slurm version.
- Implicit "before the cd" zone. Any shell line emitted before
cd runs in the submit-time CWD. The current template happens to
have no such lines, but the contract depends on never adding any.
--chdir makes the entire script body run in $job_workdir
unconditionally.
- Scheduler-side visibility. scontrol show job <id> reports
WorkDir=, but the value matches $job_workdir only when it is set
via --chdir. With cd, WorkDir= shows the submit-time directory,
and the effective working directory is hidden inside the script
body and not queryable from scheduler metadata.
- Requeue / preemption. A --chdir-set working directory is
recorded by the scheduler and restored automatically on requeue.
With cd, the script must re-execute the line. The end result is
the same in the simple case, but the scheduler-level path
removes one shell precondition.
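The first failure mode above can be reproduced locally without a scheduler. The following is a minimal sketch, assuming bash is on PATH. The run_cd_script helper and the /nonexistent/job_workdir path are hypothetical, and a temporary directory stands in for the submit-time CWD.

```python
import shlex
import subprocess
import tempfile


def run_cd_script(workdir: str) -> subprocess.CompletedProcess:
    """Run `cd <workdir>` followed by `pwd` under bash from a temporary 'submit' directory."""
    script = f"cd {shlex.quote(workdir)}\npwd\n"
    with tempfile.TemporaryDirectory() as submit_dir:
        return subprocess.run(
            ["bash", "-c", script],
            cwd=submit_dir,  # stands in for the submit-time CWD
            capture_output=True,
            text=True,
        )


result = run_cd_script("/nonexistent/job_workdir")
print("exit code:", result.returncode)      # exit status of the last command (pwd)
print("ran in:", result.stdout.strip())     # the temporary directory, not the requested workdir
print("stderr:", result.stderr.strip())     # bash's complaint about the failed cd
```

The script exits with status 0 because the exit status is that of pwd, the last command. The failed cd leaves only a line on stderr, which is the "completes successfully from the scheduler's perspective" case described above.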
Why this is not a single-PR fix
The change interacts with a longer-running directory-model
question — whether to introduce a sourcedir concept distinct from
cluster.workdir. Once a second remote-side directory exists, the
following questions all need consistent answers:
- Should --chdir point at workdir or sourcedir?
- A user's own cd subdir in the script: is subdir resolved
relative to workdir, sourcedir, or whichever directory the
scheduler placed the job in?
- The CWD-relative walker (cwd_relative =
Path.cwd().relative_to(project_root) in src/hpc/cli.py) maps
local CWD to a remote subpath. Which root does that subpath
attach to?
- [env] setup commands: do they run before or after the
scheduler's chdir takes effect, and does this matter for paths
used in module load / source / export?
Each combination of answers produces a different mental model for
users. Introducing --chdir standalone risks locking in a partial
answer that constrains the later sourcedir design.
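The walker question can be illustrated with a small sketch. It mirrors the cwd_relative expression quoted above but takes the local CWD as a parameter so the example is deterministic. The remote_cwd helper and all paths are hypothetical, and the sourcedir root does not exist in the current design.

```python
from pathlib import PurePosixPath


def cwd_relative(local_cwd: str, project_root: str) -> PurePosixPath:
    """Subpath of the local CWD below the project root (raises ValueError if outside it)."""
    return PurePosixPath(local_cwd).relative_to(project_root)


def remote_cwd(remote_root: str, local_cwd: str, project_root: str) -> PurePosixPath:
    """Attach the local subpath to a chosen remote root (hypothetical helper)."""
    return PurePosixPath(remote_root) / cwd_relative(local_cwd, project_root)


# Illustrative paths only.
local = ("/home/alice/proj/experiments/run1", "/home/alice/proj")

# Same local CWD, two candidate roots, two different remote working directories.
print(remote_cwd("/scratch/alice/work", *local))  # /scratch/alice/work/experiments/run1
print(remote_cwd("/scratch/alice/src", *local))   # /scratch/alice/src/experiments/run1
```

Whichever root is chosen here is also the value --chdir would need to carry, which is why the two decisions are hard to separate.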
Acceptance
This issue is closed when:
- A directory-model design has been decided (workdir-only, or
workdir + sourcedir with documented semantics for each axis
above), AND
- JOB_TEMPLATE has been updated to use the scheduler directive
(#SBATCH --chdir= for Slurm, -d for PJM) consistent with
that design, AND
- The walker logic and [env] setup interaction are verified
against the new model.
Closing without addressing the directory-model question is not
acceptable; the standalone refactor would lock in implicit
answers.
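One way the template criterion could be checked in a test is sketched below, for Slurm only. It relies on sbatch ignoring #SBATCH lines that appear after the first command in the script. The function name is hypothetical, and PJM directives are not covered.

```python
def chdir_directive_is_effective(script: str) -> bool:
    """Return True if a #SBATCH --chdir / -D directive precedes the first command line."""
    for line in script.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines do not end the directive block
        if stripped.startswith("#SBATCH"):
            option = stripped[len("#SBATCH"):].strip()
            if option.startswith(("--chdir=", "--chdir ", "-D")):
                return True
            continue  # some other directive; keep scanning
        if stripped.startswith("#"):
            continue  # shebang or ordinary comment
        return False  # first command reached; later directives would be ignored
    return False


good = "#!/bin/bash\n#SBATCH --chdir=/scratch/demo\npython train.py\n"
late = "#!/bin/bash\ncd /scratch/demo\n#SBATCH --chdir=/scratch/demo\n"
print(chdir_directive_is_effective(good))  # True
print(chdir_directive_is_effective(late))  # False
```

A check like this only confirms that the directive is present and placed where sbatch will read it. It says nothing about which directory the directive should name, which remains the design question above.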
References
- Deferred item from the fix for issue #6 ("#SBATCH directives in submitted script are silently ignored") / PR #11 ("fix(submit): honor #SBATCH/#PJM directives in user-supplied scripts").
- sbatch(1) documents --chdir / -D. pjsub documents -d / --directory.