submit: JOB_TEMPLATE emits Slurm-form --output= / --error= directives for PJM jobs #17

@ultimatile

Summary

JOB_TEMPLATE in src/hpc/job.py injects scheduler-bookkeeping
output and error directives in Slurm form
(--output=..., --error=...) for both Slurm and PJM jobs.
PJM (pjsub) uses -o <path> and -e <path> instead, so the
emitted lines are not valid PJM directives.

Repro

Set [cluster] scheduler = "pjm" in hpc.toml, run
hpc submit "echo hi", and inspect the rendered job script
(uploaded under <workdir>/.hpc/runs/<run_id>/job.sh):

#PJM -L node=12
...
#PJM --output=/.../job-%j.out
#PJM --error=/.../job-%j.err

The last two lines are Slurm-style; PJM's pjsub does not honor
them. Job stdout / stderr therefore land at PJM's default
locations rather than under
<workdir>/.hpc/runs/<run_id>/job-<job_id>.out, which breaks
hpc job-output <run_id> (it reads from the hpc-tracked path).
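
The symptom above could be caught with a small check on the rendered job.sh. This is a minimal sketch, not part of the hpc codebase: the helper name find_slurm_form_directives and the regex are hypothetical, and it assumes, as this issue states, that pjsub does not honor the --output= / --error= forms.

```python
import re
from typing import List

# Matches "#PJM --output=..." or "#PJM --error=...", i.e. the Slurm-form
# lines shown in the repro above. The pattern is an assumption for this sketch.
SLURM_FORM_IN_PJM = re.compile(r"^#PJM\s+--(output|error)=")


def find_slurm_form_directives(script_text: str) -> List[str]:
    """Return the #PJM lines that use Slurm-form output/error directives."""
    return [
        line
        for line in script_text.splitlines()
        if SLURM_FORM_IN_PJM.match(line)
    ]
```

An empty result for a PJM job script would indicate that no Slurm-form bookkeeping lines were emitted.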

Expected

For PJM, the bookkeeping lines should be emitted as #PJM -o <path>
and #PJM -e <path>, with the path / wildcard substitution
adjusted to PJM conventions. %j is Slurm's job-id placeholder; PJM
accepts %j in some configurations, but the job-id substitution
differs across PJM versions, so this needs investigation.

Notes

  • This was identified during research for a separate issue but kept
    out of scope to keep that fix minimal.
  • The fix likely needs a per-scheduler hook for emitting output /
    error directives, since the syntax and substitution conventions
    diverge.
  • hpc job-output resolution
    (get_job_output in src/hpc/job.py) assumes the hpc-controlled
    path layout, so this needs to stay aligned with whatever PJM
    actually writes.
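
One possible shape for the per-scheduler hook mentioned above is sketched below. Every name here (slurm_output_directives, pjm_output_directives, OUTPUT_DIRECTIVE_HOOKS, render_output_directives) is hypothetical and not taken from src/hpc/job.py. The sketch only switches the directive syntax; the caller passes the paths already formed, because the correct PJM job-id placeholder is still an open question.

```python
from typing import Callable, Dict, List


def slurm_output_directives(out_path: str, err_path: str) -> List[str]:
    # Slurm form: long options with "=".
    return [f"#SBATCH --output={out_path}", f"#SBATCH --error={err_path}"]


def pjm_output_directives(out_path: str, err_path: str) -> List[str]:
    # PJM form as described in this issue: -o <path> and -e <path>.
    return [f"#PJM -o {out_path}", f"#PJM -e {err_path}"]


# Hypothetical registry keyed by the [cluster] scheduler value.
OUTPUT_DIRECTIVE_HOOKS: Dict[str, Callable[[str, str], List[str]]] = {
    "slurm": slurm_output_directives,
    "pjm": pjm_output_directives,
}


def render_output_directives(
    scheduler: str, out_path: str, err_path: str
) -> List[str]:
    """Return the output/error directive lines for the given scheduler."""
    try:
        hook = OUTPUT_DIRECTIVE_HOOKS[scheduler]
    except KeyError:
        raise ValueError(f"unsupported scheduler: {scheduler!r}") from None
    return hook(out_path, err_path)
```

Keeping the path construction outside the hook would let get_job_output and the template share one source for the path layout, so the two stay aligned whichever placeholder PJM turns out to need.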
