SGE (via qrsh): addprocs_qrsh() fails on cluster that supports qrsh #141

@jtrakk

When I call addprocs_qrsh(), I get an error message and no jobs are created (I checked with qstat).

ClusterManagers.addprocs_qrsh(3,res_list="h_rt=2:00:00,h_data=4G,highp")
Error launching workers
MethodError(iterate, (Process(`qrsh -l h_rt=2:00:00,h_data=4G,highp -V -N julia-13730 -now n cd /mydir '&&' /u/local/apps/julia/1.5.1/bin/julia --worker=2BuUs4aIkAHENSDE`, ProcessRunning),), 0x0000000000006caf)
Int64[]
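For readers trying to interpret the error: a `MethodError` on `iterate` with a `Process` argument is what Julia raises when code tries to unpack a `Process` as if it were a tuple. The following is a minimal sketch of that failure mode only. It is not taken from the ClusterManagers.jl source, and whether the package hits exactly this path is an assumption. The `echo hello` command is a hypothetical stand-in for the real `qrsh` invocation.

```julia
# Hypothetical stand-in for the qrsh command; any short-lived command works.
proc = open(`echo hello`)    # Julia 1.x returns a single Base.Process here

# Try to unpack the Process as a two-element tuple and capture the exception.
err = try
    io, p = proc             # destructuring calls iterate(proc)
    nothing
catch e
    e
end

# Report whether the captured exception is a MethodError on iterate.
println(err isa MethodError && err.f === iterate)

# The Process is itself readable, so its output can be read without unpacking.
println(readline(proc))
wait(proc)
```

If the launcher does unpack the result of `open(cmd)` this way, using the returned `Process` directly as the worker's IO stream would avoid the error, but that would need to be confirmed against the package source.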

My cluster does support qrsh. When I run the qrsh command manually in a shell, it prints the following host-key messages but does seem to allocate the worker, as I can see the job in qstat.

qrsh -l h_rt=2:00:00,h_data=4G,highp -V -N julia-13730 -now n cd /mydir '&&' /u/local/apps/julia/1.5.1/bin/julia --worker=2BuUs4aIkAHENSDE
could not open any host key
ssh_keysign: no reply
key_sign failed
julia_worker:9934

job-ID     prior   name       user         state submit/start at     queue                          jclass                         slots ja-task-ID 
------------------------------------------------------------------------------------------------------------------------------------------------
   4514401 0.50500 QRLOGIN    user         r     09/02/2020 00:05:14 my.q@nodexxx                                                  2        

When I use addprocs_sge() it works just fine.


This looks like the same issue as this comment, but I opened a new issue because that one was originally opened for a different purpose.

Julia 1.5.1
ClusterManagers.jl master branch dde400e
