
speeding up/parallelizing make_chains.py with Slurm: NOTE: Error submitting process 'execute_jobs (##)' for execution -- Execution is retried  #58

@LinuxPersonEC

I'm trying to help a researcher speed up a make_chains.py run for a mammal. Using this closed issue on parallelization as inspiration, we'd like to speed up the process on our Slurm cluster running RHEL 8. I requested a node via an interactive srun session, starting with 16 CPUs via --ntasks and -c. Using --executor local, as suggested in the closed thread, was painfully slow. The user there mentions --cluster_parameters, but that results in: make_chains.py: error: unrecognized arguments: --cluster_parameters cpus=16

./make_chains.py MesAur_chr_folded mm10 /path/to/me/make_lastz_chains/MesAur_chr_folded.2bit /path/to/me/make_lastz_chains/mm10.2bit --pd test_out_1 -f --chaining_memory 16 --cluster_executor slurm
# Make Lastz Chains #
Version 2.0.8
Commit: 187e313afc10382fe44c96e47f27c4466d63e114
Branch: main

* found run_lastz.py at /path/to/me/make_lastz_chains/standalone_scripts/run_lastz.py
* found run_lastz_intermediate_layer.py at /path/to/me/make_lastz_chains/standalone_scripts/run_lastz_intermediate_layer.py
* found chain_gap_filler.py at /path/to/me/make_lastz_chains/standalone_scripts/chain_gap_filler.py
* found faToTwoBit at /cluster/opt/lastz/1.04.15/faToTwoBit
* found twoBitToFa at /cluster/opt/lastz/1.04.15/twoBitToFa
* found pslSortAcc at /cluster/opt/lastz/1.04.15/pslSortAcc
* found axtChain at /cluster/opt/lastz/1.04.15/axtChain
* found axtToPsl at /cluster/opt/lastz/1.04.15/axtToPsl
* found chainAntiRepeat at /cluster/opt/lastz/1.04.15/chainAntiRepeat
* found chainMergeSort at /cluster/opt/lastz/1.04.15/chainMergeSort
* found chainCleaner at /cluster/opt/lastz/1.04.15/chainCleaner
* found chainSort at /cluster/opt/lastz/1.04.15/chainSort
* found chainScore at /cluster/opt/lastz/1.04.15/chainScore
* found chainNet at /cluster/opt/lastz/1.04.15/chainNet
* found chainFilter at /cluster/opt/lastz/1.04.15/chainFilter
* found lastz at /cluster/opt/lastz/1.04.15/lastz
* found nextflow at /cluster/opt/nextflow/23.10.1/nextflow
All necessary executables found.
Making chains for /path/to/me/make_lastz_chains/MesAur_chr_folded.2bit and /path/to/me/make_lastz_chains/mm10.2bit files, saving results to /path/to/me/make_lastz_chains/test_out_1
Pipeline started at 2024-04-30 11:24:17.231861
* Setting up genome sequences for target
genomeID: MesAur_chr_folded
input sequence file: /path/to/me/make_lastz_chains/MesAur_chr_folded.2bit
is 2bit: True
planned genome dir location: /path/to/me/make_lastz_chains/test_out_1/target.2bit
Created symlink from /path/to/me/make_lastz_chains/MesAur_chr_folded.2bit to /path/to/me/make_lastz_chains/test_out_1/target.2bit
For MesAur_chr_folded (target) sequence file: /path/to/me/make_lastz_chains/test_out_1/target.2bit; chrom sizes saved to: /path/to/me/make_lastz_chains/test_out_1/target.chrom.sizes
* Setting up genome sequences for query
genomeID: mm10
input sequence file: /path/to/me/make_lastz_chains/mm10.2bit
is 2bit: True
planned genome dir location: /path/to/me/make_lastz_chains/test_out_1/query.2bit
Created symlink from /path/to/me/make_lastz_chains/mm10.2bit to /path/to/me/make_lastz_chains/test_out_1/query.2bit
For mm10 (query) sequence file: /path/to/me/make_lastz_chains/test_out_1/query.2bit; chrom sizes saved to: /path/to/me/make_lastz_chains/test_out_1/query.chrom.sizes

### Partition Step ###

# Partitioning for target
Saving partitions and creating 238 buckets for lastz output
In particular, 19 partitions for bigger chromosomes
And 219 buckets for smaller scaffolds
Saving target partitions to: /path/to/me/make_lastz_chains/test_out_1/target_partitions.txt
# Partitioning for query
Saving partitions and creating 65 buckets for lastz output
In particular, 64 partitions for bigger chromosomes
And 1 buckets for smaller scaffolds
Saving query partitions to: /path/to/me/make_lastz_chains/test_out_1/query_partitions.txt
Num. target partitions: 19
Num. query partitions: 64
Num. lastz jobs: 1216

### Lastz Alignment Step ###

LASTZ: making jobs
LASTZ: saved 15470 jobs to /path/to/me/make_lastz_chains/test_out_1/temp_lastz_run/lastz_joblist.txt
Parallel manager: pushing job /cluster/opt/nextflow/23.10.1/nextflow /path/to/me/make_lastz_chains/parallelization/execute_joblist.nf --joblist /path/to/me/make_lastz_chains/test_out_1/temp_lastz_run/lastz_joblist.txt -c /path/to/me/make_lastz_chains/test_out_1/temp_lastz_run/lastz_config.nf
N E X T F L O W  ~  version 23.10.1
Launching `/path/to/me/make_lastz_chains/parallelization/execute_joblist.nf` [maniac_thompson] DSL2 - revision: 0483b29723
[84/955b71] process > execute_jobs (27) [  0%] 28 of 3913, failed: 28, retries: 28
[c5/32a7bd] NOTE: Error submitting process 'execute_jobs (18)' for execution -- Execution is retried (1)
[26/dd5dc9] NOTE: Error submitting process 'execute_jobs (4)' for execution -- Execution is retried (1)

Could someone help me with the correct syntax for running this on Slurm?

P.S. I can confirm that the shebang fix suggested in this thread also works to start the sample jobs.
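
For anyone triaging similar output, here is a minimal sketch that pulls the counts out of a Nextflow progress line like the one in the log above, which makes it easy to see that every submission so far has failed. The function name parse_progress and the regular expression are my own illustrative assumptions, based only on the line format shown in this issue; they are not part of make_lastz_chains or Nextflow.

```python
import re

# Pattern for a progress line in the format seen in the log above, e.g.
# "[84/955b71] process > execute_jobs (27) [  0%] 28 of 3913, failed: 28, retries: 28"
# (assumed format; other Nextflow versions may print it differently).
PROGRESS_RE = re.compile(
    r"process > (?P<name>\S+).*?"
    r"(?P<count>\d+) of (?P<total>\d+)"
    r"(?:, failed: (?P<failed>\d+))?"
    r"(?:, retries: (?P<retries>\d+))?"
)


def parse_progress(line):
    """Return the counts from one progress line, or None if it does not match."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return {
        "process": match.group("name"),
        "count": int(match.group("count")),
        "total": int(match.group("total")),
        # The failed/retries fields are optional in the pattern; default to 0.
        "failed": int(match.group("failed") or 0),
        "retries": int(match.group("retries") or 0),
    }


line = "[84/955b71] process > execute_jobs (27) [  0%] 28 of 3913, failed: 28, retries: 28"
stats = parse_progress(line)
print(stats)
if stats and stats["count"] and stats["failed"] == stats["count"]:
    print("Every task counted so far has failed; the problem is likely at submission time.")
```

In the run above, all 28 tasks counted so far are reported as failed, which matches the "Error submitting process" notes.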
