I'm trying to help a researcher get make_chains.py results faster for a mammal. Using this closed issue on parallelization as inspiration, we'd like to speed up the process on our Slurm cluster, which runs RHEL 8. I tried requesting a node via an interactive srun session, starting with 16 CPUs via --ntasks and -c. Using --executor local, as suggested in the closed thread, was painfully slow. The user there mentioned --cluster_parameters, but that results in: make_chains.py: error: unrecognized arguments: --cluster_parameters cpus=16
./make_chains.py MesAur_chr_folded mm10 /path/to/me/make_lastz_chains/MesAur_chr_folded.2bit /path/to/me/make_lastz_chains/mm10.2bit --pd test_out_1 -f --chaining_memory 16 --cluster_executor slurm
# Make Lastz Chains #
Version 2.0.8
Commit: 187e313afc10382fe44c96e47f27c4466d63e114
Branch: main
* found run_lastz.py at /path/to/me/make_lastz_chains/standalone_scripts/run_lastz.py
* found run_lastz_intermediate_layer.py at /path/to/me/make_lastz_chains/standalone_scripts/run_lastz_intermediate_layer.py
* found chain_gap_filler.py at /path/to/me/make_lastz_chains/standalone_scripts/chain_gap_filler.py
* found faToTwoBit at /cluster/opt/lastz/1.04.15/faToTwoBit
* found twoBitToFa at /cluster/opt/lastz/1.04.15/twoBitToFa
* found pslSortAcc at /cluster/opt/lastz/1.04.15/pslSortAcc
* found axtChain at /cluster/opt/lastz/1.04.15/axtChain
* found axtToPsl at /cluster/opt/lastz/1.04.15/axtToPsl
* found chainAntiRepeat at /cluster/opt/lastz/1.04.15/chainAntiRepeat
* found chainMergeSort at /cluster/opt/lastz/1.04.15/chainMergeSort
* found chainCleaner at /cluster/opt/lastz/1.04.15/chainCleaner
* found chainSort at /cluster/opt/lastz/1.04.15/chainSort
* found chainScore at /cluster/opt/lastz/1.04.15/chainScore
* found chainNet at /cluster/opt/lastz/1.04.15/chainNet
* found chainFilter at /cluster/opt/lastz/1.04.15/chainFilter
* found lastz at /cluster/opt/lastz/1.04.15/lastz
* found nextflow at /cluster/opt/nextflow/23.10.1/nextflow
All necessary executables found.
Making chains for /path/to/me/make_lastz_chains/MesAur_chr_folded.2bit and /path/to/me/make_lastz_chains/mm10.2bit files, saving results to /path/to/me/make_lastz_chains/test_out_1
Pipeline started at 2024-04-30 11:24:17.231861
* Setting up genome sequences for target
genomeID: MesAur_chr_folded
input sequence file: /path/to/me/make_lastz_chains/MesAur_chr_folded.2bit
is 2bit: True
planned genome dir location: /path/to/me/make_lastz_chains/test_out_1/target.2bit
Created symlink from /path/to/me/make_lastz_chains/MesAur_chr_folded.2bit to /path/to/me/make_lastz_chains/test_out_1/target.2bit
For MesAur_chr_folded (target) sequence file: /path/to/me/make_lastz_chains/test_out_1/target.2bit; chrom sizes saved to: /path/to/me/make_lastz_chains/test_out_1/target.chrom.sizes
* Setting up genome sequences for query
genomeID: mm10
input sequence file: /path/to/me/make_lastz_chains/mm10.2bit
is 2bit: True
planned genome dir location: /path/to/me/make_lastz_chains/test_out_1/query.2bit
Created symlink from /path/to/me/make_lastz_chains/mm10.2bit to /path/to/me/make_lastz_chains/test_out_1/query.2bit
For mm10 (query) sequence file: /path/to/me/make_lastz_chains/test_out_1/query.2bit; chrom sizes saved to: /path/to/me/make_lastz_chains/test_out_1/query.chrom.sizes
### Partition Step ###
# Partitioning for target
Saving partitions and creating 238 buckets for lastz output
In particular, 19 partitions for bigger chromosomes
And 219 buckets for smaller scaffolds
Saving target partitions to: /path/to/me/make_lastz_chains/test_out_1/target_partitions.txt
# Partitioning for query
Saving partitions and creating 65 buckets for lastz output
In particular, 64 partitions for bigger chromosomes
And 1 buckets for smaller scaffolds
Saving query partitions to: /path/to/me/make_lastz_chains/test_out_1/query_partitions.txt
Num. target partitions: 19
Num. query partitions: 64
Num. lastz jobs: 1216
### Lastz Alignment Step ###
LASTZ: making jobs
LASTZ: saved 15470 jobs to /path/to/me/make_lastz_chains/test_out_1/temp_lastz_run/lastz_joblist.txt
Parallel manager: pushing job /cluster/opt/nextflow/23.10.1/nextflow /path/to/me/make_lastz_chains/parallelization/execute_joblist.nf --joblist /path/to/me/make_lastz_chains/test_out_1/temp_lastz_run/lastz_joblist.txt -c /path/to/me/make_lastz_chains/test_out_1/temp_lastz_run/lastz_config.nf
N E X T F L O W ~ version 23.10.1
Launching `/path/to/me/make_lastz_chains/parallelization/execute_joblist.nf` [maniac_thompson] DSL2 - revision: 0483b29723
[84/955b71] process > execute_jobs (27) [ 0%] 28 of 3913, failed: 28, retries: 28
[c5/32a7bd] NOTE: Error submitting process 'execute_jobs (18)' for execution -- Execution is retried (1)
[26/dd5dc9] NOTE: Error submitting process 'execute_jobs (4)' for execution -- Execution is retried (1)
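In case it helps with diagnosis, the underlying cause of these submission errors can probably be pulled from the Nextflow log. Below is a minimal sketch, not something from the make_lastz_chains docs: the helper name show_submit_errors is hypothetical, and it assumes Nextflow's default log file name (.nextflow.log) in whatever directory nextflow was launched from, which may differ from the project directory.

```shell
# Hypothetical helper: print each "Error submitting process" line from a
# Nextflow log together with the 10 lines that follow it.
# Assumption: the log is Nextflow's default .nextflow.log in the launch
# directory; pass a different path as the first argument if it lives elsewhere.
show_submit_errors() {
    local log="${1:-.nextflow.log}"
    if [ ! -f "$log" ]; then
        echo "log not found: $log" >&2
        return 1
    fi
    # -n prints line numbers, -A 10 prints trailing context after each match
    grep -n -A 10 "Error submitting process" "$log"
}
```

Running show_submit_errors with no argument reads .nextflow.log in the current directory; the lines after each match usually include the failing submit command and its exit status.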
May I request assistance here to get the correct syntax?
P.S. I can confirm the suggested shebang fix in this thread also works to start the sample jobs.