nf-core refactor #109

Open

NilaBlueshirt wants to merge 12 commits into hillerlab:main from NilaBlueshirt:nf-core-refactor
Conversation

@NilaBlueshirt (Contributor) commented on Apr 1, 2026:

nf-core DSL2 refactor: Nextflow pipeline with containers and checkpoints, for Slurm HPC

Summary

Refactors make_lastz_chains from a Python-orchestrated pipeline into a standard nf-core-style DSL2 Nextflow pipeline. The original make_chains.py entry point is fully preserved.

What's new

Details can be found in Changelog.md, CHANGES_nfcore_refactor.md, and TODO.md.

  • nf-core DSL2 pipeline
  • Single nextflow.config for all infrastructure settings, params.json for scientific parameters
  • SLURM job arrays
  • Checkpoint entry points
  • Dockerfile: full UCSC Kent distribution. The Docker image has been uploaded to Docker Hub and tested.
  • Large-genome input support: added -long to faToTwoBit and replaced twobitreader, so 64-bit .2bit files are handled (fixes #56: Alignment of a large genome in 2bit)
  • environment.yml: conda/mamba environment with all tools for the conda profile (pip could be added, but uv does not work well on HPC)
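As background for the 64-bit .2bit support: the UCSC .2bit format begins with a 4-byte signature (0x1A412743) followed by a 4-byte version field, and version 1 marks the 64-bit offset layout that faToTwoBit -long writes. A minimal Python sketch of how a pipeline could detect which layout a file uses (the helper name is hypothetical, not code from this PR):

```python
import struct

TWO_BIT_SIGNATURE = 0x1A412743

def twobit_is_64bit(path):
    """Return True if a .2bit file uses the version-1 (64-bit offset)
    layout written by `faToTwoBit -long`, False for the classic
    version-0 (32-bit) layout."""
    with open(path, "rb") as fh:
        header = fh.read(8)
    if len(header) < 8:
        raise ValueError(f"{path}: too short to be a .2bit file")
    # The signature also reveals the byte order the file was written in.
    for byte_order in ("<", ">"):
        signature, version = struct.unpack(byte_order + "II", header)
        if signature == TWO_BIT_SIGNATURE:
            return version == 1
    raise ValueError(f"{path}: missing .2bit signature")
```

Version-0 files overflow their 32-bit offsets once any sequence index points past 4 GB, which is exactly the failure mode reported in #56.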

Key files

| File | Description |
| --- | --- |
| main.nf | nf-core entry point with three workflow aliases |
| nextflow.config | Unified infrastructure config |
| params.json | Scientific parameter template (edit for each run) |
| Dockerfile | Container image definition |
| environment.yml | Conda environment |
| modules/local/*/main.nf | One process per pipeline step |
| subworkflows/local/*/main.nf | Four subworkflows |
| workflows/make_lastz_chains.nf | Main workflow |
| bin/ | Helper scripts (partition, psl_bundle, split_chains) |

Backward compatibility

make_chains.py and all original Python modules are unchanged, so the pipeline can still be executed exactly as in v2.0.8.

Usage

See README.md for three different use cases.

NilaBlueshirt and others added 2 commits May 29, 2025 09:15
Hello,

I ran into the "error while loading shared libraries: libiconv.so.2: cannot open shared object file: No such file or directory" error when running chainCleaner manually. I was able to resolve it by adding the "libiconv" package to my existing conda env, and found it useful to have this yml file on hand.
Merge remote-tracking branch 'upstream/main'
@NilaBlueshirt changed the title from "Nf core refactor" to "nf-core refactor" on Apr 1, 2026
Nil Mu added 3 commits April 1, 2026 17:36
- Rewrote pipeline from Python orchestration to native Nextflow DSL2 modules and subworkflows
- Replaced twobitreader with py2bit to support 64-bit .2bit files (fixes issue hillerlab#56)
- Added single Docker/Apptainer container with full UCSC Kent distribution
- Scientific parameters split into params.json; nextflow.config is infrastructure-only
- Added FROM_FILL_CHAINS and FROM_CLEAN_CHAINS checkpoint entry workflows
- Added params.json template and run_nf_slurm_example.sh for multi-pair SLURM runs
- Compute tiers: process_fast (16GB/0.5h), process_medium (50GB/2h)
- FA_TO_TWO_BIT raised to 50GB; CHAIN_CLEANER 80GB; REPEAT_FILLER 1h
- SLURM partition routing: <=4h to htc, >4h to public
- Container image controlled via NXF_CONTAINER_IMAGE env var
- LASTZ/AXT_CHAIN/REPEAT_FILLER submit as SLURM job arrays
- Added SLURM_SKIP_EPILOG, USR2@180 signal, and beforeScript sleep for LASTZ
- Replace !in with .contains() (invalid Groovy in NF 25.x)
- Move subworkflow alias from call site to include statement
- Fix DuplicateProcessInvocation: PARTITION -> PARTITION_TARGET / PARTITION_QUERY
- Fix all module labels (process_single/high -> process_fast/medium)
- Add run_lastz.py and run_lastz_intermediate_layer.py to bin/ for automatic Nextflow staging
- Set plain errorStrategy = retry globally
- README: three standalone sections, single-source info, Nextflow >= 25.04.6
- Add Changelog v3.0.0 and parameter audit tables to CHANGES_nfcore_refactor.md
Nil Mu and others added 3 commits April 2, 2026 10:03
…, writer side)

The reader fix (twobitreader → py2bit) was already in place, but faToTwoBit
was still writing version-0 (32-bit) .2bit files, causing an index overflow
abort on genomes with sequences larger than 4 GB. Adding -long writes version-1
(64-bit) files that py2bit can then read correctly.

Updated CHANGES_nfcore_refactor.md to document both halves of the issue hillerlab#56 fix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@alejandrogzi alejandrogzi self-assigned this Apr 3, 2026
Nil Mu and others added 4 commits April 5, 2026 18:13
…enomes (issue hillerlab#56, reader side)

lastz cannot read v1 (64-bit) .2bit files produced by faToTwoBit -long.
run_lastz.py now uses py2bit to extract each partition to a temp FASTA
in the task work directory before passing it to lastz. Temp files default
to CWD (Nextflow work dir) instead of /tmp, avoiding cross-node filesystem
issues on SLURM.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ccess

workflow.onComplete now checks if the final chain file actually exists
before logging its path. Prints a warning if the pipeline completed
successfully but produced no output (e.g. all LASTZ jobs yielded no
alignments).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ssue hillerlab#56)

py2bit does not actually support v1/64-bit .2bit files produced by
faToTwoBit -long, causing RuntimeError on large genomes (>4 GB).
Use twoBitToFa (UCSC CLI, already a pipeline dependency) to extract
partitions to temp FASTA instead — it supports both v0 and v1 files.
Removes py2bit and twobitreader Python dependencies entirely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
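The twoBitToFa-based extraction described in this commit could be sketched as follows. This is illustrative only, not the actual run_lastz.py code; the function names are hypothetical:

```python
import os
import subprocess
import tempfile

def build_twobittofa_cmd(two_bit, seq_name, fasta, start=None, end=None):
    """Assemble the twoBitToFa command line for one partition.
    twoBitToFa reads both v0 and v1 (64-bit) .2bit files."""
    cmd = ["twoBitToFa", f"-seq={seq_name}"]
    if start is not None:
        cmd.append(f"-start={start}")
    if end is not None:
        cmd.append(f"-end={end}")
    cmd += [two_bit, fasta]
    return cmd

def extract_partition(two_bit, seq_name, start=None, end=None):
    # Create the temp FASTA in the CWD (the Nextflow task work dir),
    # not /tmp, so it stays visible on the shared filesystem of
    # whichever SLURM node runs the lastz task.
    fd, fasta = tempfile.mkstemp(suffix=".fa", dir=os.getcwd())
    os.close(fd)
    subprocess.run(
        build_twobittofa_cmd(two_bit, seq_name, fasta, start, end),
        check=True)
    return fasta
```

Keeping the command assembly separate from the subprocess call makes the argument handling easy to unit-test without the UCSC binaries installed.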
- project_setup_procedures.py: replace TwoBitFile with twoBitInfo
  subprocess calls for chrom name listing, chrom sizes, and .2bit
  format detection
- Dockerfile: remove py2bit pip install; no Python .2bit library needed
- CHANGES_nfcore_refactor.md: document the full scope of the fix
- TODO.md: update done item to reflect actual approach

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
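A minimal sketch of the twoBitInfo-based chrom-sizes lookup this commit describes (helper names are illustrative, not the code in project_setup_procedures.py). twoBitInfo prints one tab-separated "name, size" line per sequence:

```python
import subprocess

def parse_twobitinfo(text):
    """Parse twoBitInfo output: one 'name<TAB>size' line per sequence."""
    sizes = {}
    for line in text.strip().splitlines():
        name, size = line.split("\t")
        sizes[name] = int(size)
    return sizes

def chrom_sizes(two_bit):
    """Return {chrom: size} via the UCSC twoBitInfo CLI, which handles
    both v0 and v1 .2bit files (UCSC tools accept 'stdout' as the
    output filename)."""
    out = subprocess.run(["twoBitInfo", two_bit, "stdout"],
                         capture_output=True, text=True,
                         check=True).stdout
    return parse_twobitinfo(out)
```

Since twoBitInfo is already a pipeline dependency, this drops the need for any Python .2bit library.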
@NilaBlueshirt (Contributor, Author) commented:
Good news: two test runs went through successfully using this fork, and we are verifying the results against the main branch. I'm not sure what else to do besides comparing the final result files, though. What else would you like to check?

@alejandrogzi (Member) commented:

Hey @NilaBlueshirt

I'd say the easiest would be comparing the final chains, but if we want to be more thorough, I would compare all the outputs from the core alignment (lastz), then the repeat filling, and the final output.



Development

Successfully merging this pull request may close these issues.

Alignment of a large genome in 2bit
