nf-core refactor #109

Open

NilaBlueshirt wants to merge 12 commits into hillerlab:main from NilaBlueshirt:nf-core-refactor
Conversation

@NilaBlueshirt (Contributor) commented on Apr 1, 2026:

nf-core DSL2 refactor: Nextflow pipeline with containers and checkpoints, for Slurm HPC

Summary

Refactors make_lastz_chains from a Python-orchestrated pipeline into a standard nf-core-style DSL2 Nextflow pipeline. The original make_chains.py entry point is fully preserved.

What's new

Details can be found in Changelog.md, CHANGES_nfcore_refactor.md, and TODO.md.

  • nf-core DSL2 pipeline
  • Single nextflow.config for all infrastructure settings, params.json for scientific parameters
  • SLURM job arrays
  • Checkpoint entry points
  • Dockerfile: full UCSC Kent distribution. The Docker image has been uploaded to Docker Hub and tested.
  • Large-genome input support: added -long to faToTwoBit and replaced twobitreader, so 64-bit .2bit files are handled (fixes #56: Alignment of a large genome in 2bit)
  • environment.yml: conda/mamba environment with all tools for the conda profile (pip could be added, but uv does not work well on HPC)
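As background for the 64-bit .2bit support: the UCSC .2bit format begins with a 4-byte signature (0x1A412743) followed by a 4-byte version field, and version 1 marks the 64-bit offset layout that faToTwoBit -long writes. A minimal Python sketch of how a pipeline could detect which layout a file uses (the helper name is hypothetical, not code from this PR):

```python
import struct

TWO_BIT_SIGNATURE = 0x1A412743

def twobit_is_64bit(path):
    """Return True if a .2bit file uses the version-1 (64-bit offset)
    layout written by `faToTwoBit -long`, False for the classic
    version-0 (32-bit) layout."""
    with open(path, "rb") as fh:
        header = fh.read(8)
    if len(header) < 8:
        raise ValueError(f"{path}: too short to be a .2bit file")
    # The signature also reveals the byte order the file was written in.
    for byte_order in ("<", ">"):
        signature, version = struct.unpack(byte_order + "II", header)
        if signature == TWO_BIT_SIGNATURE:
            return version == 1
    raise ValueError(f"{path}: missing .2bit signature")
```

Version-0 files overflow their 32-bit offsets once any sequence index points past 4 GB, which is exactly the failure mode reported in #56.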

Key files

| File | Description |
| --- | --- |
| main.nf | nf-core entry point with three workflow aliases |
| nextflow.config | Unified infrastructure config |
| params.json | Scientific parameter template (edit for each run) |
| Dockerfile | Container image definition |
| environment.yml | Conda environment |
| modules/local/*/main.nf | One process per pipeline step |
| subworkflows/local/*/main.nf | Four subworkflows |
| workflows/make_lastz_chains.nf | Main workflow |
| bin/ | Helper scripts (partition, psl_bundle, split_chains) |

Backward compatibility

make_chains.py and all original Python modules are unchanged, so the pipeline can still be executed exactly as in v2.0.8.

Usage

See README.md for three different use cases.

NilaBlueshirt and others added 2 commits May 29, 2025 09:15
Hello,

I ran into the "error while loading shared libraries: libiconv.so.2: cannot open shared object file: No such file or directory" error when running chainCleaner manually. I was able to resolve it by adding the "libiconv" package to my existing conda env, and found it useful to have this yml file on hand.
Merge remote-tracking branch 'upstream/main'
@NilaBlueshirt changed the title from "Nf core refactor" to "nf-core refactor" on Apr 1, 2026
Nil Mu added 3 commits April 1, 2026 17:36
- Rewrote pipeline from Python orchestration to native Nextflow DSL2 modules and subworkflows
- Replaced twobitreader with py2bit to support 64-bit .2bit files (fixes issue hillerlab#56)
- Added single Docker/Apptainer container with full UCSC Kent distribution
- Scientific parameters split into params.json; nextflow.config is infrastructure-only
- Added FROM_FILL_CHAINS and FROM_CLEAN_CHAINS checkpoint entry workflows
- Added params.json template and run_nf_slurm_example.sh for multi-pair SLURM runs
- Compute tiers: process_fast (16GB/0.5h), process_medium (50GB/2h)
- FA_TO_TWO_BIT raised to 50GB; CHAIN_CLEANER 80GB; REPEAT_FILLER 1h
- SLURM partition routing: <=4h to htc, >4h to public
- Container image controlled via NXF_CONTAINER_IMAGE env var
- LASTZ/AXT_CHAIN/REPEAT_FILLER submit as SLURM job arrays
- Added SLURM_SKIP_EPILOG, USR2@180 signal, and beforeScript sleep for LASTZ
- Replace !in with .contains() (invalid Groovy in NF 25.x)
- Move subworkflow alias from call site to include statement
- Fix DuplicateProcessInvocation: PARTITION -> PARTITION_TARGET / PARTITION_QUERY
- Fix all module labels (process_single/high -> process_fast/medium)
- Add run_lastz.py and run_lastz_intermediate_layer.py to bin/ for automatic Nextflow staging
- Set plain errorStrategy = retry globally
- README: three standalone sections, single-source info, Nextflow >= 25.04.6
- Add Changelog v3.0.0 and parameter audit tables to CHANGES_nfcore_refactor.md
Nil Mu and others added 3 commits April 2, 2026 10:03
…, writer side)

The reader fix (twobitreader → py2bit) was already in place, but faToTwoBit
was still writing version-0 (32-bit) .2bit files, causing an index overflow
abort on genomes with sequences larger than 4 GB. Adding -long writes version-1
(64-bit) files that py2bit can then read correctly.

Updated CHANGES_nfcore_refactor.md to document both halves of the issue hillerlab#56 fix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@alejandrogzi alejandrogzi self-assigned this Apr 3, 2026
Nil Mu and others added 4 commits April 5, 2026 18:13
…enomes (issue hillerlab#56, reader side)

lastz cannot read v1 (64-bit) .2bit files produced by faToTwoBit -long.
run_lastz.py now uses py2bit to extract each partition to a temp FASTA
in the task work directory before passing it to lastz. Temp files default
to CWD (Nextflow work dir) instead of /tmp, avoiding cross-node filesystem
issues on SLURM.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ccess

workflow.onComplete now checks if the final chain file actually exists
before logging its path. Prints a warning if the pipeline completed
successfully but produced no output (e.g. all LASTZ jobs yielded no
alignments).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ssue hillerlab#56)

py2bit does not actually support v1/64-bit .2bit files produced by
faToTwoBit -long, causing RuntimeError on large genomes (>4 GB).
Use twoBitToFa (UCSC CLI, already a pipeline dependency) to extract
partitions to temp FASTA instead — it supports both v0 and v1 files.
Removes py2bit and twobitreader Python dependencies entirely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
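The twoBitToFa-based extraction described in this commit could be sketched as follows. This is illustrative only, not the actual run_lastz.py code; the function names are hypothetical:

```python
import os
import subprocess
import tempfile

def build_twobittofa_cmd(two_bit, seq_name, fasta, start=None, end=None):
    """Assemble the twoBitToFa command line for one partition.
    twoBitToFa reads both v0 and v1 (64-bit) .2bit files."""
    cmd = ["twoBitToFa", f"-seq={seq_name}"]
    if start is not None:
        cmd.append(f"-start={start}")
    if end is not None:
        cmd.append(f"-end={end}")
    cmd += [two_bit, fasta]
    return cmd

def extract_partition(two_bit, seq_name, start=None, end=None):
    # Create the temp FASTA in the CWD (the Nextflow task work dir),
    # not /tmp, so it stays visible on the shared filesystem of
    # whichever SLURM node runs the lastz task.
    fd, fasta = tempfile.mkstemp(suffix=".fa", dir=os.getcwd())
    os.close(fd)
    subprocess.run(
        build_twobittofa_cmd(two_bit, seq_name, fasta, start, end),
        check=True)
    return fasta
```

Keeping the command assembly separate from the subprocess call makes the argument handling easy to unit-test without the UCSC binaries installed.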
- project_setup_procedures.py: replace TwoBitFile with twoBitInfo
  subprocess calls for chrom name listing, chrom sizes, and .2bit
  format detection
- Dockerfile: remove py2bit pip install; no Python .2bit library needed
- CHANGES_nfcore_refactor.md: document the full scope of the fix
- TODO.md: update done item to reflect actual approach

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
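A minimal sketch of the twoBitInfo-based chrom-sizes lookup this commit describes (helper names are illustrative, not the code in project_setup_procedures.py). twoBitInfo prints one tab-separated "name, size" line per sequence:

```python
import subprocess

def parse_twobitinfo(text):
    """Parse twoBitInfo output: one 'name<TAB>size' line per sequence."""
    sizes = {}
    for line in text.strip().splitlines():
        name, size = line.split("\t")
        sizes[name] = int(size)
    return sizes

def chrom_sizes(two_bit):
    """Return {chrom: size} via the UCSC twoBitInfo CLI, which handles
    both v0 and v1 .2bit files (UCSC tools accept 'stdout' as the
    output filename)."""
    out = subprocess.run(["twoBitInfo", two_bit, "stdout"],
                         capture_output=True, text=True,
                         check=True).stdout
    return parse_twobitinfo(out)
```

Since twoBitInfo is already a pipeline dependency, this drops the need for any Python .2bit library.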
@NilaBlueshirt (Contributor, Author) commented:
Good news: two test runs went through successfully using this fork, and we are verifying the results against the main branch. I'm not sure what else to do besides comparing the final result files, though. What else would you like to check?

@alejandrogzi (Member) commented:

Hey @NilaBlueshirt

I'd say the easiest would be comparing the final chains, but if we want to be more thorough, I would compare all the outputs from the core alignment (lastz), then the repeat filling, and the final output.



Development

Successfully merging this pull request may close these issues.

Alignment of a large genome in 2bit
