Add quilt2 first draft implementation #282
base: dev

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,42 @@ | ||
| /* | ||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| Config file for defining DSL2 per module options and publishing paths | ||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| */ | ||
| | ||
| process { | ||
| | ||
| withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_IMPUTE_QUILT2:.*' { | ||
| publishDir = [ | ||
| path: { "${params.outdir}/imputation/quilt2/variant_calling/" }, | ||
| enabled: params.publish_all, | ||
| mode: params.publish_dir_mode, | ||
| ] | ||
| tag = {"Batch ${meta.batch} ${meta.regionout ?: meta.chr}"} | ||
| } | ||
| | ||
| withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_IMPUTE_QUILT2:QUILT_QUILT2' { | ||
| ext.args = "--seed=${params.seed}" | ||
| ext.prefix = { "${meta.id}.batch${meta.batch}.${meta.regionout ? meta.regionout.replace(':','_') : meta.chr}.quilt2" } | ||
| } | ||
| | ||
| withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_IMPUTE_QUILT2:GLIMPSE2_LIGATE' { | ||
| ext.prefix = { "${meta.id}.batch${meta.batch}.${meta.chr}.quilt2.ligate" } | ||
| } | ||
| | ||
| withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_IMPUTE_QUILT2:BCFTOOLS_INDEX' { | ||
| ext.args = '--tbi' | ||
| } | ||
| | ||
| withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_QUILT2:.*' { | ||
| publishDir = [ | ||
| path: { "${params.outdir}/imputation/quilt2/concat" }, | ||
| mode: params.publish_dir_mode | ||
| ] | ||
| } | ||
| | ||
| withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_QUILT2:BCFTOOLS_CONCAT' { | ||
| ext.args = "--output-type z --ligate" | ||
| ext.prefix = { "${meta.id}.batch${meta.batch}.quilt2" } | ||
| } | ||
| } | ||
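The `ext.prefix` closures in this config determine how QUILT2 outputs are named at each stage (per chunk, after ligation, after concatenation). To preview those names, here is a minimal shell sketch that mirrors the same string logic. The function names and the sample values (`all`, batch `0`, `chr22`, the region string) are hypothetical and only for illustration; the Groovy closures in the config remain the source of truth.

```bash
#!/usr/bin/env bash
# Hypothetical helpers mirroring the ext.prefix closures of the QUILT2 config.
# They only build strings; they do not call Nextflow or QUILT2.

# QUILT_QUILT2: "<id>.batch<batch>.<region or chr>.quilt2", where every ":" in
# the region is replaced by "_" (like Groovy's replace(':','_')).
quilt2_chunk_prefix() {
    local id=$1 batch=$2 chr=$3 regionout=${4:-}
    local region=$chr
    if [[ -n $regionout ]]; then
        region=${regionout//:/_}
    fi
    printf '%s.batch%s.%s.quilt2\n' "$id" "$batch" "$region"
}

# GLIMPSE2_LIGATE: one prefix per chromosome.
quilt2_ligate_prefix() {
    printf '%s.batch%s.%s.quilt2.ligate\n' "$1" "$2" "$3"
}

# BCFTOOLS_CONCAT: one prefix per batch.
quilt2_concat_prefix() {
    printf '%s.batch%s.quilt2\n' "$1" "$2"
}

quilt2_chunk_prefix all 0 chr22 'chr22:16570000-16610000'  # all.batch0.chr22_16570000-16610000.quilt2
quilt2_chunk_prefix all 0 chr22                             # all.batch0.chr22.quilt2
quilt2_ligate_prefix all 0 chr22                            # all.batch0.chr22.quilt2.ligate
quilt2_concat_prefix all 0                                  # all.batch0.quilt2
```

Only the `:` is replaced, so a region such as `chr22:16570000-16610000` keeps its `-`. The modules typically append their own file extensions to these prefixes.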

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| /* | ||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| Nextflow config file for running minimal tests | ||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| Defines input files and everything required to run a fast and simple pipeline test. | ||
| | ||
| Use as follows: | ||
| nextflow run nf-core/phaseimpute -profile test_quilt2,<docker/singularity> --outdir <OUTDIR> | ||
| | ||
| ---------------------------------------------------------------------------------------- | ||
| */ | ||
| | ||
| process { | ||
| resourceLimits = [ | ||
| cpus: 4, | ||
| memory: '4.GB', | ||
| time: '1.h' | ||
| ] | ||
| | ||
| withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_IMPUTE_QUILT2:QUILT_QUILT2' { | ||
| cpus = 1 | ||
| ext.args = {"--seed=${params.seed} --use_mspbwt=TRUE --impute_rare_common=FALSE" } | ||
| ext.prefix = { "${meta.id}.batch${meta.batch}.${meta.regionout ? meta.regionout.replace(':','_') : meta.chr}.quilt2" } | ||
| } | ||
| } | ||
| | ||
| params { | ||
| config_profile_name = 'Minimal QUILT2 Test profile' | ||
| config_profile_description = 'Minimal test dataset to check pipeline function using the tool QUILT2' | ||
| | ||
| // Input data | ||
| input = "${projectDir}/tests/csv/sample_bam.csv" | ||
| input_region = "${projectDir}/tests/csv/region.csv" | ||
| chunks = "${projectDir}/tests/csv/chunks.csv" | ||
| panel = "${projectDir}/tests/csv/panel.csv" | ||
| | ||
| // Genome references | ||
| fasta = params.pipelines_testdata_base_path + "hum_data/reference_genome/GRCh38.s.fa.gz" | ||
| map = "${projectDir}/tests/csv/map_glimpse.csv" | ||
| | ||
| // Pipeline parameters | ||
| steps = "impute" | ||
| tools = "quilt2" | ||
| } | ||
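The `test_quilt2` profile is mostly a set of `params`. To run the same QUILT2 imputation on your own data, one option is to put the equivalent parameters in a YAML file and pass it to Nextflow with `-params-file`. The sketch below writes such a file; every path is a hypothetical placeholder, and `input_region`, which the test profile also sets, is left out here, so add it if your run needs it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical output file name; the keys mirror the params block of test_quilt2.
params_file=quilt2_params.yaml

cat > "$params_file" <<'EOF'
# Placeholder paths: replace with your own samplesheets and reference
input: samplesheet_bam.csv
chunks: chunks.csv
panel: panel.csv
map: map.csv
fasta: reference/genome.fa.gz
steps: impute
tools: quilt2
outdir: results_quilt2
EOF

cat "$params_file"

# Then, for example (not executed here):
#   nextflow run nf-core/phaseimpute -profile docker -params-file quilt2_params.yaml
```

Keeping the parameters in a file rather than on the command line makes it easier to rerun or share the exact QUILT2 configuration.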

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -375,6 +375,7 @@ The different test profiles are: | ||
| - `test`: A profile to evaluate the imputation step with the `glimpse1` tool. | ||
| - `test_glimpse2`: A profile to evaluate the imputation step with the `glimpse2` tool. | ||
| - `test_quilt`: A profile to evaluate the imputation step with the `quilt` tool. | ||
| +- `test_quilt2`: A profile to evaluate the imputation step with the `quilt2` tool. | ||
| - `test_stitch`: A profile to evaluate the imputation step with the `stitch` tool. | ||
| - `test_beagle5`: A profile to evaluate the imputation step with the `beagle5` tool. | ||
| - `test_minimac4`: A profile to evaluate the imputation step with the `minimac4` tool. | ||
| @@ -472,9 +473,9 @@ For starting from the imputation steps, the required flags are: | ||
| - `--steps impute` | ||
| - `--input input.csv`: The samplesheet containing the input sample files in `bam`, `cram` or `vcf`, `bcf` format. | ||
| - `--genome` or `--fasta`: The reference genome of the samples. | ||
| -- `--tools [glimpse1,glimpse2,quilt,stitch,beagle5,minimac4]`: A selection of one or more of the available imputation tools. Each imputation tool has its own set of specific flags and input files. These required files are produced by `--steps panelprep` and used as input in: | ||
| +- `--tools [glimpse1,glimpse2,quilt,quilt2,stitch,beagle5,minimac4]`: A selection of one or more of the available imputation tools. Each imputation tool has its own set of specific flags and input files. These required files are produced by `--steps panelprep` and used as input in: | ||
| - `--posfile posfile.csv`: A samplesheet containing all the different files required by the imputation tool. This file can be generated with `--steps panelprep`. | ||
| -- `--panel panel.csv`: A samplesheet containing the post-processed reference panel VCF (required by GLIMPSE1, GLIMPSE2). These files can be obtained with `--steps panelprep`. | ||
| +- `--panel panel.csv`: A samplesheet containing the post-processed reference panel VCF (required by GLIMPSE1, GLIMPSE2 and QUILT2). These files can be obtained with `--steps panelprep`. | ||
| | ||
| Optionally, you can provide the following flags: | ||
| | ||
| @@ -488,6 +489,7 @@ Optionally, you can provide the following flags: | ||
| \| `GLIMPSE1` \| ✅ \| ✅ ¹ \| ✅ \| ✅ \| ✅ ³ \| ✅ \| ✅ \| | ||
| \| `GLIMPSE2` \| ✅ \| ✅ ¹ \| ✅ \| ✅ \| ❌ \| ✅ \| ✅ \| | ||
| \| `QUILT` \| ✅ \| ✅ ² \| ✅ \| ❌ \| ✅ ⁴ \| ✅ \| ✅ \| | ||
| +\| `QUILT2` \| ✅ \| ✅ ² \| ✅ \| ✅ ⁶ \| ❌ \| ✅ \| ✅ \| | ||
| \| `STITCH` \| ✅ \| ✅ ² \| ✅ \| ❌ \| ✅ ³ \| ✅ \| ✅ \| | ||
| \| `BEAGLE5` \| ✅ \| ✅ ¹ \| ✅ \| ✅ \| ❌ \| ✅ \| ✅ \| | ||
| \| `MINIMAC4` \| ✅ \| ✅ ¹ \| ✅ \| ✅ \| ✅ ⁵ \| ✅ \| ✅ \| | ||
| @@ -496,8 +498,9 @@ Optionally, you can provide the following flags: | ||
| > ² Alignment files only (i.e. BAM or CRAM) | ||
| > ³ `GLIMPSE1` and `STITCH`: Should be a CSV with columns [panel id, chr, posfile] | ||
| > ⁴ `QUILT`: Should be a CSV with columns [panel id, chr, hap, legend, posfile] | ||
| > ⁵ `MINIMAC4`: Optionally, a VCF with its index can be provided for more control over the imputed positions. Should be a CSV with columns [panel id, chr, vcf, index] | ||
| -> ⁶ Not yet supported | ||
| +> ⁶ `QUILT2`: Uses the reference panel VCF directly. The panel CSV should contain [panel id, chr, vcf, index] | ||
| +> ⁷ Not yet supported | ||
| | ||
| Here is a representation of how the input files will be processed depending on the input file type and the selected imputation tool. | ||
| | ||
| @@ -523,15 +526,25 @@ To summarize: | ||
| - GLIMPSE2 should not do target-to-target imputation. | ||
| - If you have alignment files (e.g., BAM or CRAM), all tools are available, and processing will occur in `batch_size`: | ||
| - GLIMPSE1 and STITCH may induce batch effects, so all samples need to be imputed together. | ||
| -- GLIMPSE2 and QUILT can process samples in separate batches. | ||
| +- GLIMPSE2, QUILT and QUILT2 can process samples in separate batches. | ||
| | ||
| -## Imputation tools `--steps impute --tools [glimpse1,glimpse2,quilt,stitch,beagle5,minimac4]` | ||
| +## Imputation tools `--steps impute --tools [glimpse1,glimpse2,quilt,quilt2,stitch,beagle5,minimac4]` | ||
| | ||
| You can choose different software to perform the imputation. In the following sections, the typical commands for running the pipeline with each software are included. Multiple tools can be selected by separating them with a comma (e.g., `--tools glimpse1,quilt`). | ||
| | ||
| -### QUILT | ||
| +### QUILT / QUILT2 | ||
| | ||
| -[QUILT](https://github.com/rwdavies/QUILT) is an R and C++ program for rapid genotype imputation from low-coverage sequence using a large reference panel. The required inputs for this program are bam samples provided in the input samplesheet (`--input`) and a CSV file with the genomic chunks (`--chunks`). | ||
| +[QUILT](https://github.com/rwdavies/QUILT) is an R and C++ package for read-aware genotype imputation from low-coverage sequencing using a reference panel. This pipeline supports both the original QUILT method (`QUILT.R`, referred to here as `quilt`) and the newer QUILT2 method (`QUILT2.R`, exposed in this pipeline as `quilt2`). | ||
| | ||
| +In `nf-core/phaseimpute`, both methods use alignment files from `--input`, optionally benefit from `--map`, and can use `--chunks` to split the genome into overlapping imputation regions. | ||
| | ||
| +Choose `quilt2` by default for new projects. The official QUILT2 documentation describes it as the recommended modern method for large reference panels and diverse sequencing inputs, including short reads, long reads, linked/barcoded reads and ancient DNA. The QUILT2 paper also describes a dedicated cfDNA/NIPT mode in the upstream tool; however, the current `nf-core/phaseimpute` integration supports only the diploid imputation workflow. | ||
| | ||
| +Choose `quilt` when you specifically want the original QUILT workflow. | ||
| | ||
| +#### `quilt` | ||
| | ||
| +The required inputs for `quilt` are BAM/CRAM samples provided in the input samplesheet (`--input`) and a CSV file with the genomic chunks (`--chunks`). | ||
| Review comment (Collaborator) on lines +545 to +547: Should we add a similar part for | ||
| | ||
| ```bash | ||
| nextflow run nf-core/phaseimpute \ | ||
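The QUILT2 footnote in the usage table says the panel samplesheet should contain the columns [panel id, chr, vcf, index]. If you are not generating it with `--steps panelprep`, a minimal sketch for writing such a CSV from per-chromosome panel VCFs follows. The header names (`panel,chr,vcf,index`), the panel name `1000GP` and the file layout are assumptions for illustration, so check them against the pipeline's samplesheet schema before use.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical panel name and layout: one bgzipped VCF (+ .tbi index) per chromosome.
panel_id=1000GP
panel_dir=reference_panel
out=panel.csv

# Header names are an assumption; the usage docs only list the columns
# [panel id, chr, vcf, index].
printf 'panel,chr,vcf,index\n' > "$out"
for chr in chr21 chr22; do
    vcf="${panel_dir}/${panel_id}.${chr}.vcf.gz"
    printf '%s,%s,%s,%s\n' "$panel_id" "$chr" "$vcf" "${vcf}.tbi" >> "$out"
done
cat "$out"

# Then, for example (not executed here), using flags from the usage docs:
#   nextflow run nf-core/phaseimpute -profile docker \
#     --steps impute --tools quilt2 \
#     --input samplesheet.csv --panel panel.csv --chunks chunks.csv \
#     --fasta genome.fa.gz --outdir results
```

Writing one row per chromosome keeps the panel samplesheet aligned with the per-chromosome chunking used by the imputation step.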