2 changes: 1 addition & 1 deletion README.md
@@ -42,7 +42,7 @@ The whole pipeline consists of five main steps, each of which can be run separat
- **Position Extraction** for targeted imputation sites.

4. **Imputation (`--impute`)**: This is the primary step, where genotypes in the target dataset are imputed using the prepared reference panel. The main steps are:
- **Imputation** of the target dataset using tools like [Glimpse1](https://odelaneau.github.io/GLIMPSE/glimpse1/index.html), [Glimpse2](https://odelaneau.github.io/GLIMPSE/), [Stitch](https://github.com/rwdavies/stitch), [Quilt](https://github.com/rwdavies/QUILT), [Beagle5](https://faculty.washington.edu/browning/beagle/beagle.html) or [Minimac4](https://github.com/statgen/Minimac4).
- **Imputation** of the target dataset using tools like [Glimpse1](https://odelaneau.github.io/GLIMPSE/glimpse1/index.html), [Glimpse2](https://odelaneau.github.io/GLIMPSE/), [Stitch](https://github.com/rwdavies/stitch), [Quilt/Quilt2](https://github.com/rwdavies/QUILT), [Beagle5](https://faculty.washington.edu/browning/beagle/beagle.html) or [Minimac4](https://github.com/statgen/Minimac4).
- **Ligation** of imputed chunks to produce a final VCF file per sample, with all chromosomes unified.

5. **Validation (`--validate`)**: Assesses imputation accuracy by comparing the imputed dataset to a truth dataset. This step leverages the [Glimpse2](https://odelaneau.github.io/GLIMPSE/) concordance process to summarize differences between two VCF files.
42 changes: 42 additions & 0 deletions conf/steps/imputation_quilt2.config
@@ -0,0 +1,42 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Config file for defining DSL2 per module options and publishing paths
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

process {

    withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_IMPUTE_QUILT2:.*' {
        publishDir = [
            path: { "${params.outdir}/imputation/quilt2/variant_calling/" },
            enabled: params.publish_all,
            mode: params.publish_dir_mode,
        ]
        tag = { "Batch ${meta.batch} ${meta.regionout ?: meta.chr}" }
    }

    withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_IMPUTE_QUILT2:QUILT_QUILT2' {
        ext.args   = "--seed=${params.seed}"
        ext.prefix = { "${meta.id}.batch${meta.batch}.${meta.regionout ? meta.regionout.replace(':','_') : meta.chr}.quilt2" }
    }

    withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_IMPUTE_QUILT2:GLIMPSE2_LIGATE' {
        ext.prefix = { "${meta.id}.batch${meta.batch}.${meta.chr}.quilt2.ligate" }
    }

    withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_IMPUTE_QUILT2:BCFTOOLS_INDEX' {
        ext.args = '--tbi'
    }

    withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_QUILT2:.*' {
        publishDir = [
            path: { "${params.outdir}/imputation/quilt2/concat" },
            mode: params.publish_dir_mode
        ]
    }

    withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_QUILT2:BCFTOOLS_CONCAT' {
        ext.args   = "--output-type z --ligate"
        ext.prefix = { "${meta.id}.batch${meta.batch}.quilt2" }
    }
}
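
The `ext.prefix` closures above decide how QUILT2 output files are named: the region is used when `meta.regionout` is set, with `:` replaced by `_`, and the chromosome is used otherwise. Below is a minimal shell sketch of the same rule, useful for predicting file names outside Nextflow. The function `quilt2_prefix` and the sample values are hypothetical and not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper that mirrors the ext.prefix closure of QUILT_QUILT2.
# Arguments: sample id, batch number, chromosome, optional region.
quilt2_prefix() {
    local id="$1" batch="$2" chr="$3" regionout="${4:-}"
    # When a region is given, replace every ':' with '_'; otherwise stay empty.
    local label="${regionout:+${regionout//:/_}}"
    # Fall back to the chromosome when no region label is available.
    printf '%s.batch%s.%s.quilt2\n' "$id" "$batch" "${label:-$chr}"
}

quilt2_prefix all 0 chr22 "chr22:16570000-16610000"
quilt2_prefix all 0 chr22
```

With these illustrative values the first call yields a region-based prefix and the second falls back to `chr22`.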
44 changes: 44 additions & 0 deletions conf/test_quilt2.config
@@ -0,0 +1,44 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running minimal tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a fast and simple pipeline test.

Use as follows:
nextflow run nf-core/phaseimpute -profile test_quilt2,<docker/singularity> --outdir <OUTDIR>

----------------------------------------------------------------------------------------
*/

process {
    resourceLimits = [
        cpus: 4,
        memory: '4.GB',
        time: '1.h'
    ]

    withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_IMPUTE_QUILT2:QUILT_QUILT2' {
        cpus       = 1
        ext.args   = { "--seed=${params.seed} --use_mspbwt=TRUE --impute_rare_common=FALSE" }
        ext.prefix = { "${meta.id}.batch${meta.batch}.${meta.regionout ? meta.regionout.replace(':','_') : meta.chr}.quilt2" }
    }
}

params {
    config_profile_name        = 'Minimal QUILT2 Test profile'
    config_profile_description = 'Minimal test dataset to check pipeline function using the tool QUILT2'

    // Input data
    input        = "${projectDir}/tests/csv/sample_bam.csv"
    input_region = "${projectDir}/tests/csv/region.csv"
    chunks       = "${projectDir}/tests/csv/chunks.csv"
    panel        = "${projectDir}/tests/csv/panel.csv"

    // Genome references
    fasta = params.pipelines_testdata_base_path + "hum_data/reference_genome/GRCh38.s.fa.gz"
    map   = "${projectDir}/tests/csv/map_glimpse.csv"

    // Pipeline parameters
    steps = "impute"
    tools = "quilt2"
}
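
The test profile above repeats `--seed=${params.seed}` next to the extra QUILT2 options, presumably because a later `withName` block replaces `ext.args` as a whole rather than appending to it. A user who wants the same options outside the test profile could write a small custom config along these lines. This is a sketch only: the file name `custom_quilt2.config` is hypothetical, and the option values are simply those used in the test profile.

```shell
#!/usr/bin/env bash
# Write a hypothetical custom config. The quoted 'EOF' keeps ${params.seed}
# literal so that Nextflow, not the shell, resolves it.
cat > custom_quilt2.config <<'EOF'
process {
    withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_IMPUTE_QUILT2:QUILT_QUILT2' {
        // ext.args is replaced as a whole, so the seed is repeated here.
        ext.args = { "--seed=${params.seed} --use_mspbwt=TRUE --impute_rare_common=FALSE" }
    }
}
EOF

echo "Pass the file to the pipeline with: -c custom_quilt2.config"
```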
4 changes: 2 additions & 2 deletions docs/output.md
@@ -140,7 +140,7 @@ The results from `--steps impute` will have the following directory structure:
```tree
├── batch
├── csv
├── <glimpse1|glimpse2|quilt|stitch|beagle5|minimac4>
├── <glimpse1|glimpse2|quilt|quilt2|stitch|beagle5|minimac4>
│ ├── concat/
│ └── samples/
├── stats
@@ -152,7 +152,7 @@ The results from `--steps impute` will have the following directory structure:
- `imputation/batch/all.batchi.id.txt`: List of sample names processed in the i^th^ batch.
- `imputation/csv/`
- `impute.csv`: A single CSV file listing, for each imputed sample, the path to its VCF file and index together with the tool that produced it.
- `imputation/[glimpse1,glimpse2,quilt,stitch]/`
- `imputation/[glimpse1,glimpse2,quilt,quilt2,stitch,beagle5,minimac4]/`
- `concat/all.batch*.vcf.gz`: The concatenated VCF files of all imputed samples by batches.
- `concat/all.batch*.vcf.gz.tbi`: The index file for the concatenated imputed VCF files of the samples.
- `samples/*.vcf.gz`: A VCF file of each imputed sample.
29 changes: 21 additions & 8 deletions docs/usage.md
@@ -375,6 +375,7 @@ The different tests profiles are:
- `test`: A profile to evaluate the imputation step with the `glimpse1` tool.
- `test_glimpse2`: A profile to evaluate the imputation step with the `glimpse2` tool.
- `test_quilt`: A profile to evaluate the imputation step with the `quilt` tool.
- `test_quilt2`: A profile to evaluate the imputation step with the `quilt2` tool.
- `test_stitch`: A profile to evaluate the imputation step with the `stitch` tool.
- `test_beagle5`: A profile to evaluate the imputation step with the `beagle5` tool.
- `test_minimac4`: A profile to evaluate the imputation step with the `minimac4` tool.
@@ -472,9 +473,9 @@ For starting from the imputation steps, the required flags are:
- `--steps impute`
- `--input input.csv`: The samplesheet containing the input sample files in `bam`, `cram` or `vcf`, `bcf` format.
- `--genome` or `--fasta`: The reference genome of the samples.
- `--tools [glimpse1,glimpse2,quilt,stitch,beagle5,minimac4]`: A selection of one or more of the available imputation tools. Each imputation tool has its own set of specific flags and input files. These required files are produced by `--steps panelprep` and used as input in:
- `--tools [glimpse1,glimpse2,quilt,quilt2,stitch,beagle5,minimac4]`: A selection of one or more of the available imputation tools. Each imputation tool has its own set of specific flags and input files. These required files are produced by `--steps panelprep` and used as input in:
- `--posfile posfile.csv`: A samplesheet containing all the different files required by the imputation tool. This file can be generated with `--steps panelprep`.
- `--panel panel.csv`: A samplesheet containing the post-processed reference panel VCF (required by GLIMPSE1, GLIMPSE2). These files can be obtained with `--steps panelprep`.
- `--panel panel.csv`: A samplesheet containing the post-processed reference panel VCF (required by GLIMPSE1, GLIMPSE2 and QUILT2). These files can be obtained with `--steps panelprep`.

Optionally, you can provide the following flags:

@@ -488,6 +489,7 @@ Optionnaly you can provide the following flags:
| `GLIMPSE1` | ✅ | ✅ ¹ | ✅ | ✅ | ✅ ³ | ✅ | ✅ |
| `GLIMPSE2` | ✅ | ✅ ¹ | ✅ | ✅ | ❌ | ✅ | ✅ |
| `QUILT` | ✅ | ✅ ² | ✅ | ❌ | ✅ ⁴ | ✅ | ✅ |
| `QUILT2` | ✅ | ✅ ² | ✅ | ✅ ⁵ | ❌ | ✅ | ✅ |
| `STITCH` | ✅ | ✅ ² | ✅ | ❌ | ✅ ³ | ✅ | ✅ |
| `BEAGLE5` | ✅ | ✅ ¹ | ✅ | ✅ | ❌ | ✅ | ✅ |
| `MINIMAC4` | ✅ | ✅ ¹ | ✅ | ✅ | ✅ ⁶ | ✅ | ✅ |
@@ -496,8 +498,9 @@ Optionnaly you can provide the following flags:
> ² Alignment files only (i.e. BAM or CRAM)
> ³ `GLIMPSE1` and `STITCH`: Should be a CSV with columns [panel id, chr, posfile]
> ⁴ `QUILT`: Should be a CSV with columns [panel id, chr, hap, legend, posfile]
> ⁵ `MINIMAC4`: Optionally, a VCF with its index can be provided for more control over the imputed positions. Should be a CSV with columns [panel id, chr, vcf, index]
> ⁶ Not yet supported
> ⁵ `QUILT2`: Uses the reference panel VCF directly. The panel CSV should contain [panel id, chr, vcf, index]
> ⁶ `MINIMAC4`: Optionally, a VCF with its index can be provided for more control over the imputed positions. Should be a CSV with columns [panel id, chr, vcf, index]
> ⁷ Not yet supported
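
The QUILT2 footnote above asks for a panel CSV with the columns [panel id, chr, vcf, index]. The sketch below writes such a file and checks that every row has four fields. The header names, panel name and paths are assumptions made for illustration, and `check_panel_csv` is a hypothetical helper, not a pipeline command.

```shell
#!/usr/bin/env bash
# Hypothetical panel samplesheet; header names and paths are assumptions.
cat > panel.csv <<'EOF'
panel,chr,vcf,index
1000GP,chr21,panel/1000GP.chr21.vcf.gz,panel/1000GP.chr21.vcf.gz.csi
1000GP,chr22,panel/1000GP.chr22.vcf.gz,panel/1000GP.chr22.vcf.gz.csi
EOF

# Report every row that does not have exactly four comma-separated fields.
# Returns non-zero when at least one such row is found.
check_panel_csv() {
    awk -F',' '
        NF != 4 { printf "line %d: expected 4 fields, found %d\n", NR, NF; bad = 1 }
        END     { exit bad }
    ' "$1"
}

check_panel_csv panel.csv && echo "panel.csv: OK"
```

The check is purely structural: it does not verify that the VCF and index files exist or that they match the chromosome in the row.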

Here is a representation of how the input files will be processed, depending on the input file type and the selected imputation tool.

@@ -523,15 +526,25 @@ To summarize:
- GLIMPSE2 should not do target-to-target imputation.
- If you have alignment files (e.g., BAM or CRAM), all tools are available, and processing will occur in `batch_size`:
- GLIMPSE1 and STITCH may induce batch effects, so all samples need to be imputed together.
- GLIMPSE2 and QUILT can process samples in separate batches.
- GLIMPSE2, QUILT and QUILT2 can process samples in separate batches.

## Imputation tools `--steps impute --tools [glimpse1,glimpse2,quilt,stitch,beagle5,minimac4]`
## Imputation tools `--steps impute --tools [glimpse1,glimpse2,quilt,quilt2,stitch,beagle5,minimac4]`

You can choose different software to perform the imputation. In the following sections, the typical commands for running the pipeline with each software are included. Multiple tools can be selected by separating them with a comma (e.g. `--tools glimpse1,quilt`).

### QUILT
### QUILT / QUILT2

[QUILT](https://github.com/rwdavies/QUILT) is an R and C++ program for rapid genotype imputation from low-coverage sequence using a large reference panel. The required inputs for this program are bam samples provided in the input samplesheet (`--input`) and a CSV file with the genomic chunks (`--chunks`).
[QUILT](https://github.com/rwdavies/QUILT) is an R and C++ package for read-aware genotype imputation from low-coverage sequencing using a reference panel. This pipeline contains the original QUILT method (`QUILT.R`, referred to here as `quilt`) and the newer QUILT2 method (`QUILT2.R`, exposed in this pipeline as `quilt2`).

In `nf-core/phaseimpute`, both methods use alignment files from `--input`, optionally benefit from `--map`, and can use `--chunks` to split the genome into overlapping imputation regions.

Choose `quilt2` by default for new projects. The official QUILT2 documentation describes it as the recommended modern method for large reference panels and diverse sequencing inputs including short reads, long reads, linked/barcoded reads and ancient DNA. The QUILT2 paper also reports a dedicated cfDNA/NIPT mode upstream; however, the current `nf-core/phaseimpute` integration includes the diploid imputation workflow only.

Choose `quilt` when you specifically want the original QUILT workflow.
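
For orientation, a `quilt2` run might be assembled as in the sketch below. It uses only flags named in this documentation (`--input`, `--fasta`, `--steps`, `--tools`, `--panel`, `--chunks`, `--outdir`). All file names are placeholders, and the exact set of required inputs should be checked against the pipeline's parameter documentation. The script only builds and prints the command so it can be reviewed before launching.

```shell
#!/usr/bin/env bash
# File names below are placeholders; replace them with your own inputs.
cmd=(
    nextflow run nf-core/phaseimpute
    -profile docker
    --input samplesheet.csv
    --fasta reference.fa.gz
    --steps impute
    --tools quilt2
    --panel panel.csv
    --chunks chunks.csv
    --outdir results
)

# Print the assembled command for review instead of launching it.
printf '%s ' "${cmd[@]}"
printf '\n'
```

Once the printed command looks right, it can be launched from the same shell with `"${cmd[@]}"`.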

#### `quilt`

The required inputs for `quilt` are BAM/CRAM samples provided in the input samplesheet (`--input`) and a CSV file with the genomic chunks (`--chunks`).
Review comment (Collaborator, on lines +545 to +547): Should we add a similar part for quilt2?

```bash
nextflow run nf-core/phaseimpute \
11 changes: 9 additions & 2 deletions modules.json
@@ -31,7 +31,8 @@
"vcf_impute_beagle5",
"vcf_impute_glimpse",
"vcf_impute_minimac4",
"vcf_phase_shapeit5"
"vcf_phase_shapeit5",
"bam_impute_quilt2"
]
},
"bcftools/merge": {
@@ -123,7 +124,8 @@
"bam_vcf_impute_glimpse2",
"modules",
"vcf_impute_beagle5",
"vcf_impute_minimac4"
"vcf_impute_minimac4",
"bam_impute_quilt2"
]
},
"glimpse2/phase": {
@@ -161,6 +163,11 @@
"git_sha": "4e2990cc0df18823d11b192df73039c80fdebc7c",
"installed_by": ["bam_impute_quilt", "modules"]
},
"quilt/quilt2": {
"branch": "master",
"git_sha": "4e94abb54fc2e2e868e943ee137a958878b8d092",
"installed_by": ["bam_impute_quilt2", "modules"]
},
"samtools/coverage": {
"branch": "master",
"git_sha": "4e3e10e502ec6ab6b1c4b4fecd923ff1fa287338",
8 changes: 8 additions & 0 deletions modules/nf-core/quilt/quilt2/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

105 changes: 105 additions & 0 deletions modules/nf-core/quilt/quilt2/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
