Best practices for chunking/phasing rare variants with SHAPEIT5 without predefined chunks?

Dear all,
I am working with WGS data from 3,200 samples of a non-model organism (genome size ~3 Gb). Given SHAPEIT5's reported higher accuracy compared to Beagle, I would like to use it for phasing in our project. Because my study organism lacks predefined chunks (such as those available for the human genome), I am considering a sliding-window approach to phase the data. Following the tutorial guidelines (https://odelaneau.github.io/shapeit5/docs/tutorials/ukb_wgs/#quality-control), I performed preliminary quality control on the SNPs, removing those with missingness >10% and those in low-mappability regions.

Common Variants Phasing: 

`phase_common_static --progress -T 32 -I ${chr}.flt.vcf.gz --filter-maf 0.01 -R ${chr} -O ${chr}.common_imp.bcf`

Rare Variants Phasing: For rare variants (MAF < 0.01), I attempted to phase them in relatively small chunks (5 Mb) using:

`phase_rare_static --thread 88 --progress --input 18.fill.vcf.gz --input-region 18:1-5000000 --scaffold 18.common_imp.bcf --output 18.rare_imp.bcf --scaffold-region 18:1-55982971`

However, when running `phase_rare_static` on the smallest chromosome, the memory usage exceeded 1 TB. Given our computational constraints, this is infeasible for larger chromosomes.

To mitigate the memory issue, I limited the `--scaffold-region` to the `--input-region` extended by 2.5 Mb on both sides, instead of the full-length region used above (18:1-55982971). With this change the job completed successfully, although peak memory usage remained high (~524.66 GB).

For example, for the first and second 5 Mb chunks:

`phase_rare_static --thread 88 --progress --input 18.fill.vcf.gz --input-region 18:1-5000000 --scaffold 18.common_imp.bcf --output 18.rare_imp_01.bcf --scaffold-region 18:1-10000000`

`phase_rare_static --thread 88 --progress --input 18.fill.vcf.gz --input-region 18:5000001-10000000 --scaffold 18.common_imp.bcf --output 18.rare_imp_02.bcf --scaffold-region 18:2500000-12500000`
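
To make the windowing scheme concrete, here is a minimal bash sketch of how the chunk coordinates could be generated for one chromosome. The helper `make_chunks` is hypothetical (it is not part of SHAPEIT5), the chromosome length is assumed to be the end coordinate of the full scaffold region above (55982971), and the flank is applied as 2.5 Mb on each side, clipped at the chromosome ends. The loop only prints the commands (a dry run) and reuses the same flags as the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of SHAPEIT5.
# Prints one line per chunk: "<input-region> <scaffold-region>"
# Arguments: chromosome name, chromosome length (bp), chunk size (bp), flank size (bp)
make_chunks() {
    local chr=$1 len=$2 size=$3 flank=$4
    local start end s_start s_end
    for (( start = 1; start <= len; start += size )); do
        # Input region: fixed-size window, clipped at the chromosome end
        end=$(( start + size - 1 ))
        (( end > len )) && end=$len
        # Scaffold region: input region plus a flank on each side, clipped to [1, len]
        s_start=$(( start - flank ))
        (( s_start < 1 )) && s_start=1
        s_end=$(( end + flank ))
        (( s_end > len )) && s_end=$len
        echo "${chr}:${start}-${end} ${chr}:${s_start}-${s_end}"
    done
}

# Dry run: print (do not execute) one phase_rare_static command per chunk.
# 55982971 is assumed to be the length of chromosome 18, taken from the post.
i=0
while read -r input_region scaffold_region; do
    i=$(( i + 1 ))
    printf 'phase_rare_static --thread 88 --progress --input 18.fill.vcf.gz --input-region %s --scaffold 18.common_imp.bcf --output 18.rare_imp_%02d.bcf --scaffold-region %s\n' \
        "$input_region" "$i" "$scaffold_region"
done < <(make_chunks 18 55982971 5000000 2500000)
```

Removing the `printf` wrapper (or piping the printed lines to a job scheduler) would turn the dry run into actual jobs; the chunk and flank sizes are plain arguments, so other values can be tried without editing the function.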

Would this be an appropriate strategy for phasing rare variants across the genome?

Best regards,

Zheng zhuqing
