man4ish/deepvariant-finetuning


DeepVariant Fine-Tuning Showcase

This repository demonstrates DeepVariant, a deep learning-based variant caller, and showcases fine-tuning on a small genomic dataset. The goal is to show how a pretrained model can be adapted to custom datasets and compare results before and after fine-tuning.


Overview

DeepVariant uses a convolutional neural network (CNN) to call genetic variants from aligned sequencing data.

  • Inputs: BAM/CRAM reads aligned to a reference genome
  • Outputs: VCF/gVCF files containing variant calls
  • Fine-tuning: Continues training the pretrained model on a new dataset, which can improve accuracy on data that differs from what the model was originally trained on

This repo provides scripts, notebooks, and example data for a hands-on demo.
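
As a quick way to inspect the VCF output described above, the following is a minimal sketch that tallies single-base substitutions against everything else. The helper name `vcf_type_counts` is hypothetical and not part of this repository. It assumes an uncompressed, tab-separated VCF and counts every record regardless of its FILTER value.

```shell
# Hypothetical helper: tally SNPs vs. all other records in an uncompressed VCF.
# Usage: vcf_type_counts data/output.vcf
vcf_type_counts() {
  awk -F '\t' '
    /^#/ { next }                          # skip header lines
    {
      n = split($5, alts, ",")             # ALT may list several alleles
      kind = ($4 ~ /^[ACGT]$/) ? "snp" : "other"
      for (i = 1; i <= n; i++)
        if (alts[i] !~ /^[ACGT]$/) kind = "other"
      count[kind]++
    }
    END { printf "snp=%d other=%d\n", count["snp"], count["other"] }
  ' "$1"
}
```

A record is counted as `snp` only when REF and every ALT allele are single bases; indels, symbolic alleles, and records with no ALT fall under `other`.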


Repository Structure


DeepVariant-Finetuning/
├── data/                  # Example BAM and reference genome
├── notebooks/             # Jupyter/Colab demo notebook
├── scripts/
│   ├── run_inference.sh   # Run default DeepVariant
│   ├── make_examples.sh   # Prepare TFRecords for fine-tuning
│   ├── fine_tune.sh       # Fine-tune the model
│   └── run_finetuned.sh   # Run inference with fine-tuned model
├── models/
│   └── custom_model/      # Output of fine-tuning
├── README.md
└── requirements.txt       # Optional dependencies


Getting Started

Option 1: Run on Google Colab (Recommended for Mac/ARM)

  1. Open the notebook notebooks/demo_deepvariant.ipynb in Colab.
  2. Upload example BAM and reference genome (small chromosome preferred for demo).
  3. Run the cells step-by-step:
    • Run inference with the pretrained model.
    • Generate examples for fine-tuning.
    • Fine-tune on your dataset.
    • Run inference with the fine-tuned model.
    • Compare results (variant counts, VCF differences).

Option 2: Run Locally with Docker (x86_64 recommended)

docker pull google/deepvariant:1.5.0
docker run --platform linux/amd64 -v "$PWD":/input -t google/deepvariant:1.5.0 \
/opt/deepvariant/bin/run_deepvariant \
--model_type=WGS \
--ref=/input/data/REFERENCE.fa \
--reads=/input/data/EXAMPLE.bam \
--output_vcf=/input/data/output.vcf \
--num_shards=4

⚠ On Apple Silicon (M-series) Macs, the image runs under x86_64 emulation, which may be slow. Fine-tuning locally on these machines is not recommended.
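
To tell whether the docker run command above will run natively or under emulation, a small check on the host architecture can help. This is a sketch only: the function name `docker_platform_flag` is hypothetical, and it assumes the only distinction that matters is x86_64 versus ARM.

```shell
# Hypothetical helper: print the docker --platform flag an architecture needs.
# Pass it the output of `uname -m`; x86_64 hosts need no flag.
docker_platform_flag() {
  case "$1" in
    x86_64|amd64) printf '' ;;
    arm64|aarch64) printf '%s\n' '--platform linux/amd64' ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Example use:
#   flag=$(docker_platform_flag "$(uname -m)")
#   A non-empty $flag means the container will run under emulation.
```

On an ARM host the function prints the same `--platform linux/amd64` flag used in the command above, which is a hint to expect slower runs.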


Running Fine-Tuning

  1. Generate training examples:
sh scripts/make_examples.sh
  2. Fine-tune the pretrained model:
sh scripts/fine_tune.sh
  3. Run inference with the fine-tuned model:
sh scripts/run_finetuned.sh
  4. Compare the outputs:
grep -vc "^#" data/output.vcf
grep -vc "^#" data/output_finetuned.vcf
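
Record counts alone do not show which calls changed. As a rough sketch, the two VCFs can also be compared site by site. The function names `vcf_keys` and `compare_vcfs` are hypothetical and not part of this repository. The sketch assumes uncompressed VCFs and treats two records as the same call when CHROM, POS, REF, and ALT match exactly.

```shell
# Hypothetical helper: list CHROM:POS:REF:ALT keys from an uncompressed VCF, sorted.
vcf_keys() {
  awk -F '\t' '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' "$1" | LC_ALL=C sort -u
}

# Hypothetical helper: count calls shared by two VCFs and calls unique to each.
# Usage: compare_vcfs data/output.vcf data/output_finetuned.vcf
compare_vcfs() {
  keys_a=$(mktemp) && keys_b=$(mktemp) || return 1
  vcf_keys "$1" > "$keys_a"
  vcf_keys "$2" > "$keys_b"
  # comm needs sorted input; -12 keeps common lines, -23 / -13 keep unique ones
  shared=$(LC_ALL=C comm -12 "$keys_a" "$keys_b" | awk 'END { print NR }')
  only_a=$(LC_ALL=C comm -23 "$keys_a" "$keys_b" | awk 'END { print NR }')
  only_b=$(LC_ALL=C comm -13 "$keys_a" "$keys_b" | awk 'END { print NR }')
  rm -f "$keys_a" "$keys_b"
  printf 'shared=%d only_first=%d only_second=%d\n' "$shared" "$only_a" "$only_b"
}
```

This shows how far the two call sets overlap, but not which model is more accurate. Judging accuracy requires a truth set for the sample.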

Results

  • Default model vs fine-tuned model
  • Variant counts, example VCF snippets, and pileup visualizations
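  • Illustrates the effect of domain adaptation; confirming an accuracy improvement requires comparison against a truth set, since variant counts alone do not measure correctness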

Notes

  • For full fine-tuning on human genomes, a GPU on an x86_64 host is recommended.
  • Small datasets are used here for demo purposes only.
  • The Colab demo runs end-to-end without a local Docker setup.
