DeepVariant Fine-Tuning Showcase

This repository demonstrates DeepVariant, a deep learning-based variant caller, and showcases fine-tuning on a small genomic dataset. The goal is to show how a pretrained model can be adapted to custom datasets and compare results before and after fine-tuning.

Overview

DeepVariant uses a convolutional neural network (CNN) to call genetic variants from aligned sequencing data.

Inputs: BAM/CRAM reads aligned to a reference genome
Outputs: VCF/gVCF files containing variant calls
Fine-tuning: Adapts the pretrained model to a new dataset, which can improve accuracy on data that differs from what the model was trained on

This repo provides scripts, notebooks, and example data for a hands-on demo.

Repository Structure


DeepVariant-Finetuning/
├── data/                  # Example BAM and reference genome
├── notebooks/             # Jupyter/Colab demo notebook
├── scripts/
│   ├── run_inference.sh   # Run default DeepVariant
│   ├── make_examples.sh   # Prepare TFRecords for fine-tuning
│   ├── fine_tune.sh       # Fine-tune the model
│   └── run_finetuned.sh   # Run inference with fine-tuned model
├── models/
│   └── custom_model/      # Output of fine-tuning
├── README.md
└── requirements.txt       # Optional dependencies

Getting Started

Option 1: Run on Google Colab (Recommended for Mac/ARM)

Open the notebook notebooks/demo_deepvariant.ipynb in Colab.
Upload an example BAM and reference genome (a small chromosome is preferred for the demo).
Run the cells step by step:
- Run inference with the pretrained model.
- Generate examples for fine-tuning.
- Fine-tune on your dataset.
- Run inference with the fine-tuned model.
- Compare results (variant counts, VCF differences).

Option 2: Run Locally with Docker (x86_64 recommended)

docker pull google/deepvariant:1.5.0
docker run --platform linux/amd64 -v "$PWD":/input -t google/deepvariant:1.5.0 \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=WGS \
  --ref=/input/data/REFERENCE.fa \
  --reads=/input/data/EXAMPLE.bam \
  --output_vcf=/input/data/output.vcf \
  --num_shards=4

⚠ On Apple Silicon Macs (M1 and later), the image runs under x86_64 emulation, which can be slow; fine-tuning locally is not recommended.
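
DeepVariant expects both the reference FASTA and the BAM to be indexed, so a missing .fai or .bai file is a common reason the run_deepvariant command above fails early. The following is a minimal pre-flight sketch; check_inputs is a hypothetical helper, not part of DeepVariant or this repository, and the samtools hints in its messages assume samtools is installed.

```shell
# Hypothetical helper: verify that the reference, the reads, and their
# index files exist before starting a (slow) DeepVariant run.
check_inputs() {
  ref="$1"
  bam="$2"
  status=0

  # Both input files must be present.
  for f in "$ref" "$bam"; do
    [ -f "$f" ] || { echo "missing: $f" >&2; status=1; }
  done

  # FASTA index, as produced by `samtools faidx`.
  [ -f "$ref.fai" ] || {
    echo "missing index: $ref.fai (try: samtools faidx $ref)" >&2
    status=1
  }

  # BAM index may be named EXAMPLE.bam.bai or EXAMPLE.bai.
  if [ ! -f "$bam.bai" ] && [ ! -f "${bam%.bam}.bai" ]; then
    echo "missing index for $bam (try: samtools index $bam)" >&2
    status=1
  fi

  return "$status"
}

# Usage:
#   check_inputs data/REFERENCE.fa data/EXAMPLE.bam && echo "inputs look complete"
```

The function only checks that the files exist; it does not validate their contents.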

Running Fine-Tuning

Generate training examples:

sh scripts/make_examples.sh
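
The contents of scripts/make_examples.sh are not reproduced here. As a rough sketch of what a training-mode invocation can look like with the 1.5.0 image: unlike plain inference, labeled examples need a truth VCF and a confident-regions BED. The TRUTH_VCF, TRUTH_BED, and EXAMPLES paths below are hypothetical, and the run and make_training_examples helpers are illustrative names. The insert_size channel is included on the assumption that you fine-tune from the 1.5.0 WGS checkpoint, which was trained with that channel.

```shell
# Sketch only: TRUTH_VCF, TRUTH_BED, and EXAMPLES are hypothetical paths,
# not files shipped with this repository.
REF="/input/data/REFERENCE.fa"
READS="/input/data/EXAMPLE.bam"
TRUTH_VCF="/input/data/TRUTH.vcf.gz"            # hypothetical truth calls
TRUTH_BED="/input/data/TRUTH_confident.bed"     # hypothetical confident regions
EXAMPLES="/input/data/training_examples.tfrecord.gz"  # hypothetical output

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Build labeled training examples inside the DeepVariant container.
make_training_examples() {
  run docker run --platform linux/amd64 -v "$PWD:/input" \
    google/deepvariant:1.5.0 \
    /opt/deepvariant/bin/make_examples \
    --mode training \
    --ref "$REF" \
    --reads "$READS" \
    --truth_variants "$TRUTH_VCF" \
    --confident_regions "$TRUTH_BED" \
    --examples "$EXAMPLES" \
    --channels insert_size
}

# Usage:
#   make_training_examples              # prints the docker command
#   DRY_RUN=0 make_training_examples    # runs it
```

Printing the command first makes it easy to review the mounted paths before committing to a long run.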

Fine-tune the pretrained model:

sh scripts/fine_tune.sh

Run inference with the fine-tuned model:

sh scripts/run_finetuned.sh

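Compare the outputs by counting the records in each VCF (the counts include sites DeepVariant marked RefCall, not only PASS variants):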

grep -vc "^#" data/output.vcf
grep -vc "^#" data/output_finetuned.vcf
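
Record counts alone do not show whether the two models call the same sites. The sketch below compares the two VCFs at the site level, keeping only PASS records and keying each one by CHROM, POS, REF, and ALT. The helper names vcf_sites and compare_vcfs are illustrative, and the comparison ignores genotypes, so treat it as a quick sanity check rather than a benchmark.

```shell
# Print one "CHROM:POS:REF:ALT" key per PASS record, sorted and de-duplicated.
vcf_sites() {
  awk -F'\t' '!/^#/ && $7 == "PASS" { print $1 ":" $2 ":" $4 ":" $5 }' "$1" \
    | LC_ALL=C sort -u
}

# Report how many PASS sites are shared or unique to each VCF.
compare_vcfs() {
  a=$(mktemp)
  b=$(mktemp)
  vcf_sites "$1" > "$a"
  vcf_sites "$2" > "$b"
  # comm needs input sorted in the same collation used by sort above.
  echo "shared: $(LC_ALL=C comm -12 "$a" "$b" | wc -l | tr -d ' ')"
  echo "only_first: $(LC_ALL=C comm -23 "$a" "$b" | wc -l | tr -d ' ')"
  echo "only_second: $(LC_ALL=C comm -13 "$a" "$b" | wc -l | tr -d ' ')"
  rm -f "$a" "$b"
}

# Usage:
#   compare_vcfs data/output.vcf data/output_finetuned.vcf
```

Sites that appear only in one file are the ones worth inspecting when judging what fine-tuning changed.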
