Skip to content

xuefei-wang/deepcell-types

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

91 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DeepCell Types

DeepCell Types is a generalized cell-phenotyping model for spatial proteomics. It addresses generalization across datasets with varying marker panels by reading the active marker / cell-type registry from a TissueNet zarr archive at inference time.

License notice. This project is distributed under a Modified Apache License, Version 2.0 with non-commercial / academic-only carve-outs (see the LICENSE file for the full text). For any other use, including commercial use, contact vanvalenlab@gmail.com.

Installation

As with all Python packages, users are encouraged to use some form of virtual environment for package installation. Popular options include venv/virtualenv, conda/mamba, uv, or pixi. Users are encouraged to use whatever environment management toolchain they are most comfortable with. For those unsure, the quickest way to start is to use the venv module, part of the Python standard library:

# Create a new virtual environment
python -m venv dct-env
# Enter the virtual environment
source dct-env/bin/activate

# Once inside the environment, install deepcell-types
pip install git+https://github.com/vanvalenlab/deepcell-types@master

Download the model

from deepcell_types.utils import download_model
download_model()

TissueNet zarr archive

Canonical checkpoints do not embed the marker / cell-type registry — they read it from a TissueNet zarr v3 archive at inference time. You must provide one before calling predict, either by passing zarr_path=... directly or by setting the DEEPCELL_TYPES_ZARR_PATH environment variable.

A registered user can download a public TissueNet zarr archive from https://users.deepcell.org; see docs/site/API-key.md for the access token flow. Place the resulting .zarr directory anywhere, then:

export DEEPCELL_TYPES_ZARR_PATH=/absolute/path/to/tissuenet.zarr

Running

The deepcell-types cell-type inference functionality is provided via a simple functional interface, deepcell_types.predict.

For a complete example of the cell-type inference pipeline, check out the tutorial.

Training

To retrain or fine-tune, install the [train] extra (pulls in wandb, zarr, pandas, scikit-learn, torchmetrics, plotly, etc.):

pip install "deepcell-types[train] @ git+https://github.com/vanvalenlab/deepcell-types@master"

Training entry points live under scripts/:

  • scripts/train.py — main training loop.
  • scripts/pretrain.py — masked-marker pretraining.
  • scripts/predict.py — batched evaluation over a zarr archive.
  • scripts/benchmark_gold_standard.py — gold-standard benchmark suite.

All training scripts read configuration from a TissueNet zarr v3 archive. Pass --zarr_path or set DEEPCELL_TYPES_ZARR_PATH. The training-side modules under deepcell_types.training (e.g. TissueNetConfig, FullImageDataset, FocalLoss, HierarchicalLoss) are stable enough to import directly for custom training scripts.

Baselines

Comparison baselines from the paper live as git submodules under baselines/:

Each submodule has its own README and pyproject.toml. To populate them after cloning, run git submodule update --init --recursive.

Citation

@article{deepcelltypes,
  title={Generalized cell phenotyping for spatial proteomics with language-informed vision models},
  author={Wang, Xuefei and Dilip, Rohit and Bussi, Yuval and Brown, Caitlin and Pradhan, Elora and Jain, Yashvardhan and Yu, Kevin and Li, Shenyi and Abt, Martin and Borner, Katy and others},
  journal={bioRxiv},
  pages={2024--11},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 99.8%
  • Shell 0.2%