Scanpy Bone Marrow

Single-cell RNA-seq analysis of human bone marrow samples using the Scanpy framework, following the scverse tutorials.

Dataset

Two bone marrow samples (s1d1, s1d3) from the scverse example data, hosted on Figshare.

Pipeline Overview

Quality Control — Filter cells/genes, flag mitochondrial/ribosomal/hemoglobin content
Doublet Detection — Scrublet-based identification of multiplets
Normalization — Library size normalization and log-transformation
Feature Selection — Identify highly variable genes (batch-aware)
Dimensionality Reduction — PCA, nearest-neighbor graph, UMAP
Clustering — Leiden community detection
Annotation — Marker gene–based cell type assignment
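The steps above, up to clustering, can be sketched roughly as follows. This is a minimal sketch rather than the notebook's actual code. It assumes scanpy 1.10 or later (for `sc.pp.scrublet`) and an `AnnData` object whose `obs` has a `sample` column identifying the two samples. The helper names `qc_flags` and `run_pipeline`, the gene-name prefixes, and the thresholds (`min_genes=100`, `min_cells=3`, `n_top_genes=2000`) are illustrative assumptions, not values taken from this repository.

```python
import re

# Assumed convention: human hemoglobin genes start with "HB", excluding HBP*.
HB_PATTERN = re.compile(r"^HB[^(P)]")


def qc_flags(gene_names):
    """Return boolean masks for mitochondrial, ribosomal and hemoglobin genes."""
    return {
        "mt": [g.startswith("MT-") for g in gene_names],
        "ribo": [g.startswith(("RPS", "RPL")) for g in gene_names],
        "hb": [bool(HB_PATTERN.match(g)) for g in gene_names],
    }


def run_pipeline(adata, batch_key="sample"):
    """Hypothetical end-to-end run: QC through Leiden clustering."""
    # Imported here so qc_flags stays usable without scanpy installed.
    import scanpy as sc

    # Quality control: flag gene groups, compute metrics, filter.
    for name, mask in qc_flags(adata.var_names).items():
        adata.var[name] = mask
    sc.pp.calculate_qc_metrics(
        adata, qc_vars=["mt", "ribo", "hb"], inplace=True, log1p=True
    )
    sc.pp.filter_cells(adata, min_genes=100)  # illustrative threshold
    sc.pp.filter_genes(adata, min_cells=3)    # illustrative threshold

    # Doublet detection, run per sample.
    sc.pp.scrublet(adata, batch_key=batch_key)

    # Normalization: keep raw counts, then library-size normalize and log1p.
    adata.layers["counts"] = adata.X.copy()
    sc.pp.normalize_total(adata)
    sc.pp.log1p(adata)

    # Batch-aware feature selection.
    sc.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key=batch_key)

    # Dimensionality reduction and clustering.
    sc.tl.pca(adata)
    sc.pp.neighbors(adata)
    sc.tl.umap(adata)
    sc.tl.leiden(adata)
    return adata
```

Annotation is left out of the sketch because it depends on which marker genes are chosen for each expected cell type.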

Setup

Requirements

Python 3.10+
GitHub Codespaces (4-core / 16 GB RAM recommended) or a local environment

Installation

python3 -m venv .venv
source .venv/bin/activate
pip install 'scanpy[leiden]' pooch scikit-image ipykernel
python -m ipykernel install --user --name=venv --display-name "Python (.venv)"
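To confirm the environment is complete before opening the notebook, a small check like the one below can be run inside the activated virtual environment. It is only a sketch: the helper name `missing_packages` is hypothetical, and the mapping assumes the usual import names for each package (for example, scikit-image is imported as `skimage`).

```python
from importlib.util import find_spec

# Package name (as installed with pip) -> module name (as imported).
REQUIRED = {
    "scanpy": "scanpy",
    "anndata": "anndata",
    "pooch": "pooch",
    "scikit-image": "skimage",
    "leidenalg": "leidenalg",
    "ipykernel": "ipykernel",
}


def missing_packages(required=REQUIRED):
    """Return the sorted names of packages whose module cannot be found."""
    return sorted(
        pkg for pkg, module in required.items() if find_spec(module) is None
    )


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

`find_spec` only locates the modules without importing them, so the check stays fast even though scanpy itself is slow to import.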

Running

Open main.ipynb in VS Code or Jupyter, select the "Python (.venv)" kernel, and run all cells.

Key Dependencies

Package	Purpose
scanpy	Core scRNA-seq analysis
anndata	Data structure for annotated matrices
pooch	Tutorial data retrieval
scikit-image	Required by Scrublet (doublet detection)
leidenalg	Leiden clustering algorithm

Notes

Scrublet is memory-intensive. Use a 4-core / 16 GB machine if running on Codespaces.
The .venv/ directory is excluded via .gitignore.

References

Scanpy documentation
scverse tutorials
Luecken, M.D. et al. A sandbox for prediction and integration of DNA, RNA, and proteins in single cells. NeurIPS Datasets and Benchmarks (2021).
McCarthy, D.J. et al. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33, 1179–1186 (2017).
Wolock, S.L., Lopez, R. & Klein, A.M. Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data. Cell Systems 8, 281–291 (2019).
Satija, R. et al. Spatial reconstruction of single-cell gene expression data. Nature Biotechnology 33, 495–502 (2015).
Stuart, T. et al. Comprehensive Integration of Single-Cell Data. Cell 177, 1888–1902 (2019).
Zheng, G.X.Y. et al. Massively parallel digital transcriptional profiling of single cells. Nature Communications 8, 14049 (2017).
McInnes, L., Healy, J. & Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv 1802.03426 (2018).
Traag, V.A., Waltman, L. & van Eck, N.J. From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports 9, 5233 (2019).
Wolf, F.A., Angerer, P. & Theis, F.J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biology 19, 15 (2018).
