Single-cell RNA-seq analysis of human bone marrow samples using the Scanpy framework, following the scverse tutorials.
The input is two bone marrow samples (s1d1 and s1d3) from the scverse example data, hosted on Figshare.
- Quality Control — Filter cells/genes, flag mitochondrial/ribosomal/hemoglobin content
- Doublet Detection — Scrublet-based identification of multiplets
- Normalization — Library size normalization and log-transformation
- Feature Selection — Identify highly variable genes (batch-aware)
- Dimensionality Reduction — PCA, nearest-neighbor graph, UMAP
- Clustering — Leiden community detection
- Annotation — Marker gene–based cell type assignment
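
The steps above could be chained roughly as follows. This is a minimal sketch rather than the notebook's actual code: `flag_genes` and `run_pipeline` are hypothetical helper names, the `"sample"` batch column and the filtering thresholds are illustrative assumptions, and `sc.pp.scrublet` assumes Scanpy 1.10 or newer.

```python
import re

# Gene-symbol patterns for the QC categories (human gene symbols assumed).
QC_PATTERNS = {
    "mt": re.compile(r"^MT-"),        # mitochondrial genes
    "ribo": re.compile(r"^RP[SL]"),   # ribosomal protein genes
    "hb": re.compile(r"^HB[^(P)]"),   # hemoglobin genes (skips HBP1 and similar)
}


def flag_genes(gene_names):
    """Return {category: [bool per gene]} for the patterns above."""
    return {
        key: [bool(pattern.match(name)) for name in gene_names]
        for key, pattern in QC_PATTERNS.items()
    }


def run_pipeline(adata, batch_key="sample"):
    """Run QC through clustering on an AnnData object holding raw counts.

    `batch_key` is an assumed obs column identifying the sample of each cell.
    """
    import scanpy as sc  # imported lazily so flag_genes works without Scanpy

    # Quality control: flag gene categories, compute metrics, filter.
    for key, flags in flag_genes(adata.var_names).items():
        adata.var[key] = flags
    sc.pp.calculate_qc_metrics(
        adata, qc_vars=list(QC_PATTERNS), inplace=True, log1p=True
    )
    sc.pp.filter_cells(adata, min_genes=100)  # illustrative threshold
    sc.pp.filter_genes(adata, min_cells=3)    # illustrative threshold

    # Doublet detection, run separately per sample.
    sc.pp.scrublet(adata, batch_key=batch_key)

    # Normalization: keep raw counts, then scale library sizes and log-transform.
    adata.layers["counts"] = adata.X.copy()
    sc.pp.normalize_total(adata)
    sc.pp.log1p(adata)

    # Feature selection (batch-aware).
    sc.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key=batch_key)

    # Dimensionality reduction.
    sc.tl.pca(adata)
    sc.pp.neighbors(adata)
    sc.tl.umap(adata)

    # Clustering.
    sc.tl.leiden(adata)

    # Annotation starts from per-cluster marker genes; assigning the
    # cell type labels themselves remains a manual step.
    sc.tl.rank_genes_groups(adata, groupby="leiden", method="wilcoxon")
    return adata
```

`flag_genes` is kept separate from the Scanpy calls so the gene-name patterns can be checked on a plain list of symbols before running the full pipeline.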
- Python 3.10+
- GitHub Codespaces (4-core / 16 GB RAM recommended) or local environment
python3 -m venv .venv
source .venv/bin/activate
pip install 'scanpy[leiden]' pooch scikit-image ipykernel
python -m ipykernel install --user --name=venv --display-name "Python (.venv)"

Open `main.ipynb` in VS Code or Jupyter, select the "Python (.venv)" kernel, and run all cells.
| Package | Purpose |
|---|---|
| scanpy | Core scRNA-seq analysis |
| anndata | Data structure for annotated matrices |
| pooch | Tutorial data retrieval |
| scikit-image | Required by Scrublet (doublet detection) |
| leidenalg | Leiden clustering algorithm |
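
To confirm that the packages in the table above are present in the active environment, a small check along these lines may help. It is only a sketch: `installed_versions` is a hypothetical helper, and the names in `REQUIRED` are the PyPI distribution names assumed to correspond to the table rows.

```python
from importlib import metadata

# PyPI distribution names, taken from the package table above.
REQUIRED = ["scanpy", "anndata", "pooch", "scikit-image", "leidenalg"]


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:<14}{version or 'MISSING'}")
```

Running it inside the activated `.venv` prints one line per package; any `MISSING` entry suggests the `pip install` step did not complete.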
- Scrublet is memory-intensive. Use a 4-core / 16 GB machine if running on Codespaces.
- The `.venv/` directory is excluded via `.gitignore`.
- Scanpy documentation
- scverse tutorials
- Luecken, M.D. et al. A sandbox for prediction and integration of DNA, RNA, and proteins in single cells. NeurIPS Datasets and Benchmarks (2021).
- McCarthy, D.J. et al. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33, 1179–1186 (2017).
- Wolock, S.L., Lopez, R. & Klein, A.M. Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data. Cell Systems 8, 281–291 (2019).
- Satija, R. et al. Spatial reconstruction of single-cell gene expression data. Nature Biotechnology 33, 495–502 (2015).
- Stuart, T. et al. Comprehensive Integration of Single-Cell Data. Cell 177, 1888–1902 (2019).
- Zheng, G.X.Y. et al. Massively parallel digital transcriptional profiling of single cells. Nature Communications 8, 14049 (2017).
- McInnes, L., Healy, J. & Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv 1802.03426 (2018).
- Traag, V.A., Waltman, L. & van Eck, N.J. From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports 9, 5233 (2019).
- Wolf, F.A., Angerer, P. & Theis, F.J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biology 19, 15 (2018).