oliviahelens/Scanpy_Bone_Marrow

Scanpy Bone Marrow

Single-cell RNA-seq analysis of human bone marrow samples using the Scanpy framework, following the scverse tutorials.

Dataset

Two bone marrow samples (s1d1, s1d3) from the scverse example data, hosted on Figshare.

Pipeline Overview

  1. Quality Control — Filter cells/genes, flag mitochondrial/ribosomal/hemoglobin content
  2. Doublet Detection — Scrublet-based identification of multiplets
  3. Normalization — Library size normalization and log-transformation
  4. Feature Selection — Identify highly variable genes (batch-aware)
  5. Dimensionality Reduction — PCA, nearest-neighbor graph, UMAP
  6. Clustering — Leiden community detection
  7. Annotation — Marker gene–based cell type assignment

Setup

Requirements

  • Python 3.10+
  • GitHub Codespaces (4-core / 16 GB RAM recommended) or local environment

Installation

python3 -m venv .venv
source .venv/bin/activate
pip install 'scanpy[leiden]' pooch scikit-image ipykernel
python -m ipykernel install --user --name=venv --display-name "Python (.venv)"

Running

Open main.ipynb in VS Code or Jupyter, select the "Python (.venv)" kernel registered during installation, and run all cells.

Key Dependencies

Package        Purpose
scanpy         Core scRNA-seq analysis
anndata        Data structure for annotated matrices
pooch          Tutorial data retrieval
scikit-image   Required by Scrublet (doublet detection)
leidenalg      Leiden clustering algorithm

Notes

  • Scrublet is memory-intensive. Use a 4-core / 16 GB machine if running on Codespaces.
  • The .venv/ directory is excluded via .gitignore.

References

  • Scanpy documentation
  • scverse tutorials
  • Luecken, M.D. et al. A sandbox for prediction and integration of DNA, RNA, and proteins in single cells. NeurIPS Datasets and Benchmarks (2021).
  • McCarthy, D.J. et al. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33, 1179–1186 (2017).
  • Wolock, S.L., Lopez, R. & Klein, A.M. Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data. Cell Systems 8, 281–291 (2019).
  • Satija, R. et al. Spatial reconstruction of single-cell gene expression data. Nature Biotechnology 33, 495–502 (2015).
  • Stuart, T. et al. Comprehensive Integration of Single-Cell Data. Cell 177, 1888–1902 (2019).
  • Zheng, G.X.Y. et al. Massively parallel digital transcriptional profiling of single cells. Nature Communications 8, 14049 (2017).
  • McInnes, L., Healy, J. & Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv 1802.03426 (2018).
  • Traag, V.A., Waltman, L. & van Eck, N.J. From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports 9, 5233 (2019).
  • Wolf, F.A., Angerer, P. & Theis, F.J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biology 19, 15 (2018).
