pbmc.counts <- Read10X(data.dir = "~/Downloads/pbmc3k/filtered_gene_bc_matrices/hg19/")
pbmc <- CreateSeuratObject(counts = pbmc.counts)
pbmc <- NormalizeData(object = pbmc)
pbmc <- FindVariableFeatures(object = pbmc)
pbmc <- ScaleData(object = pbmc)
pbmc <- RunPCA(object = pbmc)
pbmc <- FindNeighbors(object = pbmc)
pbmc <- FindClusters(object = pbmc)
pbmc <- RunTSNE(object = pbmc)
DimPlot(object = pbmc, reduction = "tsne")

| Seurat v2.X | Seurat v3.X |
|---|---|
| `object@data` | `GetAssayData(object = object)` |
| `object@raw.data` | `GetAssayData(object = object, slot = "counts")` |
| `object@scale.data` | `GetAssayData(object = object, slot = "scale.data")` |
| `object@cell.names` | `colnames(x = object)` |
| `rownames(x = object@data)` | `rownames(x = object)` |
| `object@var.genes` | `VariableFeatures(object = object)` |
| `object@hvg.info` | `HVFInfo(object = object)` |
| `object@assays$assay.name` | `object[["assay.name"]]` |
| `object@dr$pca` | `object[["pca"]]` |
| `GetCellEmbeddings(object = object, reduction.type = "pca")` | `Embeddings(object = object, reduction = "pca")` |
| `GetGeneLoadings(object = object, reduction.type = "pca")` | `Loadings(object = object, reduction = "pca")` |
| `AddMetaData(object = object, metadata = vector, col.name = "name")` | `object$name <- vector` |
| `object@meta.data$name` | `object$name` |
| `object@idents` | `Idents(object = object)` |
| `SetIdent(object = object, ident.use = "new.idents")` | `Idents(object = object) <- "new.idents"` |
| `SetIdent(object = object, cells.use = 1:10, ident.use = "new.idents")` | `Idents(object = object, cells = 1:10) <- "new.idents"` |
| `StashIdent(object = object, save.name = "saved.idents")` | `object$saved.idents <- Idents(object = object)` |
| `levels(x = object@idents)` | `levels(x = object)` |
| `RenameIdent(object = object, old.ident.name = "old.ident", new.ident.name = "new.ident")` | `RenameIdents(object = object, "old.ident" = "new.ident")` |
| `WhichCells(object = object, ident = "ident.keep")` | `WhichCells(object = object, idents = "ident.keep")` |
| `WhichCells(object = object, ident.remove = "ident.remove")` | `WhichCells(object = object, idents = "ident.remove", invert = TRUE)` |
| `WhichCells(object = object, max.cells.per.ident = 500)` | `WhichCells(object = object, downsample = 500)` |
| `WhichCells(object = object, subset.name = "name", low.threshold = low, high.threshold = high)` | `WhichCells(object = object, expression = name > low & name < high)` |
| `FilterCells(object = object, subset.names = "name", low.threshold = low, high.threshold = high)` | `subset(x = object, subset = name > low & name < high)` |
| `SubsetData(object = object, subset.name = "name", low.threshold = low, high.threshold = high)` | `subset(x = object, subset = name > low & name < high)` |
| `MergeSeurat(object1 = object1, object2 = object2)` | `merge(x = object1, y = object2)` |

Seurat has 3 data slots (source):
- `counts` (`raw.data` in v2): The raw data slot (`object@raw.data`) holds the original expression matrix, as supplied when creating the Seurat object and before any preprocessing by Seurat. For example, this could be the UMI matrix generated by DropSeqTools or 10X CellRanger, a count matrix from featureCounts, an FPKM matrix produced by Cufflinks, or a TPM matrix produced by RSEM. Row names represent gene names, and column names represent cell names. Either raw counts or normalized values (i.e. FPKM or TPM) are fine, but the input expression matrix should not be log-transformed. Seurat can be used to analyze single cell data produced by any technology, as long as you can create an expression matrix. The `Read10X` function is provided for easy import of datasets produced by the 10X Chromium system. Seurat uses count data when performing gene scaling and differential expression tests based on the negative binomial distribution.
- `data` (log-normalized data): The `data` slot stores normalized and log-transformed single cell expression. This maintains the relative abundance levels of all genes, and contains only zeros or positive values. See `?NormalizeData` for more information. This data is used for visualizations, such as violin and feature plots, most differential expression tests, finding high-variance genes, and as input to `ScaleData` (see below).
- `scale.data` (z-score normalized data): The `scale.data` slot represents a cell's relative expression of each gene, in comparison to all other cells. Therefore this matrix contains both positive and negative values. See `?ScaleData` for more information. If regressing genes against unwanted sources of variation (for example, to remove cell-cycle effects), the scaled residuals from the model are stored here. This data is used as input for dimensional reduction techniques, and is displayed in heatmaps.
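
To make the relationship between the three slots concrete, here is a minimal base-R sketch that mimics them on a toy matrix. It is not Seurat's code: it assumes the default `LogNormalize` behaviour (per-cell scaling to 10,000 followed by `log1p`) and per-gene z-scoring clipped at 10, and the matrix and variable names are made up for illustration.

```r
# Toy stand-in for the three slots; base R only, NOT Seurat's implementation.
set.seed(1)
counts <- matrix(rpois(5 * 8, lambda = 3), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:8)))

# "data": divide each cell (column) by its total count, multiply by an
# assumed scale factor of 10,000, then apply log(1 + x).
scale.factor <- 10000
data <- log1p(sweep(counts, 2, colSums(counts), "/") * scale.factor)

# "scale.data": z-score each gene (row) across cells; scale() works on
# columns, hence the double transpose. Values above 10 are clipped
# (assumed default cap).
scale.data <- t(scale(t(data)))
scale.data[scale.data > 10] <- 10

range(data)        # zeros or positive values only
range(scale.data)  # both negative and positive values
```

The sketch only reproduces the shape of the transformations; regression of unwanted variation, which also ends up in `scale.data`, is not covered here.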
> GetAssayData(as_fet_comb, "counts") %>% dim
[1] 0 0
> GetAssayData(as_fet_comb, "scale.data") %>% dim
[1] 1 1
> GetAssayData(as_fet_comb, "data") %>% dim
[1] 1000 1491
- stored in `object@raw.data` (Seurat2)
- can be accessed like so:
raw.data <- GetAssayData(object = object,
assay.type = assay.type,
slot = "raw.data")
- stored in `object@data`
- can be added like so:
object <- SetAssayData(object = object,
assay.type = assay.type,
slot = "data",
new.data = normalized.data)
If multiple assays are stored within the same Seurat object, the "active" one has to be selected manually:
> srt
An object of class Seurat
50120 features across 26335 samples within 3 assays
Active assay: SCT (20844 features)
2 other assays present: RNA, integrated
2 dimensional reductions calculated: pca, umap
> srt@active.assay # find out which one's active
> DefaultAssay(srt) <- "SCT" # define another one
genes.use <- rownames(object@data)
- Seurat2:
object@meta.data <- data.frame(nGene, nUMI)
# View metadata data frame, stored in object@meta.data
pbmc[[]]
# Retrieve specific values from the metadata
pbmc$nCount_RNA
pbmc[[c("percent.mito", "nFeature_RNA")]]
# Add metadata, see ?AddMetaData
random_group_labels <- sample(x = c("g1", "g2"), size = ncol(x = pbmc), replace = TRUE)
pbmc$groups <- random_group_labels
results will be stored in object@data
More interesting accessors afterwards:
object@calc.params$NormalizeData$scale.factor
object@calc.params$NormalizeData$normalization.method
will be stored in object@scale.data
Seurat:::RegressOutResid:
possible.models <- c("linear", "poisson", "negbinom")
latent.data <- FetchData(object = object, vars.all = vars.to.regress)
## extracts the log-scaled values
data.use <- object@data[genes.regress, , drop = FALSE]
regression.mat <- cbind(latent.data, data.use[1, ])
colnames(regression.mat) <- reg.mat.colnames
fmla_str = paste0("GENE ", " ~ ", paste(vars.to.regress, collapse = "+"))
qr = lm(as.formula(fmla_str), data = regression.mat, qr = TRUE)$qr
resid <- qr.resid(qr, gene.expr[x, ])
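
The excerpt above relies on the fact that the design matrix depends only on the latent variables, so its QR decomposition can be computed once and reused for every gene. Below is a self-contained base-R sketch of that idea for the "linear" model only; the simulated `latent.data` and `gene.expr` objects are hypothetical stand-ins, not Seurat data structures.

```r
# Sketch of "regress out" with a reusable QR decomposition (linear model only).
set.seed(1)
n.cells <- 50
vars.to.regress <- c("nUMI", "percent.mito")

# Hypothetical latent variables and log-scaled expression (3 genes x 50 cells)
latent.data <- data.frame(nUMI = rpois(n.cells, 2000),
                          percent.mito = runif(n.cells, 0, 0.1))
gene.expr <- matrix(rnorm(3 * n.cells), nrow = 3,
                    dimnames = list(c("g1", "g2", "g3"), NULL))

# Fit once on the first gene just to obtain the QR of the design matrix
regression.mat <- cbind(latent.data, GENE = gene.expr[1, ])
fmla <- as.formula(paste("GENE ~", paste(vars.to.regress, collapse = " + ")))
qr <- lm(fmla, data = regression.mat, qr = TRUE)$qr

# Reuse the same QR to get residuals for every gene
resid <- t(sapply(seq_len(nrow(gene.expr)),
                  function(x) qr.resid(qr, gene.expr[x, ])))
rownames(resid) <- rownames(gene.expr)
dim(resid)  # genes x cells
```

Reusing the QR avoids refitting `lm` per gene, which matters when thousands of genes are regressed against the same covariates.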
object@var.genes
object@hvg.info$gene.mean
object@hvg.info$gene.dispersion
object@hvg.info$gene.dispersion.scaled
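
For orientation, the three `hvg.info` columns can be approximated in base R. The sketch below follows what I understand to be the Seurat v2 defaults (mean computed in non-log space, dispersion as log variance-to-mean ratio, z-scoring of the dispersion within 20 bins of mean expression); treat it as an assumption-laden illustration on simulated counts, not as Seurat's implementation.

```r
# Approximate mean / dispersion / scaled dispersion per gene (base R sketch).
set.seed(42)
n.genes <- 200
n.cells <- 30
lambda <- runif(n.genes, min = 2, max = 20)   # hypothetical per-gene rates
counts <- matrix(rpois(n.genes * n.cells, lambda = rep(lambda, times = n.cells)),
                 nrow = n.genes,
                 dimnames = list(paste0("gene", seq_len(n.genes)),
                                 paste0("cell", seq_len(n.cells))))

# Log-normalize, then go back to non-log space for the statistics
data <- log1p(sweep(counts, 2, colSums(counts), "/") * 1e4)
expr <- expm1(data)

gene.mean <- log1p(rowMeans(expr))                             # log of mean
gene.dispersion <- log(apply(expr, 1, var) / rowMeans(expr))   # log(var / mean)

# z-score the dispersion within bins of similar mean expression
bins <- cut(gene.mean, breaks = 20)
gene.dispersion.scaled <- ave(gene.dispersion, bins,
                              FUN = function(d) (d - mean(d)) / sd(d))
gene.dispersion.scaled[is.na(gene.dispersion.scaled)] <- 0     # single-gene bins

hvg.info <- data.frame(gene.mean, gene.dispersion, gene.dispersion.scaled)
hvg.info <- hvg.info[order(hvg.info$gene.dispersion.scaled, decreasing = TRUE), ]
head(hvg.info)
```

Genes at the top of the sorted table are the ones that would be picked as highly variable under this scheme.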