Seurat

Standard workflow

pbmc.counts <- Read10X(data.dir = "~/Downloads/pbmc3k/filtered_gene_bc_matrices/hg19/")
pbmc <- CreateSeuratObject(counts = pbmc.counts)
pbmc <- NormalizeData(object = pbmc)
pbmc <- FindVariableFeatures(object = pbmc)
pbmc <- ScaleData(object = pbmc)
pbmc <- RunPCA(object = pbmc)
pbmc <- FindNeighbors(object = pbmc)
pbmc <- FindClusters(object = pbmc)
pbmc <- RunTSNE(object = pbmc)
DimPlot(object = pbmc, reduction = "tsne")

Seurat 2 vs. 3

| Seurat v2.X | Seurat v3.X |
| --- | --- |
| `object@data` | `GetAssayData(object = object)` |
| `object@raw.data` | `GetAssayData(object = object, slot = "counts")` |
| `object@scale.data` | `GetAssayData(object = object, slot = "scale.data")` |
| `object@cell.names` | `colnames(x = object)` |
| `rownames(x = object@data)` | `rownames(x = object)` |
| `object@var.genes` | `VariableFeatures(object = object)` |
| `object@hvg.info` | `HVFInfo(object = object)` |
| `object@assays$assay.name` | `object[["assay.name"]]` |
| `object@dr$pca` | `object[["pca"]]` |
| `GetCellEmbeddings(object = object, reduction.type = "pca")` | `Embeddings(object = object, reduction = "pca")` |
| `GetGeneLoadings(object = object, reduction.type = "pca")` | `Loadings(object = object, reduction = "pca")` |
| `AddMetaData(object = object, metadata = vector, col.name = "name")` | `object$name <- vector` |
| `object@meta.data$name` | `object$name` |
| `object@idents` | `Idents(object = object)` |
| `SetIdent(object = object, ident.use = "new.idents")` | `Idents(object = object) <- "new.idents"` |
| `SetIdent(object = object, cells.use = 1:10, ident.use = "new.idents")` | `Idents(object = object, cells = 1:10) <- "new.idents"` |
| `StashIdent(object = object, save.name = "saved.idents")` | `object$saved.idents <- Idents(object = object)` |
| `levels(x = object@idents)` | `levels(x = object)` |
| `RenameIdent(object = object, old.ident.name = "old.ident", new.ident.name = "new.ident")` | `RenameIdents(object = object, "old.ident" = "new.ident")` |
| `WhichCells(object = object, ident = "ident.keep")` | `WhichCells(object = object, idents = "ident.keep")` |
| `WhichCells(object = object, ident.remove = "ident.remove")` | `WhichCells(object = object, idents = "ident.remove", invert = TRUE)` |
| `WhichCells(object = object, max.cells.per.ident = 500)` | `WhichCells(object = object, downsample = 500)` |
| `WhichCells(object = object, subset.name = "name", low.threshold = low, high.threshold = high)` | `WhichCells(object = object, expression = name > low & name < high)` |
| `FilterCells(object = object, subset.names = "name", low.threshold = low, high.threshold = high)` | `subset(x = object, subset = name > low & name < high)` |
| `SubsetData(object = object, subset.name = "name", low.threshold = low, high.threshold = high)` | `subset(x = object, subset = name > low & name < high)` |
| `MergeSeurat(object1 = object1, object2 = object2)` | `merge(x = object1, y = object2)` |

Data

Seurat has 3 data slots (source):

counts (raw.data in v2)
- The counts slot (object@raw.data in v2) holds the original expression matrix, as supplied when creating the Seurat object and before any preprocessing by Seurat. For example, this could be the UMI matrix generated by DropSeqTools or 10X CellRanger, a count matrix from featureCounts, an FPKM matrix produced by Cufflinks, or a TPM matrix produced by RSEM. Row names are gene names and column names are cell names. Either raw counts or normalized values (e.g. FPKM or TPM) are fine, but the input expression matrix should not be log-transformed. Seurat can analyze single-cell data produced by any technology, as long as you can create an expression matrix; the Read10X function provides easy import of datasets produced by the 10X Chromium system. Seurat uses the count data when performing gene scaling and differential expression tests based on the negative binomial distribution.
data (log-normalized data)
- The data slot stores normalized and log-transformed single-cell expression. This maintains the relative abundance levels of all genes and contains only zeros or positive values. See ?NormalizeData for more information. This data is used for visualizations such as violin and feature plots, for most differential expression tests, for finding high-variance genes, and as input to ScaleData (see below).
scale.data (z-score normalized data)
- The scale.data slot represents a cell’s relative expression of each gene in comparison to all other cells, so this matrix contains both positive and negative values. See ?ScaleData for more information. If genes are regressed against unwanted sources of variation (for example, to remove cell-cycle effects), the scaled residuals from the model are stored here. This data is used as input for dimensional reduction techniques and is displayed in heatmaps.
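
To make the relationship between the three slots concrete, here is a minimal base-R sketch that does not call Seurat at all. The toy matrix, the variable names and the scale factor of 10,000 are assumptions chosen for illustration; the real NormalizeData and ScaleData functions have more options (see their help pages), so treat this as an approximation of the idea rather than their implementation.

```r
# Toy counts matrix: 3 genes (rows) x 4 cells (columns); values are invented.
counts <- matrix(
  c(0, 2, 4, 1,
    5, 0, 1, 3,
    1, 1, 0, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# "data": divide each cell by its total count, multiply by a scale factor
# (assumed to be 10,000 here), then apply log1p. Values are never negative.
scale.factor <- 1e4
norm.data <- log1p(sweep(counts, 2, colSums(counts), "/") * scale.factor)

# "scale.data": z-score each gene across cells. scale() works on columns,
# so the matrix is transposed before and after.
scaled.data <- t(scale(t(norm.data)))

range(norm.data)                  # lower bound is 0
round(rowMeans(scaled.data), 10)  # each gene is centred on 0
```

After scaling, every gene has mean zero across cells, which is why scale.data contains negative values while data does not.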

> GetAssayData(as_fet_comb, "counts") %>% dim
[1] 0 0
> GetAssayData(as_fet_comb, "scale.data") %>% dim
[1] 1 1
> GetAssayData(as_fet_comb, "data") %>% dim
[1] 1000 1491

Raw data

Stored in object@raw.data (Seurat v2); in v3 this corresponds to the "counts" slot.
In Seurat v2 it can be accessed like this:

raw.data <- GetAssayData(object = object,
                         assay.type = assay.type,
                         slot = "raw.data")

Normalized data

Stored in object@data (Seurat v2).
It can be set like this:

object <- SetAssayData(object = object,
                       assay.type = assay.type,
                       slot = "data",
                       new.data = normalized.data)

If several assays are stored in the same Seurat object, the active one has to be selected manually:

> srt
An object of class Seurat
50120 features across 26335 samples within 3 assays
Active assay: SCT (20844 features)
 2 other assays present: RNA, integrated
 2 dimensional reductions calculated: pca, umap

> srt@active.assay # shows which assay is currently active
> DefaultAssay(srt) <- "SCT" # sets the active assay

Genes

genes.use <- rownames(object@data)

Metadata

Seurat2: object@meta.data <- data.frame(nGene, nUMI)

# View metadata data frame, stored in object@meta.data
pbmc[[]]

# Retrieve specific values from the metadata
pbmc$nCount_RNA
pbmc[[c("percent.mito", "nFeature_RNA")]]

# Add metadata, see ?AddMetaData
random_group_labels <- sample(x = c("g1", "g2"), size = ncol(x = pbmc), replace = TRUE)
pbmc$groups <- random_group_labels

Normalization

Results are stored in object@data.

The parameters that were used can be retrieved afterwards:

object@calc.params$NormalizeData$scale.factor
object@calc.params$NormalizeData$normalization.method

Scaling

Results are stored in object@scale.data.

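Abridged from Seurat:::RegressOutResid (Seurat v2):

possible.models <- c("linear", "poisson", "negbinom")

## per-cell values of the variables to regress out
latent.data <- FetchData(object = object, vars.all = vars.to.regress)

## extracts the log-scaled values
data.use <- object@data[genes.regress, , drop = FALSE]

## fit once on the first gene to obtain the QR decomposition of the design matrix
reg.mat.colnames <- c(colnames(x = latent.data), "GENE")
regression.mat <- cbind(latent.data, data.use[1, ])
colnames(regression.mat) <- reg.mat.colnames

fmla_str <- paste0("GENE ", " ~ ", paste(vars.to.regress, collapse = "+"))
qr <- lm(as.formula(fmla_str), data = regression.mat, qr = TRUE)$qr

## reuse the decomposition to get the residuals for gene x
resid <- qr.resid(qr, data.use[x, ])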
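
The excerpt above fits the linear model once and reuses its QR decomposition for every gene. The following base-R sketch reproduces that idea on simulated data without Seurat. All inputs are invented: `nUMI` stands in for a latent variable, and `gene1` and `gene2` are hypothetical genes, so this illustrates the technique and is not the package's code.

```r
set.seed(1)
n.cells <- 50

# Hypothetical latent variable (e.g. a per-cell UMI count).
latent.data <- data.frame(nUMI = runif(n.cells, 1000, 5000))

# Toy expression matrix, genes x cells: gene1 depends on nUMI, gene2 does not.
data.use <- rbind(
  gene1 = 0.001 * latent.data$nUMI + rnorm(n.cells, sd = 0.1),
  gene2 = rnorm(n.cells)
)

vars.to.regress <- "nUMI"
fmla <- as.formula(paste("GENE ~", paste(vars.to.regress, collapse = "+")))

# Fit once; only the QR decomposition of the design matrix is kept.
regression.mat <- cbind(latent.data, GENE = data.use[1, ])
qr.design <- lm(fmla, data = regression.mat, qr = TRUE)$qr

# Reuse the decomposition to compute the residuals of every gene.
resid.mat <- t(apply(data.use, 1, function(y) qr.resid(qr.design, y)))

# The shortcut should agree with an ordinary per-gene lm() fit.
check <- residuals(lm(data.use["gene2", ] ~ latent.data$nUMI))
all.equal(unname(resid.mat["gene2", ]), unname(check))
```

Because the design matrix depends only on the latent variables, the decomposition is identical for every gene, and that is what makes the shortcut valid.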

Variable Genes

object@var.genes
object@hvg.info$gene.mean
object@hvg.info$gene.dispersion
object@hvg.info$gene.dispersion.scaled

More object interactions

See the Seurat website.
