本文首发于“生信补给站”公众号 https://mp.weixin.qq.com/s/_HFYizD8vmYcXdwsfIErXQ
在空间转录组|数据读入,标准数据形式外,还有哪些"天残地缺"可以读取提到了多种形式的数据读取,在原函数Load10X_Spatial的基础上进行了 “简陋”的修改,添加一些判断使之可以读取上文提到的多种数据形式。
后台回复 “函数” 既可以获取Load10X_Spatial_change.R 和 测试数据文件。第二部分为空转的Seurat标准分析流程,重点介绍了和单细胞转录组的差异。
一 Load10X_Spatial_change函数测试
加载R包和函数,然后使用不用场景进行测试。
library(Seurat) library(jsonlite) library(png) library(tidyverse) library(ggpubr) library(patchwork) source("Load10X_Spatial_change.R")
1, 标准数据读取
含有h5文件 + spatial文件夹
T0 <- Load10X_Spatial_change(data.dir = ".\\outs\\",filename = "filtered_feature_bc_matrix.h5") for (i in colnames((T0@images$slice1@coordinates))) { T0@images$slice1@coordinates[[i]] <- as.integer(T0@images$slice1@coordinates[[i]]) } SpatialDimPlot(T0,alpha = 0)
注:循环修改integer 的操作,也许你的数据不需要
2,filtered_feature_bc_matrix文件夹 + spatial文件夹(无 h5文件)
data.dir = ".\\outs_noh5\\" T1 <- Load10X_Spatial_change(data.dir = data.dir, filename = "filtered_feature_bc_matrix") for (i in colnames((T1@images$slice1@coordinates))) { T1@images$slice1@coordinates[[i]] <- as.integer(T1@images$slice1@coordinates[[i]]) } p1 <- SpatialDimPlot(T1,alpha = 0) p2 <- SpatialFeaturePlot(T1, features = "nFeature_Spatial", pt.size.factor = 3) p1 + p2
3, h5 + spatial文件夹(只有tissue_hires_image.png )
通过image.name指定使用的图片
data.dir = ".\\outs_onlyhires_h5\\" T2 <- Load10X_Spatial_change(data.dir = data.dir, filename = "filtered_feature_bc_matrix.h5", #slice = "sli", image.name = "tissue_hires_image.png") for (i in colnames((T2@images[[1]]@coordinates))) { T2@images[[1]]@coordinates[[i]] <- as.integer(T2@images[[1]]@coordinates[[i]]) } SpatialDimPlot(T2,alpha = 0) SpatialFeaturePlot(T2, features = "nFeature_Spatial", pt.size.factor = 8)
4, filtered_feature_bc_matrix + spatial文件夹(只有tissue_hires_image.png )
data.dir = ".\\outs_onlyhires\\" T3 <- Load10X_Spatial_change(data.dir = data.dir, filename = "filtered_feature_bc_matrix", image.name = "tissue_hires_image.png") for (i in colnames((T3@images$slice1@coordinates))) { T3@images$slice1@coordinates[[i]] <- as.integer(T3@images$slice1@coordinates[[i]]) } SpatialDimPlot(T3,alpha = 0) SpatialFeaturePlot(T3, features = "nFeature_Spatial", pt.size.factor = 8)
二 空转标准分析
空转流程和单细胞转录组的流程有一些细微的区别,重点介绍这部分
1,数据预处理(过滤,NormalizeData)
关于空转数据是否进行质控过滤,以及根据哪些条件进行过滤差异“很大”,且都有较多文章支撑,这里仅列出过滤基因 以及 spot 的方式,是否过滤根据自己的数据情况确定。
注:Seurat的官网并未进行质控,直接SCT分析。
1)计算线粒体/核糖体/特定基因集的比例
mt.genes <- grep(pattern = "^MT-", x = rownames(T0), value = TRUE) T0$percent.mito <- (Matrix::colSums(T0@assays$Spatial@counts[mt.genes, ])/Matrix::colSums(T0@assays$Spatial@counts))*100
如果想针对核糖体或者特定基因集合,只需进行对应的改即可。
2)过滤/保留基因
保留在所有spot中的表达量之和大于5的基因
genes_to_keep <- setdiff(names(which(Matrix::rowSums(T0@assays$Spatial@counts )>5)),mt.genes)
也可以是自定义的基因集合。
3)subset过滤spot
T0 <- subset(T0, features =genes_to_keep, #针对基因 subset = nFeature_Spatial > 300 & percent.mito < 30 #针对spots ) plot1 <- VlnPlot(T0, features = "nCount_Spatial", pt.size = 0.1) + NoLegend() plot2 <- SpatialFeaturePlot(T0, features = "nCount_Spatial",pt.size.factor = 3) + theme(legend.position = "right") wrap_plots(plot1, plot2)
NormalizeData
相较于单细胞转录组LogNormalize,空转数据推荐使用SCTransform。下图是来源于Seurat官网(https://satijalab.org/seurat/articles/seurat5_spatial_vignette.html#gene-expression-visualization)的比较结果。
T0 <- SCTransform(T0, assay = "Spatial", verbose = FALSE)
2,基因可视化
空转的可视化函数扩充了SpatialFeaturePlot 和 SpatialDimPlot (以空转切片为背景) ,相较于单细胞的FeaturePlot,DimPlot函数。
通过pt.size.factor设置点的大小,仍然可以使用ggplot2进行一些图形的修饰(theme,legend等)。
SpatialFeaturePlot(T0, features = c("COL1A1", "EPCAM"), pt.size.factor = 4) library(ggplot2) SpatialFeaturePlot(T0, features = c("COL1A1", "EPCAM"),pt.size.factor = 3) + theme(legend.text = element_text(size = 0), legend.title = element_text(size = 20), legend.key.size = unit(1, "cm"))
p1 <- SpatialFeaturePlot(T0, features = "COL1A1", pt.size.factor = 1) p2 <- SpatialFeaturePlot(T0, features = "COL1A1", alpha = c(0.1, 1)) p1 + p2
3,降维聚类及可视化
和单细胞差异不大
T0 <- RunPCA(T0, assay = "SCT", verbose = FALSE) T0 <- FindNeighbors(T0, reduction = "pca", dims = 1:30) T0 <- FindClusters(T0, verbose = FALSE) T0 <- RunUMAP(T0, reduction = "pca", dims = 1:30) p1 <- DimPlot(T0, reduction = "umap", label = TRUE) p2 <- SpatialDimPlot(T0, label = TRUE, label.size = 3) p1 + p2
可以使用SpatialDimPlot 的cells.highlight参数, 只展示感兴趣的簇。
SpatialDimPlot(T0, cells.highlight = CellsByIdentities(object = T0, idents = c(2, 0, 3)), facet.highlight = TRUE, ncol = 3 ,pt.size.factor = 3 )
4,空间表达变量基因
由于空转的特殊性,Seurat提供了两种计算的方式 。
1)FindMarkers
根据定义好的cluster或者celltype进行计算(和scRNA类似)
#指定比较的group de_markers <- FindMarkers(T0, ident.1 = 0, ident.2 = 3) SpatialFeaturePlot(object = T0, features = rownames(de_markers)[1:3], alpha = c(0.1, 1), pt.size.factor = 3, ncol = 3) #全部比较,注意activate的ident de_markers2 <- FindAllMarkers(T0)
注:FindAllMarkers函数注意当前activate的ident
2)FindSpatiallyVariableFeatures
在没有cluster 或者 celltype等预先注释的情况下使用,倾向于返回在某些切片区域高表达的基因。
T0 <- FindSpatiallyVariableFeatures(T0, assay = "SCT", features = VariableFeatures(T0)[1:1000], selection.method = "moransi") top.features <- head(SpatiallyVariableFeatures(T0, selection.method = "moransi"), 6) SpatialFeaturePlot(T0, features = top.features, pt.size.factor = 3, ncol = 3, alpha = c(0.1, 1))
两种之间会有一些overlap,但是也会有一些差异。