RNAseq|WGCNA-组学数据黏合剂，代码实战-一（尽）文（力）解决文献中常见的可视化图-阿里云开发者社区

本文首发于“生信补给站”公众号 https://mp.weixin.qq.com/s/M5HmKOMtxFz_0YKPSlM6fw

WGCNA （weighted gene co-expression network analysis）权重基因共表达网络分析（流程模块见下图），可将表达模式相似的基因进行聚类，并分析模块与特定性状或表型之间的关联，常用于筛选关键表型的hub基因，是RNAseq分析中的一块很重要的拼图。而之所以叫组学数据黏合剂是因为表型可以是患者的临床信息（生存信息，分期信息，基线信息等），可以是重测序信息肿瘤（驱动基因的变异与否，signature ，CNV信息等），可以是转录组结果（免疫浸润，risk score ，GSVA ，分子分型结果），可以是单细胞数据（celltype ，AUCell 打分）等等。注：这些在公众号之前的文章中大多都有涉及，文末有部分链接。

本文将着重介绍WGCNA中常见图形的代码实现以及图形的意义，如果有其他常见的WGCNA图形而本文未介绍，欢迎留言交流。

一载入R包，数据

仍然使用之前处理过的TCGA的SKCM数据，此外将之前得到的cibersort免疫浸润的结果作为临床表型数据进行关联，文章末尾有测试数据获取方式。

library(tidyverse)
library(openxlsx)
library(data.table)
#BiocManager::install("impute")
library(WGCNA)
load("RNAseq.SKCM_WGCNA.RData")
head(cibersort_in,2)
expr2[1:4,1:4]

二 WGCNA分析

2.1 获取软阈值

WGCNA针对的是基因进行聚类，注意下自己的数据是否需要转置。

## 选取top 5000 mad genes
WGCNA_matrix = t(expr2[order(apply(expr2,1,mad), decreasing = T)[1:5000],]) 
datExpr <- WGCNA_matrix  
### beta value#########
powers = c(c(1:10), seq(from = 12, to=30, by=2))
sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)
par(mfrow = c(1,2))
cex1 = 0.9; #可自行调整
# Scale-free topology fit index as a function of the soft-thresholding power
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
     xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",
     main = paste("Scale independence"));
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
     labels=powers,cex=cex1,col="red");
# this line corresponds to using an R^2 cut-off of h
abline(h=0.90,col="red")
# Mean connectivity as a function of the soft-thresholding power
plot(sft$fitIndices[,1], sft$fitIndices[,5],
     xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",
     main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")
sft$powerEstimate

左图为在不同软阈值（x轴）情况下的无标度拟合指数（scale-free fit index，y轴）。本示例中阈值选择的0.9，也可以根据数据的真实情况进行微调。可见当阈值选择0.9时最小软阈值（sft$powerEstimate）为3，因此选择3作为最优软阈值用于后续分析。

2.2 构建共表达矩阵

得到表达矩阵以及最优beta值，就可以通过一步法构建共表达矩阵，然后使用plotDendroAndColors函数进行可视化

net = blockwiseModules(
  datExpr,
  power = sft$powerEstimate,
  #power = sft$powerEstimate,
  maxBlockSize = 6000,
  TOMType = "unsigned", minModuleSize = 30,
  reassignThreshold = 0, mergeCutHeight = 0.25,
  numericLabels = TRUE, pamRespectsDendro = FALSE,
  saveTOMs = TRUE,
  saveTOMFileBase = "AS-green-FPKM-TOM",
  verbose = 3
)
table(net$colors) 
####模块可视化####
mergedColors = labels2colors(net$colors)
table(mergedColors)
# 可视化
plotDendroAndColors(net$dendrograms[[1]], mergedColors[net$blockGenes[[1]]],
                    "Module colors",
                    dendroLabels = FALSE, hang = 0.03,
                    addGuide = TRUE, guideHang = 0.05)

不同的颜色代表不同的模块，相同颜色的模块中为表达模式相似的基因。grey模块中包含了所有未参与聚类的基因，一般不用于后续分析。
可以通过table(net 或者 colors)或者mergedColors=labels2colors(netcolors) 或者 mergedColors = labels2colors(netcolors) 来查看每种颜色中有多少基因。

2.3 绘制网络

热图该步骤非常消耗计算资源和时间，此处示例选取其中部分基因进行展示

#随机选取部分基因作图
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
nSelect = 500
# 设置随机种子，方便复现结果
set.seed(1234);
select = sample(nGenes, size = nSelect);
selectTOM = dissTOM[select, select];
# There’s no simple way of restricting a clustering tree to a subset of genes, so we must re-cluster.
selectTree = hclust(as.dist(selectTOM), method = "average")
selectColors = moduleColors[select];
# Open a graphical window
sizeGrWindow(9,9)
plotDiss = selectTOM^7;
diag(plotDiss) = NA;
TOMplot(plotDiss, selectTree, selectColors,terrainColors = T, 
        main = "Network heatmap plot, selected genes")

使用col函数修改颜色

library(gplots)
TOMplot(plotDiss, selectTree, selectColors, main = "Network heatmap plot, selected genes", 
        col=gplots::colorpanel(250,'red',"orange",'lemonchiffon'))

基因之间的相关性热图，其中颜色越深，说明基因之间的相互作用越强。

2.4 模块和性状的关系

本示例使用cibersort的结果，当然可以替换为任何你关注的，

datTraits <- cibersort_in %>% 
  select(sample:Neutrophils ) %>%
  column_to_rownames("sample")
nSamples = nrow(datExpr)
sampleNames = rownames(datExpr)
traitRows = match(datTraits$sample, rownames(datTraits) )  
moduleColors <- labels2colors(net$colors)
# Recalculate MEs with color labels
MEs0 = moduleEigengenes(datExpr, moduleColors)$eigengenes
MEs = orderMEs(MEs0);
##计算相关系数和P值
moduleTraitCor = cor(MEs, datTraits , use = "p");
moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples)
sizeGrWindow(10,6)
# 图中展示P值
textMatrix = paste(signif(moduleTraitCor, 2), "\n(",
                   signif(moduleTraitPvalue, 1), ")", sep = "");
dim(textMatrix) = dim(moduleTraitCor)
#通过par设置图形边界
par(mar = c(10, 10, 3, 3));
# 热图的形式展示结果
labeledHeatmap(Matrix = moduleTraitCor,
               xLabels = colnames(datTraits),
               yLabels = names(MEs),
               ySymbols = names(MEs),
               colorLabels = FALSE,
               colors = blueWhiteRed(50),
               textMatrix = textMatrix,
               setStdMargins = FALSE,
               cex.text = 0.5,
               zlim = c(-1,1),
               main = paste("Module-trait relationships"))

一般会根据相关性的绝对值筛选最相关模块，负相关模块也要考虑。假如重点关注的表型是CD8 Tcell 或者 Treg免疫浸润程度，可以看到这2个表型和MEblue模块的相关系数（颜色）最高且P值（括号内数值）很显著。
然后就可以提取MEblue模块中的所有基因或者 hub基因进行后续分析

2.5 显著模块柱形图

还可以通过柱形图的形式查看与目标性状最显著的模块

########## 显著模块柱形图#######
moduleColors = labels2colors(net$colors)
y <- datTraits[, "T cells CD8"]
GS <- as.numeric(cor(y ,datExpr, use="p"))
GeneSignificance <-  abs(GS)
ModuleSignificance <- tapply(
  GeneSignificance,
  moduleColors, mean, na.rm=T)
plotModuleSignificance(GeneSignificance, moduleColors,ylim = c(0,0.4))

也可以很明显的看到blue模块最显著。

2.6 模块之间聚类以及热图

通过聚类树或者热图的方式查看模块之间的相关程度，可以把感兴趣的“表型“添加进去

# Recalculate module eigengenes
MEs = moduleEigengenes(datExpr, moduleColors)$eigengenes
CD8_T = as.data.frame(datTraits[,4]);
names(CD8_T) = "CD8_T"
modNames = substring(names(MEs), 3)
# Add the weight to existing module eigengenes
MET = orderMEs(cbind(MEs, CD8_T))
# Plot the relationships among the eigengenes and the trait
sizeGrWindow(5,7.5);
par(cex = 0.9)
plotEigengeneNetworks(MET, "", marDendro = c(0,4,1,2), marHeatmap = c(3,4,1,2), cex.lab = 0.8, xLabelsAngle
                      = 90)

仍然可以看到CD8_T 和 blue模块最相关。
注：可以通过plotHeatmaps = FALSE 或者 plotDendrograms = FALSE 参数，只绘制上半部分的树形图或者下面的热图。

2.7 MM vs GS

相关性查看基因与关键模块（blue）的相关性（Module Membership, MM）和基因与性状（CD8_T）的相关性（Gene Significance, GS）

pheno = "CD8_T"
# 获取关注的列
module_column = match(module, modNames)
pheno_column = match(pheno,colnames(datTraits))
# 获取模块内的基因
moduleGenes = moduleColors == module
sizeGrWindow(7, 7);
par(mfrow = c(1,1));
verboseScatterplot(abs(geneModuleMembership[moduleGenes, 
                                            module_column]),
                   abs(geneTraitSignificance[moduleGenes, 1]), 
                   xlab = paste("Module Membership in", module, "module"), 
                   ylab = paste("Gene significance for", pheno), 
                   main = paste("Module membership  vs gene significance\n"), 
                   cex.main = 1.2, cex.lab = 1.2, cex.axis = 
                     1.2, col = module)

通过散点图可以发现MM和GS呈正相关，说明与CD8_T高度相关的基因，在blue模块中也很重要。

2.8 提取模块

基因前面一些可视化结果很重要，但是WGCNA分析更需要的是获取目标模块的基因，可以是blue模块的所有基因，也可以提取blue模块的hub基因。1) 提取blue的所有基因

module = "blue";
# Select module probes
geneID= colnames(datExpr) ## 我们例子里面的probe就是基因名
inModule = (moduleColors==module);
module_gene = geneID[inModule];
write.csv(module_gene,"module_gene.csv")

2）提取hub基因
一般使用较多的方式是根据 |MM| > 0.8 且 |GS| > 0.2的阈值进行筛选hub基因，也可以根据数据表现自行更改阈值

hub<- abs(geneModuleMembership$MMblue)>0.8 & abs(geneTraitSignificance)>0.2
hub<-as.data.frame(dimnames(data.frame(datExpr))[[2]][hub])
head(hub,2)
#  dimnames(data.frame(datExpr))[[2]][hub]
#1                                     IGJ
#2                                   CXCL9

得到这些基因，就可以进行后续分析和挖掘了。常见的是单因素生存分析，lasso回归筛选，构建risk score等分析。

2.9 保存结果以备后用

save(module_gene,hub ,MEs,geneModuleMembership,geneTraitSignificance,
     file = "SKCM.WGCNA_MM_GS.RData")

更多原理和参数设置相关的请查阅WGCNA的文献以及官网。

RNAseq|WGCNA-组学数据黏合剂，代码实战-一（尽）文（力）解决文献中常见的可视化图

一载入R包，数据

二 WGCNA分析

2.1 获取软阈值

2.2 构建共表达矩阵

2.3 绘制网络

2.4 模块和性状的关系

2.5 显著模块柱形图

2.6 模块之间聚类以及热图

2.7 MM vs GS

2.8 提取模块

2.9 保存结果以备后用

热门文章

最新文章

相关课程

相关电子书

相关实验场景

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

RNAseq|WGCNA-组学数据黏合剂，代码实战-一（尽）文（力）解决文献中常见的可视化图

一 载入R包，数据

二 WGCNA分析

2.1 获取软阈值

2.2 构建共表达矩阵

2.3 绘制网络

2.4 模块和性状的关系

2.5 显著模块柱形图

2.6 模块之间聚类以及热图

2.7 MM vs GS

2.8 提取模块

2.9 保存结果以备后用

热门文章

最新文章

相关课程

相关电子书

相关实验场景

一载入R包，数据