论文
Super-pangenome analyses highlight genomic diversity and structural variation across wild and cultivated tomato species
https://www.nature.com/articles/s41588-023-01340-y
西红柿NG_superPan正文.pdf
数据分析的代码
https://github.com/HongboDoll/TomatoSuperPanGenome
论文中提供了作图用到的原始数据,我们可以试着用论文中的原始数据来复现论文中的图,今天的推文复现论文中的figure1
首先是右侧的进化树
论文中提供的是nexus格式的书,而且是放在了excel里,这里我们复制到文本文件中,修改成newick格式
读取树文件
library(ggtree)
library(treeio)
library(ggh4x)
tree<-read.tree("data/20230408/figure1b.txt")
tree
添加根部的小短线
tree$root.edge<-0.1
最基本的进化树
ggtree(tree)+
geom_tiplab(fontface="italic")+
theme_tree2()+
geom_rootedge() -> p
p
这里右侧显示不全,可以通过设置x轴的范围来解决
论文中的树是从左上角开始,ggtree默认是从左下角开始,想实现论文中的效果可以通过旋转指定节点来实现
先给每个节点添加上数字
p+
geom_nodelab(aes(label=node))
直接旋转15节点
ggtree::rotate(p,node = 15) -> p1
p1
添加右侧的分组标签和下方的标尺
p1+
scale_x_continuous(breaks = c(seq(0,3.2,by=0.45),3.5),
limits = c(-0.3,5.5),
labels = c(8:0))+
guides(x=guide_axis_truncated(trunc_lower = 0,
trunc_upper = 3.5))+
labs(x="Age (million years)")+
theme(axis.title.x = element_text(hjust=0.3))+
#geom_nodelab(aes(label=node))+
geom_cladelab(node=18,label="I",textcolor="red",
offset=1)+
geom_cladelab(node=20,label="II",textcolor="red",
offset=1)+
geom_cladelab(node=23,label="III",textcolor="red",
offset=1)+
geom_cladelab(node=24,label="IV",textcolor="red",
offset=1.7) -> pp
pp
然后是添加文本注释
ggplot_build(pp)$data[[5]]
pp+
annotate(geom = "segment",x=1.8,y=1.5,xend = 2.6,yend=3,
arrow=arrow(angle = 20,type="closed",length = unit(3,'mm')))+
annotate(geom="label",x=1.8,y=1.5,label="Mean wild-cultivated\ntomato divergence\n0.71 Ma",hjust=0.5)+
annotate(geom = "segment",x=1,y=4,xend = 1.7,yend=5.2,
arrow=arrow(angle = 20,type="closed",length = unit(3,'mm')))+
annotate(geom="label",x=1,y=4,label="Mean divergence\n1.73 Ma",hjust=0.5)+
annotate(geom = "segment",x=-0.2,y=10,xend = -0.05,yend=12.2,
arrow=arrow(angle = 20,type="closed",length = unit(3,'mm')))+
annotate(geom="label",x=-0.2,y=10,label="Fix at\n7.5 Ma",hjust=0.5)
论文中还把有的地方的枝长截断了,这个暂时不知道怎么实现了
左侧的堆积柱形图
library(readxl)
library(tidyverse)
fig1a.df<-read_excel("data/20230408/41588_2023_1340_MOESM5_ESM.xlsx",
sheet = "Fig. 1a",
skip = 1)
fig1a.df %>% colnames()
fig1a.df %>%
mutate(Species=factor(Species,levels = rev(Species))) %>%
pivot_longer(!Species) %>%
mutate(name=factor(name,levels = c("Non repetitive","Unknown repeats","Other repeats",
"DNA transpons","Other retranspons","ClassI/LTR",
"ClassI/LTR/Gypsy","ClassI/LTR/Copia"))) %>%
ggplot(aes(x=value,y=Species))+
geom_bar(stat = "identity",aes(fill=name))+
theme_bw()+
theme(legend.position = c(0.8,0.2),
panel.border = element_blank(),
axis.text.x = element_text(),
axis.line.x = element_line(),
#axis.ticks.x = element_line(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
panel.grid = element_blank(),
legend.title = element_blank())+
scale_x_continuous(limits = c(0,2000000000),
labels = function(x){x/1000000},
breaks = seq(0,1500000000,by=300000000))+
guides(x=guide_axis_truncated(trunc_lower = 0,
trunc_upper = 1500000000))+
labs(x=NULL,y=NULL)+
scale_fill_manual(values = c("#34a56c","#26c2e2","#37519e","#3fbc9c",
"#eedf42","#907135","#f05435","#67696b"),
limits=rev(c("Non repetitive","Unknown repeats","Other repeats",
"DNA transpons","Other retranspons","ClassI/LTR",
"ClassI/LTR/Gypsy","ClassI/LTR/Copia")),
labels=rev(c("Non repetitive","Unknown repeats","Other repeats",
"DNA transpons","Other retranspons","LTR/Others",
"LTR/Gypsy","LTR/Copia")))
最后是拼图
library(patchwork)
p.stacked+p.tree
ggtree这个R包的作者Y叔将ggtree的帮助文档写成了一本书,中文版现在已经出版了,彩版,现在购买是半价才54块5,非常划算,赶紧入手吧!ggtree不仅能做进化树的可视化展示,还可以做聚类分析的树图,所以只要自己的数据分析中涉及到聚类分析,买这本都会有用的!
示例数据和代码可以给推文点赞,然后点击在看,最后留言获取
欢迎大家关注我的公众号
小明的数据分析笔记本
小明的数据分析笔记本 公众号 主要分享:1、R语言和python做数据分析和数据可视化的简单小例子;2、园艺植物相关转录组学、基因组学、群体遗传学文献阅读笔记;3、生物信息学入门学习资料及自己的学习笔记!