This article was first published on the WeChat official account “生信补给站”: https://mp.weixin.qq.com/s/zdSit97SOEpbnR18ARzixw
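ggstatsplot is an extension of the ggplot2 package. It produces polished plots and the results of the statistical analysis in a single output, which makes it very useful for anyone who regularly does statistical analysis or bioinformatics.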
Data preparation
The gapminder dataset contains life expectancy, GDP per capita, and population for 142 countries from 1952 to 2007, at 5-year intervals.
# Load the plotting package
library(ggstatsplot)
# Load the gapminder dataset
library(gapminder)
head(gapminder)
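As a quick sanity check of the dataset description above, the counts can be read off the data directly. This is a minimal sketch that only assumes the gapminder package is installed; the variable names (`n_countries`, `years`, `year_gaps`) are chosen here for illustration.

```r
library(gapminder)

# Number of distinct countries in the dataset
n_countries <- length(unique(gapminder$country))

# Survey years and the spacing between consecutive years
years <- sort(unique(gapminder$year))
year_gaps <- unique(diff(years))

n_countries
range(years)
year_gaps

# Column names and types (country, continent, year, lifeExp, pop, gdpPercap)
str(gapminder)
```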
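The ggstatsplot package provides many plotting functions (listed at the end of this article); this article only demonstrates how to use ggbetweenstats.
Plotting with ggbetweenstats
1 Basic plot
Show the distribution of life expectancy on each continent in 2007, and test whether mean life expectancy differs between continents and whether that difference is significant.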
# Set a seed for reproducibility
set.seed(123)
# Oceania has too few observations, so drop it before the analysis
ggstatsplot::ggbetweenstats(
  data = dplyr::filter(
    .data = gapminder::gapminder,
    year == 2007, continent != "Oceania"
  ),
  x = continent,
  y = lifeExp,
  nboot = 10,
  messages = FALSE
)
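The plot shows the distribution of life expectancy on each continent in 2007 as a box plot, a dot plot, and a violin plot, together with the mean and sample size of each group. The top of the figure reports the statistics of the overall test.
The meaning of each statistic is explained in the figure below (from the official website):
Note: the function chooses the test automatically from the number of levels in the grouping variable: an independent-samples t-test for 2 groups, or a one-way ANOVA for 3 or more groups.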
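The selection rule in the note can be sketched in base R. `choose_test` below is a hypothetical helper written for this illustration, not a ggstatsplot function, and it assumes the Welch variants that `t.test()` and `oneway.test()` use by default, which may not match the package's own settings exactly.

```r
library(gapminder)

# Hypothetical helper (not part of ggstatsplot): pick a parametric test
# from the number of groups actually present in the grouping variable.
choose_test <- function(data, group, value) {
  df <- data.frame(
    y = data[[value]],
    g = droplevels(factor(data[[group]]))  # ignore unused factor levels
  )
  k <- nlevels(df$g)
  if (k < 2) stop("need at least two groups")
  if (k == 2) {
    t.test(y ~ g, data = df)       # two groups: Welch t-test (R default)
  } else {
    oneway.test(y ~ g, data = df)  # three or more: Welch one-way ANOVA (R default)
  }
}

d2007 <- subset(gapminder, year == 2007 & continent != "Oceania")
res <- choose_test(d2007, "continent", "lifeExp")
res$method
res$p.value
```

With four continents left after dropping Oceania, the helper takes the ANOVA branch; restricting the data to two continents would switch it to the t-test.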
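2 Adding statistics
The plot above only reports the p-value of the overall test. Next, run pairwise comparisons between the groups and add their test statistics to the plot.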
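To make the idea of FDR-adjusted pairwise comparisons concrete, here is a rough base R sketch. It uses `pairwise.t.test()` with unpooled variances, which is an assumption for illustration and not necessarily the pairwise test ggstatsplot runs internally, so the p-values need not match the plot. The `raw_p` values are made up.

```r
library(gapminder)

d2007 <- droplevels(subset(gapminder, year == 2007 & continent != "Oceania"))

# Pairwise t-tests between continents, p-values adjusted with the FDR method
pw <- pairwise.t.test(
  x = d2007$lifeExp,
  g = d2007$continent,
  p.adjust.method = "fdr",
  pool.sd = FALSE  # separate variances per pair (Welch-type tests)
)
pw$p.value

# p.adjust() performs the correction itself; these p-values are made up
raw_p <- c(0.01, 0.02, 0.03, 0.04)
adj_p <- p.adjust(raw_p, method = "fdr")
adj_p
```

The ggbetweenstats call for this step follows.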
set.seed(123)
ggstatsplot::ggbetweenstats(
  data = dplyr::filter(
    .data = gapminder::gapminder,
    year == 2007, continent != "Oceania"
  ),
  x = continent,
  y = lifeExp,
  nboot = 10,
  messages = FALSE,
  effsize.type = "unbiased", # type of effect size (unbiased = omega)
  partial = FALSE, # partial omega or omega?
  pairwise.comparisons = TRUE, # display results from pairwise comparisons
  pairwise.display = "significant", # display only significant pairwise comparisons
  pairwise.annotation = "p.value", # annotate the pairwise comparisons using p-values
  p.adjust.method = "fdr" # adjust p-values for multiple tests using this method
)

3 Polishing the plot

# Add a title and caption, x- and y-axis labels, outlier tags, and change the theme and color palette.
set.seed(123)
ggstatsplot::ggbetweenstats(
  # Filter gapminder directly instead of piping it in: with %>%, a dot used only
  # inside a nested call still inserts the data as an extra first argument.
  data = dplyr::filter(
    .data = gapminder::gapminder,
    year == 2007, continent != "Oceania"
  ),
  x = continent, # grouping/independent variable
  y = lifeExp, # dependent variable
  xlab = "Continent", # label for the x-axis
  ylab = "Life expectancy", # label for the y-axis
  plot.type = "boxviolin", # type of plot: "box", "violin", or "boxviolin"
  type = "parametric", # type of test: p (parametric), np (nonparametric), r (robust), bf (Bayes factor)
  effsize.type = "biased", # type of effect size
  nboot = 10, # number of bootstrap samples used
  bf.message = TRUE, # display Bayes factor in favor of the null hypothesis
  outlier.tagging = TRUE, # whether outliers should be flagged
  outlier.coef = 1.5, # coefficient for Tukey's rule
  outlier.label = country, # label to attach to outlier values
  outlier.label.color = "red", # outlier point label color
  mean.plotting = TRUE, # whether the mean is to be displayed
  mean.color = "darkblue", # color for mean
  messages = FALSE, # turn off messages
  ggtheme = ggplot2::theme_gray(), # a different theme
  package = "yarrr", # package from which the color palette is taken
  palette = "info2", # choosing a different color palette
  title = "Comparison of life expectancy across continents (Year: 2007)",
  caption = "Source: Gapminder Foundation"
) + # modifying the plot further
  ggplot2::scale_y_continuous(
    limits = c(35, 85),
    breaks = seq(from = 35, to = 85, by = 5)
  )
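The `outlier.coef = 1.5` argument above refers to Tukey's rule, which flags values lying more than `coef` times the interquartile range beyond the quartiles. The sketch below shows that rule in base R; `tukey_outlier` is a hypothetical helper written for this illustration, and the package's internal implementation may differ in detail (for example in how the quartiles are computed).

```r
library(gapminder)

# Hypothetical helper: TRUE for values outside
# [Q1 - coef * IQR, Q3 + coef * IQR]
tukey_outlier <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  x < q[[1]] - coef * iqr | x > q[[2]] + coef * iqr
}

d2007 <- droplevels(subset(gapminder, year == 2007 & continent != "Oceania"))

# Apply the rule within each continent (ave() returns 0/1 for the logical result)
flag <- ave(d2007$lifeExp, d2007$continent, FUN = tukey_outlier) == 1

# Countries that would be tagged as outliers under this rule
d2007[flag, c("country", "continent", "lifeExp")]
```

Raising `coef` widens the fences and tags fewer points; lowering it tags more.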
Other plotting functions
Function | Plot | Description |
ggbetweenstats | violin plots | for comparisons between groups/conditions |
ggwithinstats | violin plots | for comparisons within groups/conditions |
gghistostats | histograms | for distribution about numeric variable |
ggdotplotstats | dot plots/charts | for distribution about labeled numeric variable |
ggpiestats | pie charts | for categorical data |
ggbarstats | bar charts | for categorical data |
ggscatterstats | scatterplots | for correlations between two variables |
ggcorrmat | correlation matrices | for correlations between multiple variables |
ggcoefstats | dot-and-whisker plots | for regression models |