I am trying to add kruskal Wallis and pairwise Wilcoxon test to the figure to show which groups are significant different, but I have multiple groups/subgroups within each group and facet which makes it complicated.
Here is the R code by using iris dataset as an example, the idea is to perform Kruskal.test across different treatments (A, B, C) for different variables (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) per species, and also wilcox.test pairwise between them:
rm(list=ls(all=TRUE)); cat('\014') # clear workspace
library(tidyverse)
library(ggplot2)
library(viridis)
library(rstatix)
data(iris)
iris$treatment <- rep(c("A","B","C"), length(iris$Species)/3)
mydf <- gather(iris,Variable,value,Sepal.Length:Petal.Width)
# change number to create more difference
mydf[mydf$treatment=="B",]$value <- mydf[mydf$treatment=="B",]$value*1.2
#mydf[mydf$treatment=="C",]$value <- mydf[mydf$treatment=="C",]$value+0.3
# do pairwise Wilcoxon test for pairwise comparisons between groups
df_wilcox <- mydf %>%
group_by(Species,Variable) %>%
pairwise_wilcox_test(value ~ treatment) %>%
add_y_position(step.increase = 0.02)
# do Kruskal Wallis test to see whether or not there is statistically significant difference between three or more groups
df_kw <- compare_means(value ~ treatment, mydf, group.by = c("Species","Variable"), method="kruskal")
# plot boxplot with wilcoxon and kruskal test results
P <- ggplot(data=mydf,
aes(x=treatment, y=value, fill=Variable))+
stat_boxplot(geom = "errorbar")+geom_boxplot(outlier.shape = NA)+
facet_wrap(~Species,nrow=1)+
theme_bw()+
theme(axis.text=element_text(size=12),axis.title=element_text(size=16),plot.title=element_text(size=20)) +
theme(strip.text = element_text(size=14))+
scale_fill_viridis(discrete = TRUE) +
guides(fill=guide_legend(title="Variable"))+
stat_pvalue_manual(df_wilcox,color ="Variable",step.group.by="Variable",tip.length = 0,step.increase = 0.02)
#stat_pvalue_manual(df_wilcox,color ="Variable",step.group.by="Variable",tip.length = 0,step.increase = 0.02,hide.ns=T) #hide non-significant
# change legend title and wilcoxon test color
ggpar(P,legend.title = "Wilcoxon test",palette = c("#440154FF","#3B528BFF","#21908CFF","#FDE725FF"))
This produces the following plot:
To improve the figure, I want to:
- automatic add Kruskal test result from 'df_kw' as text to the figure as well and only show significant p-value (e.g. KW(petal.length) p = 0.003)
- make the wilcoxon line between treatment (e.g. "A", "B", "C") for different variable (e.g. Petal/Speal Length/Width) looks neat (e.g. all on top of the boxplot with consistent line space)
- make the color of wilcoxon test line same as the color of boxplot (now the 'ggpar' don't always work if I hide non-significant, when the wilcoxon test variable is less than the actual variable)
I am stuck here and wondering anyone has a solution? Thank you very much!