1
votes

I want to create a violin plot in R showing the math averages (y axis) by level (x axis).

The problem is that I want the violin plot to be split so that, for each level, the averages for females are on one side and the averages for males are on the other (these are stored in two separate columns of my data frame).

This is my data:

structure(list(section.name = structure(1:19, .Label = c("a", 
"b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", 
"o", "p", "q", "r", "s "), class = "factor"), level = structure(c(1L, 
2L, 4L, 3L, 2L, 4L, 5L, 1L, 1L, 3L, 4L, 2L, 5L, 1L, 2L, 4L, 3L, 
1L, 5L), .Label = c("level 0 ", "level 1", "level 2", "level 3", 
"level 4"), class = "factor"), math.av.females = c(62L, 72L, 
49L, 57L, 64L, 78L, 81L, 63L, 68L, 74L, 70L, 64L, 80L, 67L, 70L, 
72L, 80L, 78L, 64L), math.av.males = c(58L, 85L, 58L, 55L, 62L, 
76L, 76L, 61L, 66L, 76L, 68L, 66L, 82L, 59L, 75L, 68L, 78L, 75L, 
61L)), class = "data.frame", row.names = c(NA, -19L))

2 Answers

1
votes

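We call the data.frame you provided df. One of your levels has a trailing space ("level 0 "), so we clean that up first. Note that the gsub call below removes every space, so the labels become level0, level1, and so on: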

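levels(df$level)
# [1] "level 0 " "level 1"  "level 2"  "level 3"  "level 4"
# Remove all spaces from the level labels (this also drops the inner space)
df$level = factor(gsub(" ","",as.character(df$level)))
levels(df$level)
# [1] "level0" "level1" "level2" "level3" "level4"
# Same cleanup for section.name, where "s " has a trailing space
levels(df$section.name) = gsub(" ","",levels(df$section.name))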
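
If you would rather keep the space inside each label ("level 0" instead of "level0"), trimws() strips only leading and trailing whitespace. A minimal sketch, using a stand-in factor called level (a name assumed here for illustration) rather than df$level:

```r
# Stand-in for df$level, built from the labels shown in the question
level <- factor(c("level 0 ", "level 1", "level 2", "level 3", "level 4"))

# trimws() removes only leading and trailing whitespace,
# so the space between "level" and the digit is kept
levels(level) <- trimws(levels(level))
levels(level)
```

On the real data the equivalent call would be levels(df$level) <- trimws(levels(df$level)).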

Then you need to convert it into long format: one common column holding the average value, and another column indicating whether that average is for males or females:

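library(tidyr)
library(ggplot2)
# Gather the two math.av.* columns; by default pivot_longer calls the new columns "name" and "value"
plotdf = pivot_longer(df,-c(section.name,level))
# Strip the literal prefix so that name is just "females" or "males"
plotdf$name = gsub("math.av.","",plotdf$name,fixed=TRUE)
ggplot(plotdf,aes(x=level,y=value,fill=name)) + geom_violin()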
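
To make "long format" concrete, here is a rough base R sketch of the same reshaping on a two-row toy data frame. The name df_small and its values are made up for illustration (the values are copied from the first two rows of the question), and the row order differs from what pivot_longer returns:

```r
# Toy wide data: one row per section, one column per gender
df_small <- data.frame(
  level = c("level0", "level1"),
  math.av.females = c(62L, 72L),
  math.av.males = c(58L, 85L)
)

# Long format: stack the two average columns into one "value" column
# and record which column each value came from in "name"
long <- data.frame(
  level = rep(df_small$level, times = 2),
  name  = rep(c("females", "males"), each = nrow(df_small)),
  value = c(df_small$math.av.females, df_small$math.av.males)
)
long
```

Each original row now appears twice, once per gender, which is the shape ggplot needs to map fill to a single column.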

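To do a split violin plot, you can use the geom_split_violin function written by @jan-glx. It is not part of ggplot2, so its definition has to be run in your session before the call below: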

ggplot(plotdf,aes(x=level,y=value,fill=name)) + geom_split_violin()

Or use the package vioplot, with your original df:

library(vioplot)
vioplot(math.av.males ~level,data=df,
col="turquoise",side="right",ylab="math.av")
vioplot(math.av.females ~level,data=df,
col="orange",side="left",add=TRUE)
legend("topleft",legend=c("males","females"),fill=c("turquoise","orange"),cex=0.6)

0
votes
library(dplyr)
library(tidyr)
library(ggplot2)

# Split each column name into a constant "math" part and a "gender" part
df %>%
  pivot_longer(cols=starts_with("math.av"), names_to = c("math","gender"),
               names_pattern="(math\\.av)\\.(.+)") %>%
  ggplot(aes(x=level,y=value,fill=gender)) +
  geom_violin()
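
To see what the two capture groups in names_pattern pick out, the pattern can be tried on the column names with base R's sub(). This is only an approximation of what pivot_longer does internally, and the variable names cols, pattern, math and gender are chosen here for illustration:

```r
# Column names from the question's data frame
cols <- c("math.av.females", "math.av.males")

# Same pattern as in names_pattern, with the dots escaped so they match literally
pattern <- "(math\\.av)\\.(.+)"

# First group: the constant prefix; second group: the gender suffix
math   <- sub(pattern, "\\1", cols)
gender <- sub(pattern, "\\2", cols)
data.frame(cols, math, gender)
```

The second group is what ends up in the gender column used for fill.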