I am making grouped violin plots of my own dataset with ggplot2. The dataset contains 350 observations of 3 variables (7 scenarios in 5 locations, with 10 replicates for each combination). Part of it looks like this:

[Image: part of my dataset]

The code I used is:

library(ggplot2)

DF = read.csv("C:\\Users\\lqy\\Desktop\\Pilot_data.csv", na.strings = "---", header = TRUE)
DF = data.frame(DF)

DF$Scenarios = as.integer(DF$Scenarios)
    
figure = ggplot(DF, aes(x = Location, y = Recovery, fill = Scenarios)) +
  geom_violin() + 
  stat_summary(fun="median",geom="point") +
  labs(x="Locations", y="Days to 90% recovery") +
  theme(axis.text = element_text(size = 10)) +
  theme(axis.title = element_text(size = 10)) +
  theme(legend.position = "right")
figure

This code produces a figure that looks like this:

[Image: the resulting grouped violin plot]

I am quite happy with this figure, but the median points I added are clustered together instead of sitting inside each violin. I'm guessing this is because they are aligned directly above the corresponding point on the x-axis. Is there a way to place each median point inside its corresponding violin in this kind of plot?

Many thanks for answering this question for me!

ADDITION: here is a random sample of 45 rows from the dataset (output of dput(DF[sample(nrow(DF),45),])):

structure(list(Scenarios = c(8L, 2L, 2L, 2L, 10L, 5L, 5L, 10L, 
10L, 3L, 10L, 1L, 2L, 5L, 8L, 2L, 1L, 3L, 1L, 8L, 10L, 4L, 8L, 
2L, 4L, 3L, 8L, 10L, 1L, 1L, 10L, 5L, 3L, 8L, 8L, 5L, 8L, 5L, 
10L, 1L, 8L, 8L, 8L, 3L, 10L), Location = c("Total_Catchment", 
"Sec_51", "Sec_53", "Total_Catchment", "Sec_55", "Sec_55", "Sec_51", 
"Sec_51", "Sec_54", "Total_Catchment", "Sec_55", "Sec_55", "Sec_54", 
"Sec_53", "Sec_51", "Sec_55", "Sec_53", "Sec_55", "Sec_54", "Total_Catchment", 
"Sec_51", "Sec_53", "Sec_55", "Total_Catchment", "Sec_54", "Total_Catchment", 
"Sec_53", "Sec_53", "Sec_51", "Sec_54", "Sec_53", "Sec_51", "Sec_53", 
"Sec_54", "Sec_54", "Sec_55", "Sec_55", "Sec_54", "Sec_51", "Sec_51", 
"Sec_51", "Total_Catchment", "Sec_51", "Sec_55", "Sec_53"), Recovery = c(316.5, 
839.5, 179.5, 277.5, 923.5, 664.5, 494.5, 639.5, 273.5, 327.5, 
830.5, 714.5, 357.5, 300.5, 504.5, 752.5, 265.5, 535.5, 208.5, 
303.5, 564.5, 339.5, 766.5, 396.5, 273.5, 271.5, 185.5, 370.5, 
825.5, 191.5, 186.5, 582.5, 364.5, 326.5, 332.5, 901.5, 706.5, 
187.5, 577.5, 680.5, 506.5, 301.5, 559.5, 713.5, 324.5)), row.names = c(20L, 
121L, 163L, 37L, 329L, 348L, 103L, 112L, 273L, 52L, 322L, 309L, 
240L, 187L, 76L, 338L, 155L, 339L, 253L, 69L, 133L, 158L, 342L, 
2L, 235L, 45L, 146L, 161L, 106L, 239L, 189L, 117L, 157L, 265L, 
258L, 299L, 321L, 215L, 98L, 127L, 132L, 27L, 111L, 283L, 203L
), class = "data.frame")
Please dput(DF) and paste the output in your question in order to reproduce the problem. - Duck
I tried the command dput(DF) but the data is too long to be copied here. Is there another way to copy my dataset here? Sorry I'm super new to R. @Duck - Skylar Xie
Try dput(DF[sample(nrow(DF),45),]) and paste the output in your question! - Duck
I have copied the output in my question! - Skylar Xie

1 Answer

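The solution will probably involve adding a dodge position to the summary layer. geom_violin() dodges its groups by default, whereas stat_summary() uses position = "identity", so the median points for every group land at the same x position. Example with dummy data below: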

library(ggplot2)

df <- data.frame(
  Locations = rep(c("Sec_51", "Sec_53"), each = 1000),
  Recovery = rnorm(2000),
  Scenarios = rep(rep(LETTERS[1:5], each = 200), 2)
)

ggplot(df, aes(Locations, Recovery,
               group = interaction(Locations, Scenarios))) +
  geom_violin(aes(fill = Scenarios)) +
  stat_summary(fun = median, geom = "point",
               position = position_dodge(0.9))
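
The same idea applied to the column names from the question might look like the sketch below. The data here is a simulated stand-in (a hypothetical 5 x 7 x 10 grid with random Recovery values), not the real Pilot_data.csv. It also assumes Scenarios is stored as a factor: a discrete fill lets ggplot2 form one group per Location and Scenarios combination, which is what both the violins and the dodged median points rely on.

```r
library(ggplot2)

# Simulated stand-in for the asker's data (assumption, not the real file):
# 5 locations x 7 scenarios x 10 replicates = 350 rows
set.seed(1)
DF <- expand.grid(
  Replicate = 1:10,
  Scenarios = c(1, 2, 3, 4, 5, 8, 10),
  Location  = c("Sec_51", "Sec_53", "Sec_54", "Sec_55", "Total_Catchment"),
  stringsAsFactors = FALSE
)
DF$Recovery <- runif(nrow(DF), min = 150, max = 950)

# A discrete fill gives one group per Location x Scenarios combination
DF$Scenarios <- factor(DF$Scenarios)

figure <- ggplot(DF, aes(x = Location, y = Recovery, fill = Scenarios)) +
  geom_violin() +
  # Dodge the medians by the same width the violins use by default (0.9)
  stat_summary(fun = median, geom = "point",
               position = position_dodge(width = 0.9)) +
  labs(x = "Locations", y = "Days to 90% recovery")
figure
```

If the violins are drawn with a non-default width, the width passed to position_dodge() would need to match it for the points to stay centred.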