I want to modify the width of violin plots that I am constructing with the ggplot2 package.

The background is as follows: I have a dataset that counts the number of observations for each particle size. The size will be my y variable; the count I will call "incidents".

I simplified the data so I am only looking at 2 different sets (indicated by "id"), melted into 1 data frame.

library(ggplot2)
library(data.table)
dt1 <- data.frame(id=c("A","A","A","A","A","B","B","B","B","B"),y=c(10,20,30,40,50,10,20,30,40,50),incidents=c(3,1,5,9,2,4,2,7,1,5))

As far as I know, a violin plot computes its width from how often each value appears in the data. Because I want the y-axis of the plot to be the size, I need a data frame that no longer has an "incidents" column but instead repeats each row as many times as its "incidents" value.

I could not figure out an easier way to reshape this, so I run a for loop with a counter variable and an if clause that checks which row of dt1 the current iteration has to copy into the new data frame (dt2).

Then I plot with the ggplot package using geom_violin().

library(ggplot2)
library(data.table)
dt1 <- data.frame(id=c("A","A","A","A","A","B","B","B","B","B"),y=c(10,20,30,40,50,10,20,30,40,50),incidents=c(3,1,5,9,2,4,2,7,1,5))

newlength <- sum(dt1$incidents) # number of rows in the new data table
dt2 <- data.table(id=rep(as.character(0),newlength), size=rep(0,newlength))
counter <- 1 # index of the dt1 row currently being copied
for (i in seq_len(newlength)) { # iterate through all rows of the new data table
  # move on to the next dt1 row once i exceeds the accumulated incidents checked so far
  if (i > sum(dt1$incidents[1:counter])) {
    counter <- counter + 1
  }
  # copy the id and size stored in the current dt1 row into row i of dt2
  dt2[i, 1:2 := dt1[counter, c(1,2)]]
}

p <- ggplot(dt2, aes(x=1,y=size,color=id))
p + geom_violin()
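
For reference, the same expansion can probably be done without a loop. The following is only a minimal sketch using base R's rep(); the names row_idx and dt2_alt are made up here for illustration and do not appear in the code above.

```r
# Same toy data as above.
dt1 <- data.frame(
  id        = rep(c("A", "B"), each = 5),
  y         = rep(c(10, 20, 30, 40, 50), times = 2),
  incidents = c(3, 1, 5, 9, 2, 4, 2, 7, 1, 5)
)

# Repeat each row index as many times as its "incidents" count,
# then keep only the id and size information.
row_idx <- rep(seq_len(nrow(dt1)), times = dt1$incidents)
dt2_alt <- data.frame(id = dt1$id[row_idx], size = dt1$y[row_idx])

nrow(dt2_alt) == sum(dt1$incidents)  # one row per counted particle
```

This still creates one row per particle, so it only shortens the reshaping step; it does not solve the row-count problem described further down.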

So far so good, but this is not exactly what I want. Instead of the count of particles of each size, I want the violin plot to show the overall volume of all particles of that size. That is, the width of the violins should be a function of both the count (the "incidents" value in dt1, or the number of matching rows in dt2) and the size itself, so that the violin becomes wider at higher y-values.

Considering e.g. a spherical shape of particles, an "incidents" value of 7 for a size of 10 should give a width of 7 * (4/3 * pi * (10/2)^3). For a particle of size 50, however, the same "incidents" value should result in a computed width of 7 * (4/3 * pi * (50/2)^3).
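
To make the arithmetic concrete, here is a small sketch of that calculation. It assumes that "size" is the particle diameter, as the formula implies; sphere_volume is a helper name invented for this example.

```r
# Volume of a sphere from its diameter d (assumption: size == diameter).
sphere_volume <- function(d) 4 / 3 * pi * (d / 2)^3

# Desired widths for 7 incidents at size 10 and at size 50.
width_10 <- 7 * sphere_volume(10)
width_50 <- 7 * sphere_volume(50)

# The ratio depends only on the sizes: (50 / 10)^3.
width_50 / width_10
```

With the same count, the ratio of the two widths is (50/10)^3 = 125, which is why the size matters so much more than the count here.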

Is there any way to make geom_violin compute the width as a function of the y-variable? Unfortunately I cannot simply expand the data frame according to the volume formula (i.e. multiply the "incidents" by the spherical volume), because for particle sizes > 100 and "incidents" values > 1000 the number of rows reaches astronomical heights (it would result in a data frame with ~10,000,000,000 rows for my data).

Any ideas are greatly appreciated.

Thanks in advance!

1 Answer

First calculate the new variable:

dt1$total_particle_size <- dt1$incidents * (4/3 * pi * (dt1$y/2)^3)

Then plot:

ggplot(dt1, aes(x=id,y=y,fill=id,width=total_particle_size))+
 geom_violin()

I do get a warning which you might want to check.
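
An alternative that may be worth trying is the weight aesthetic, which geom_violin() passes to its density estimate, so the aggregated table can be used directly without expanding it to one row per particle. This is only a sketch on the toy data from the question; the column name volume_weight is made up here, and the result has not been checked against real data.

```r
library(ggplot2)

dt1 <- data.frame(
  id        = rep(c("A", "B"), each = 5),
  y         = rep(c(10, 20, 30, 40, 50), times = 2),
  incidents = c(3, 1, 5, 9, 2, 4, 2, 7, 1, 5)
)

# Hypothetical helper column: total sphere volume per size bin
# (assumes y is the particle diameter).
dt1$volume_weight <- dt1$incidents * (4 / 3 * pi * (dt1$y / 2)^3)

# Each row contributes to the density in proportion to its weight
# instead of counting once.
p <- ggplot(dt1, aes(x = id, y = y, fill = id, weight = volume_weight)) +
  geom_violin()
p
```

Note that the outline is still a smoothed kernel density, and with the default scale = "area" every violin has the same area, so the plot compares the shape of the volume distribution within each id rather than absolute volumes.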