Normally, when you want to plot one variable against another, you just supply the variable names. If the variable you want is the result of a computation, you can add it as a column to your data.frame or data.table and then use it. However, this creates a lot of redundant data if you have big data frames and need to plot these new columns just once. So I am essentially trying to find a way to apply functions to variables directly in the plotting call instead.

I'll try to illustrate that with an example:

library(ggpubr)
data(iris)
ggboxplot(iris, x="Species", y="Sepal.Width", add = "jitter")

This plots the sepal width for each species of iris flower. However, if you want to apply a custom function to a variable, e.g.:

ggboxplot(iris, x=round("Sepal.Length"), y="Sepal.Width", add = "jitter")
Error in round("Sepal.Length") : 
  non-numeric argument to mathematical function

This makes sense, since the function doesn't know that the quoted text refers to a variable.

Note that I have been using the ggpubr package for prettier plots, but I think the problem is essentially further down, in ggplot2:

ggplot(data = iris, aes(x=floor(Sepal.Length), y=Sepal.Width)) + geom_boxplot()
Warning message:
Continuous x aesthetic -- did you forget aes(group=...)? 

One way to bypass this is to override the aes mapping; however, this results in a slightly weird x-axis:

ggplot(data = iris, aes(y=Sepal.Width, x=Sepal.Length)) + geom_boxplot(mapping = aes(group=floor(Sepal.Length)))

I am thinking there has to be a simpler way to get this done. Any advice? I would ideally like to keep using ggboxplot() from the ggpubr package, but if it can't be done there, I can consider using ggplot2 alone.

On ggplot2 at least, the problem (as the warning suggests) is that you are trying to plot a continuous x-variable in a boxplot, which takes a categorical x. Another way to bypass this is to convert x to a factor: x = factor(floor(Sepal.Length)). That is essentially what the group argument is doing, but you don't get the "weird" axis because each integer is a category. Of course, if you don't have sequential integers (e.g. c(4, 5, 20, 40)), the axis will not be numerically to scale. – Gabriel Silva
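
The factor() suggestion in this comment can be sketched as follows, assuming only ggplot2 is loaded. The floor() call is evaluated inside aes(), so no extra column is added to iris; the axis label is an illustrative choice, not something from the original post.

```r
library(ggplot2)

# factor() turns the floored values into discrete categories, so
# geom_boxplot() draws one box per integer without needing aes(group = ...).
p <- ggplot(iris, aes(x = factor(floor(Sepal.Length)), y = Sepal.Width)) +
  geom_boxplot() +
  xlab("floor(Sepal.Length)")  # illustrative label, replaces the default

print(p)
```

Because x is now discrete, the boxes are evenly spaced regardless of the gaps between the underlying integer values.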

1 Answer


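To keep using the ggboxplot() function, a possible solution is to create a custom wrapper function around it, as follows:

 library(ggpubr)
 # Wrapper around ggboxplot() that floors the y column before plotting.
 # `data` is modified only inside the function; the caller's data frame is unchanged.
 ggboxplot2 <- function(data, x, y, ...){
    data[[y]] <- floor(data[[y]])
    ggpubr::ggboxplot(data, x, y, ...)
 }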

Create the boxplot using the custom function:

ggboxplot2(iris, x = "Species", y = "Sepal.Width", add = "jitter")
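
The wrapper above hard-codes floor() on the y column. As a rough sketch of how the same idea might be extended to the x variable from the question, the hypothetical helper below (ggboxplot_fx and its fun argument are illustrative names, not part of ggpubr) applies an arbitrary function to x and converts the result to a factor so the boxplot gets a discrete axis.

```r
library(ggpubr)

# Hypothetical helper (not part of ggpubr): applies `fun` to column `x` in the
# function's local copy of `data`, converts the result to a factor, and then
# calls ggboxplot(). The caller's data frame is left untouched.
ggboxplot_fx <- function(data, x, y, fun = identity, ...) {
  data[[x]] <- factor(fun(data[[x]]))
  ggpubr::ggboxplot(data, x = x, y = y, ...)
}

p <- ggboxplot_fx(iris, x = "Sepal.Length", y = "Sepal.Width",
                  fun = round, add = "jitter")
print(p)
```

The transformed column exists only inside the function call, which avoids adding single-use columns to a large data frame.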