18
votes

I have a data frame which contains x-axis numeric bins and continuous y-axis data across multiple categories. Initially, I created a boxplot by making the x-axis bins "factors", and doing a boxplot of the melted data. Reproducible data:

x <- seq(1,10,by=1)
y1 <- rnorm(10, mean=3)
y2 <- rnorm(10, mean=10)
y3<- rnorm(10, mean=1)
y4<- rnorm(10, mean=8)
y5<- rnorm(10, mean=12)
df <- data.frame(x,y1,y2,y3,y4,y5)
df.m <- melt(df, id="x")

My code to create the x-axis data as a factor:

df.m$x <- as.factor(df.m$x)

My ggplot:

ggplot(df.m, aes(x=x, y=value))+
 geom_boxplot(notch=FALSE, outlier.shape=NA, fill="red", alpha=0.1)+
 theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

The resulting plot: The resulting plot:

The problem is that I cannot use x-axis numeric spacing because the x-axis is categorized as a factor, which has equal spacing. I want to be able to use something like scale_x_continuous to manipulate the axis breaks and spacing to, say, an interval of 2, rather than a boxplot every 1, but when I try to plot the data with the x-axis "as.numeric", I just get one boxplot of all of the data:

plot

Any suggestions for a way to get this continuous-looking boxplot curve (the first image) while still being able to control the numeric properties of the x-axis? Thanks!

2
@Henrik Doesn't the Google drive link in the question work for you? That should be the data frame for this example. Thanks for showing up and helping me again! It's been a problem-ridden day in the R world...AndMan21
@Henrik Gotcha, sorry about that. Working on the edit nowAndMan21

2 Answers

26
votes

Here is a way using the original data you posted on Google - which actually was much more helpful, IMO.

ggplot(df, aes(x=CH, y=value,group=CH))+
  geom_boxplot(notch=FALSE, outlier.shape=NA, fill="red", alpha=0.2)+
  scale_x_log10()

So, as @BenBolker said before he deleted his answer(??), you should leave the x-variable (CH) as numeric, and set group=CH in the call to aes(...).

With your real data there is another problem though. Your CH is more or less logarithmically spaced, so there are about as many points < 1 as there are between 1 - 10, etc. ggplot wants to make the boxes all the same size, so with a linear x-axis the box width is smaller than the line width, and you don't see the boxes at all. Changing the x-axis to a logarithmic scale fixes that, more or less.

0
votes

Don't make x a factor. You need to aesthetically map a group that is a factor determining which box the value is associated with, luckily, after melting, this is what you variable column is:

ggplot(df.m, aes(x = x, y = value, group = variable)) +
    geom_boxplot()

As x is still numeric, you can give it whatever values you want within a specific variable level and the boxplot will show up at that spot. Or you could transform the x axis, etc.