2
votes

I am trying to produce a plot in ggplot for a multinomial logistic regression. Not all levels of my nominal dependent variable are observed in each factor level, and I want the bars in the plot to have equal width. With geom_bar and position_dodge(preserve='single') I can show the mean of each group as bars of equal width, but I cannot get geom_point to align with those bars.

Here is my data; decide is the nominal dependent variable:

decide=c("h", "g", "h", "g", "h", "g", "g", "h", "g", "h", "g", "h", "h", "h", "h", "h", "g", "h", "h", "r", "g", "h", "h", "h", "g", "g", "g", "h", "h", "h","h", "h", "h", "r", "h", "g", "g", "h", "g", "h", "g", "h", "g", "h", "d", "h", "h", "r", "h", "h", "g", "g", "g", "h", "g", "g", "g", "g", "h", "h")
dcsz=c("small",  "medium", "small",  "small",  "medium", "small",  "small",  "medium", "medium", "small",  "small",  "medium", "small",  "medium", "small",  "medium", "small", "medium", "small",  "small",  "medium", "small",  "medium", "medium", "medium", "small",  "small",  "medium", "small",  "medium", "small",  "medium", "small",  "medium", "medium", "medium", "small",  "medium", "medium", "small",  "medium", "small",  "medium", "medium", "small",  "small",  "medium", "small",  "medium", "medium", "medium", "small",  "small",  "small",  "small",  "medium", "medium", "small",  "small",  "medium")
disthome=c(9.2,10.0,5.0,0.8,6.5,2.0,6.8,1.6,6.9,4.4,5.8,6.2,4.7,0.6,3.0,4.7,5.8,1.5,5.8,4.5,3.2,4.6,2.9,4.1,6.5,4.8,9.1,4.7,4.3,4.2,4.8,3.5,5.4,7.1,3.0,5.3,1.0,5.2,2.2,1.7,6.0,6.1,3.1,2.4,4.3,5.1,7.2,9.8,6.9,3.1,8.8,0.9,9.7,2.2,5.4,4.4,6.8,8.3,5.4,2.2)

gohome=data.frame(decide, dcsz, disthome)

Here is how I got the mean and standard error:

library(dplyr)
library(ggplot2)

gohome.disthome <- gohome %>% 
  group_by(dcsz,decide) %>%
  summarise(meandisthome = mean(na.omit(disthome)), 
            sedisthome=sd(na.omit(disthome))/sqrt(n()))

Now to the nitty gritty: here is my original code, before I aligned the error bars with the mean bars and separated the points by the nominal variable:

ggplot(gohome,aes(y=disthome, x=dcsz, fill = decide)) +
  #add bars of the group means
  geom_bar(stat="identity", position=position_dodge(),
           data=gohome.disthome,aes(x=dcsz,y=meandisthome)) +
  #overlay data points
  geom_point(position=position_dodge()) +
  #add error bars of means
  geom_errorbar(data=gohome.disthome,stat="Identity",
                position=position_dodge(),
                aes(x=dcsz, fill = decide,y=meandisthome,
                    ymin=meandisthome-sedisthome,ymax=meandisthome+sedisthome),
                    width=0.3)+
  #flip axis
  coord_flip()


Here is the code where I got the error bars to align with the mean bars (using 0.9 in position_dodge) and separated the points by the nominal variable (also 0.9). Adding preserve="single" to position_dodge also made the error bars and mean bars all the same width, even though not every level of the dependent variable is observed in each factor level. However, I cannot add preserve='single' to geom_point, because then the points are no longer separated by the nominal variable, and preserve='total' has no effect either:

ggplot(gohome,aes(y=disthome, x=dcsz, fill = decide)) +
  #add bars and the preserve part keeps all bars same width
  geom_bar(stat="identity",position=position_dodge(preserve='single'),
           data=gohome.disthome,aes(x=dcsz,y=meandisthome))+
  #overlay data points
  geom_point(position=position_dodge(0.9)) +
  #add error bars of means
  geom_errorbar(data=gohome.disthome,stat="Identity",
                position=position_dodge(0.9,preserve = "single"),
                aes(x=dcsz, fill = decide,y=meandisthome,
                    ymin=meandisthome-sedisthome,ymax=meandisthome+sedisthome),
                width=0.3)+
  #flip axis
  coord_flip()


I've also tried using position_dodge2 instead of position_dodge in different combinations, along with preserve='total', but that doesn't solve it either. Either the points stay the same or they become a complete scatter with no separation. I got the idea to use position_dodge2 and preserve='total' from the following link, since my problem is very similar (not sure why mine isn't working): https://github.com/tidyverse/ggplot2/issues/2712

Can someone please help me fix my code? I need the points to line up perfectly with all of the error bars.

2
I feel this is a perfect example where one fares better using boxplots instead of bars (or not using bars at all), because the bars misrepresent the data in this case. You then wouldn't need error bars any more, and would only have two geoms to align. - tjebo

2 Answers

3
votes

Dodging can be a pain. Given your use case, and assuming you aren't using facets for anything else, it may be simpler to use facets instead of dodging:

ggplot(gohome, 
       aes(x = decide, y = disthome)) +
  stat_summary(geom = "bar", fun = "mean",
               aes(fill = decide),
               width = 1) +
  geom_point() +
  stat_summary(geom = "errorbar") + # default summary function is mean_se()
  facet_grid(forcats::fct_rev(dcsz) ~ ., switch = "y") +
  coord_flip() +
  
  # optional: aesthetic changes to imitate the original look
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.y = element_blank(),
        panel.spacing = unit(0, "pt"), 
        strip.background = element_blank(),
        strip.text.y.left = element_text(angle = 0))

(Note that I didn't use the summary data frame either, as the summary stats in ggplot2 suffice.)
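
As a rough check on that note, the sketch below compares a hand-computed mean and standard error with ggplot2's mean_se(), which stat_summary() falls back to when no summary function is given. The vector x is made-up illustration data, not taken from the question.

```r
library(ggplot2)

# Hypothetical values for illustration only (not from the question's data)
x <- c(2, 4, 6, 8)

# Manual summary, mirroring the question's sd(x) / sqrt(n) formula
manual_mean <- mean(x)
manual_se   <- sd(x) / sqrt(length(x))

# mean_se() returns a one-row data frame with columns y, ymin, ymax
auto <- mean_se(x)

# The two approaches should agree (for data without NAs)
all.equal(auto$y,    manual_mean)
all.equal(auto$ymin, manual_mean - manual_se)
all.equal(auto$ymax, manual_mean + manual_se)
```

Note that mean_se() drops NAs before counting, so the two can differ if the data contain missing values and n() is used as the denominator.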


2
votes

The issue is that you did not set the grouping variable in geom_errorbar and geom_point. From the docs:

position_dodge() requires the grouping variable to be specified in the global or geom_* layer.
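
A minimal sketch of what that requirement means in practice, using a made-up two-row data frame (toy is hypothetical, not the question's data): without a grouping variable both points keep the same x position, while adding group = decide lets position_dodge() separate them. layer_data() is used here only to inspect the computed positions.

```r
library(ggplot2)

# Hypothetical toy data: two observations in the same x category
toy <- data.frame(dcsz = c("small", "small"),
                  decide = c("g", "h"),
                  disthome = c(1, 2))

# No grouping variable: only x is discrete, so there is a single group per x
p_nogroup <- ggplot(toy, aes(x = dcsz, y = disthome)) +
  geom_point(position = position_dodge(width = 0.9))

# Grouping variable set in the layer: the two points can be dodged apart
p_group <- ggplot(toy, aes(x = dcsz, y = disthome)) +
  geom_point(aes(group = decide), position = position_dodge(width = 0.9))

# Inspect the x positions ggplot2 actually computed for each layer
x_nogroup <- as.numeric(layer_data(p_nogroup)$x)
x_group   <- as.numeric(layer_data(p_group)$x)

length(unique(x_nogroup))  # one shared position
length(unique(x_group))    # two distinct positions
```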

Try this:

library(dplyr)
library(ggplot2)

ggplot(gohome,aes(y=disthome, x=dcsz)) +
  #add bars of the group means
  geom_bar(stat="identity",
           position=position_dodge(), 
           data=gohome.disthome,
           aes(x=dcsz, y=meandisthome, fill = decide)) +
  #overlay data points
  geom_point(aes(group = decide), position=position_dodge(width = 0.9)) +
  #add error bars of means
  geom_errorbar(data=gohome.disthome,stat="Identity",
                position=position_dodge(width = 0.9),
                aes(x=dcsz, 
                    group = decide,
                    y=meandisthome,ymin=meandisthome-sedisthome,ymax=meandisthome+sedisthome), width = 0.5)+
  #flip axis
  coord_flip()

EDIT: After a lot of googling and trying several combinations, the best solution I can come up with for getting bars of the same width is to simply fill in the missing combinations in the data frame using tidyr::complete(decide, dcsz).

gohome <- data.frame(decide,dcsz,disthome) %>% 
  tidyr::complete(decide, dcsz)

gohome.disthome <- gohome %>% group_by(dcsz,decide) %>%
  summarise(meandisthome = mean(na.omit(disthome)), sedisthome=sd(na.omit(disthome))/sqrt(n()))
#> `summarise()` regrouping output by 'dcsz' (override with `.groups` argument)

ggplot(gohome,aes(y=disthome, x=dcsz)) +
  #add bars; the completed data keeps all bars the same width
  geom_bar(stat="identity",
           position=position_dodge(), 
           data=gohome.disthome,
           aes(x=dcsz, y=meandisthome, fill = decide)) +
  #overlay data points
  geom_point(aes(group = decide), position=position_dodge(width = 0.9)) +
  #add error bars of means
  geom_errorbar(data=gohome.disthome,stat="Identity",
                position=position_dodge(width = 0.9),
                aes(x=dcsz, 
                    group = decide,
                    y=meandisthome,ymin=meandisthome-sedisthome,ymax=meandisthome+sedisthome), width = 0.5)+
  #flip axis
  coord_flip()

Created on 2020-06-29 by the reprex package (v0.3.0)
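
To illustrate what tidyr::complete() does in the EDIT above, here is a small sketch on made-up data (toy is hypothetical): every combination of decide and dcsz that is absent gets a new row with NA for the remaining columns, so each x category ends up with the same number of groups to dodge.

```r
library(tidyr)

# Hypothetical toy data: "g" and "r" are never observed with dcsz == "medium"
toy <- data.frame(decide = c("g", "h", "h", "r"),
                  dcsz = c("small", "small", "medium", "small"),
                  disthome = c(1, 2, 3, 4))

# Add one row per missing decide x dcsz combination, with disthome = NA
filled <- complete(toy, decide, dcsz)

nrow(toy)                      # 4 observed rows
nrow(filled)                   # 6 rows: 3 decide levels x 2 dcsz levels
sum(is.na(filled$disthome))    # 2 rows were filled in
```

The filled-in groups have no data to draw, so their bars and points are simply absent (ggplot2 may warn about removed rows), but they still reserve a slot when dodging.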