1
votes

I'm trying to create a stacked bar chart displaying percentages of values presented in two columns of a dataframe.

I have two problems with the stacked bar chart my code generates, which I think are linked.

  1. ggplot2 does not display percentages; the y-axis shows the values as a share of 1.0 instead. I could not fix this with scale_y_continuous(labels = percent_format()), which I found while searching here on SO, so I am at a loss as to how to solve it.

  2. My error bars are extremely long. Perhaps this is because the SEM is calculated on the percentage values (0-100) while the graph shows shares of 1.0, so every plotted value is 1/100 of what it is in my data frame?

My dataframe:

ID Group Labeled Unlabeled
A     0       2        98
B     0       2        98
C     0       4        96
D     0       4        96
E     0       4        96
A     1      50        50
B     1      40        60
C     1      50        50
D     1      40        60
E     1      30        70
A     2      30        70
B     2      30        70
C     2      20        80
D     2      20        80
E     2      20        80
A     3      10        90
B     3      10        90
C     3       5        95
D     3      10        90
E     3       5        95
A     4       2        98
B     4       2        98
C     4       1        99
D     4       1        99
E     4       0       100

My code:

library(ggplot2)
library(plyr)
library(reshape2)


#Calculate means for both groups
melted <- melt(data, id.vars=c("ID", "Group"))
means <- ddply(melted, c("variable", "Group"), summarise,
           mean=mean(value))

#Draw bar plot with ggplot2
plot <- ggplot(data=means, aes(x=Group, y=mean, fill=variable)) + 
  geom_bar(stat="identity",
       position="fill",
       width = 0.4) +                           
  xlab(" ") + ylab("Percentage (%)") + 
  theme_classic(base_size = 16, base_family = "Helvetica") + 
  theme(axis.text.y=element_text(size=16, face="bold")) + 
  theme(axis.title.y=element_text(size=16, face="bold", vjust=1)) + 
  theme(axis.text.x=element_text(angle=45,hjust=1,vjust=1, size=16, face="bold")) +
  theme(legend.position="right")

# Calc SEM  
means.sem <- ddply(melted, c("variable", "Group"), summarise,
               mean=mean(value), sem=sd(value)/sqrt(length(value)))
means.sem <- transform(means.sem, lower=mean-sem, upper=mean+sem)

# Add SEM & change appearance of barplot
plotSEM <- plot + geom_errorbar(data=means.sem, aes(ymax=upper,  ymin=lower), position="fill", width=0.15) 

2 Answers

2
votes

This should also work with the default position (stack) instead of fill; you just need to shift the error bars for the Labeled variable so that they line up with the stacked bars.

plot <- ggplot(data=means, aes(x=Group, y=mean, fill=variable)) + 
  geom_bar(stat="identity",
           width = 0.4) +                           
  xlab(" ") + ylab("Percentage (%)") + 
  theme_classic(base_size = 16, base_family = "Helvetica") + 
  theme(axis.text.y=element_text(size=16, face="bold")) + 
  theme(axis.title.y=element_text(size=16, face="bold", vjust=1)) + 
  theme(axis.text.x=element_text(angle=45,hjust=1,vjust=1, size=16, face="bold")) +
  theme(legend.position="right")

# Calc SEM  
means.sem <- ddply(melted, c("variable", "Group"), summarise,
                   mean=mean(value), sem=sd(value)/sqrt(length(value)))
means.sem <- transform(means.sem, lower=mean-sem, upper=mean+sem)
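# Shift the Labeled error bars: add the Labeled mean to the Unlabeled bounds
# (relies on ddply returning the Labeled and Unlabeled rows in the same Group order)
lab <- means.sem$variable == 'Labeled'
unlab <- means.sem$variable == 'Unlabeled'
means.sem[lab, c("lower", "upper")] <- means.sem[lab, "mean"] + means.sem[unlab, c("lower", "upper")]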

# Add SEM & change appearance of barplot
plotSEM <- plot + geom_errorbar(data=means.sem, aes(ymax=upper,  ymin=lower), 
                                width=0.15)
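
As a sanity check on the error-bar adjustment above, here is a minimal base-R sketch. It is an illustration rather than part of the answer: it uses tapply() instead of plyr, the helper name sem and the variable names are made up for this example, and it assumes the Labeled segment is stacked on top of the Unlabeled one. Because Labeled and Unlabeled add up to 100 in every row of the question's data, the shifted Labeled bounds should end up centred on 100.

```r
# Values copied from the question's data frame (5 IDs x 5 groups)
group     <- rep(0:4, each = 5)
labeled   <- c(2, 2, 4, 4, 4, 50, 40, 50, 40, 30, 30, 30, 20, 20, 20,
               10, 10, 5, 10, 5, 2, 2, 1, 1, 0)
unlabeled <- c(98, 98, 96, 96, 96, 50, 60, 50, 60, 70, 70, 70, 80, 80, 80,
               90, 90, 95, 90, 95, 98, 98, 99, 99, 100)

# Hypothetical helper: standard error of the mean
sem <- function(x) sd(x) / sqrt(length(x))

# Per-group means and SEMs for both variables
mean_lab   <- tapply(labeled,   group, mean)
mean_unlab <- tapply(unlabeled, group, mean)
sem_lab    <- tapply(labeled,   group, sem)
sem_unlab  <- tapply(unlabeled, group, sem)

# Same shift as in the answer: Labeled mean + Unlabeled lower/upper bound
lower_lab <- mean_lab + (mean_unlab - sem_unlab)
upper_lab <- mean_lab + (mean_unlab + sem_unlab)

# Unlabeled bars keep their own bounds (mean_unlab -/+ sem_unlab);
# the shifted Labeled bounds are centred on the stacked total
print(round(cbind(lower_lab, upper_lab), 2))
print((lower_lab + upper_lab) / 2)
```

Since the two variables are complements here, their SEMs are identical, so the shifted bars have the same length as the Unlabeled ones; only their vertical position changes.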


1
votes
  1. You usually need the scales package for percent_format() to work, but we'll use a custom function here instead.
  2. I tweaked your code a bit; the main differences are:
    1. position = 'stack' on the bars instead of fill
    2. position = 'identity' and stat = 'identity' for the error bars
    3. Only one error bar is displayed per group
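
To illustrate why switching from fill to stack matters, here is a small base-R sketch using the Group 0 values from the question. It is only an illustration under stated assumptions: the variable names and the pct_label helper name are made up for this example, and the mismatch in units is a likely explanation for the long error bars rather than a confirmed diagnosis.

```r
# Group 0 values from the question's data
labeled   <- c(2, 2, 4, 4, 4)
unlabeled <- c(98, 98, 96, 96, 96)
means <- c(Labeled = mean(labeled), Unlabeled = mean(unlabeled))

# position = "fill" rescales each stack so that it sums to 1
shares <- means / sum(means)
print(shares)

# The SEM bounds, however, are still in 0-100 units
upper <- mean(unlabeled) + sd(unlabeled) / sqrt(length(unlabeled))
print(upper)

# With position = "stack" the bars stay in 0-100 units, so the axis only
# needs a "%" suffix, which a plain labelling function can add
pct_label <- function(bs) paste0(bs, "%")
print(pct_label(c(0, 25, 50, 75, 100)))
```

With stacked bars, the bar heights and the error-bar bounds share the same 0-100 scale, so no rescaling of the SEM is needed.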

Data:

df <- read.table(text = "ID Group Labeled Unlabeled
             A     0       2        98
             B     0       2        98
             C     0       4        96
             D     0       4        96
             E     0       4        96
             A     1      50        50
             B     1      40        60
             C     1      50        50
             D     1      40        60
             E     1      30        70
             A     2      30        70
             B     2      30        70
             C     2      20        80
             D     2      20        80
             E     2      20        80
             A     3      10        90
             B     3      10        90
             C     3       5        95
             D     3      10        90
             E     3       5        95
             A     4       2        98
             B     4       2        98
             C     4       1        99
             D     4       1        99
             E     4       0       100", header = TRUE)

Code:

library(ggplot2)
library(dplyr)
library(tidyr)   # provides gather()
library(scales)

# Reshape to long format, then compute mean and SEM per Group and key
newdf <- df %>% 
  gather('key', 'value', -ID, -Group) %>% 
  group_by(Group, key) %>% 
  summarise(mean = mean(value),
            sem = sd(value) / sqrt(n()),
            lower = mean - sem,
            upper = mean + sem)

#Draw bar plot with ggplot2
plot <- ggplot(data=newdf, aes(x=Group, y=mean, fill=key)) + 
  geom_bar(stat="identity",
           position="stack",
           width = 0.4) +
  geom_errorbar(data = filter(newdf, key == 'Unlabeled'), aes(ymax=upper,  ymin=lower), stat = 'identity', position = 'identity', width=0.15) +
  xlab(" ") + 
  ylab("Percentage (%)") +
  scale_y_continuous(labels = function(bs) {paste0(bs, '%')}) +
  theme_classic(base_size = 16, base_family = "Helvetica") + 
  theme(axis.text.y=element_text(size=16, face="bold"), 
        axis.title.y=element_text(size=16, face="bold"),
        axis.text.x=element_text(angle=45,hjust=1,vjust=1, size=16, face="bold"),
        legend.position="right")
