I would like to use ggplot2 to create a bar plot with the SDM for one set of data (proteinN on the y-axis, method on the x-axis) and overlay a second set of data (specific) on the same plot, in the style of a bullet chart, with an entry for it in the legend. Something a bit like this, but with vertical bars and with the SDM shown for the first set of data:


(source: yaksis.com)

Here is my code and data:

    library(ggplot2) 
    data <- textConnection("proteinN, supp, method, specific
    293, protnumb, insol, 46
    259, protnumb, insol, 46
    274, protnumb, insol, 46
    359, protnumb, fasp, 49
    373, protnumb, fasp, 49
    388, protnumb, fasp, 49
    373, protnumb, efasp, 62
    384, protnumb, efasp, 62
    382, protnumb, efasp, 62
    ")

    data <- read.csv(data, header=TRUE)

# create functions to get the lower and upper bounds of the error bars
stderr <- function(x){sqrt(var(x,na.rm=TRUE)/length(na.omit(x)))}
lowsd <- function(x){return(mean(x)-stderr(x))}
highsd <- function(x){return(mean(x)+stderr(x))}

cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", 
               "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# create a ggplot
ggplot(data=data,aes(x=method, y=proteinN, fill=method))+
  # Replace scale_fill_hue with scale_fill_manual (and drop c=45, l=80) if preferred
  scale_fill_manual(values=cbPalette)+
  scale_fill_hue(c=45, l=80)+

  # first layer is barplot with means
  stat_summary(fun.y=mean, geom="bar", position="dodge", colour='black')+
  # second layer overlays the error bars using the functions defined above
  stat_summary(fun.y=mean, fun.ymin=lowsd, fun.ymax=highsd, 
              geom="errorbar", position="dodge",color = 'black', size=.5)

I tried a few things, but nothing worked. Whenever I try to add the second set of data, I get this error:

Error : Mapping a variable to y and also using stat="bin". With stat="bin", it will attempt to set the y value to the count of cases in each group. This can result in unexpected behavior and will not be allowed in a future version of ggplot2. If you want y to represent counts of cases, use stat="bin" and don't map a variable to y. If you want y to represent values in the data, use stat="identity". See ?geom_bar for examples. (Defunct; last used in version 0.9.2)

Here is my attempt:

# create functions to get the lower and upper bounds of the error bars
stderr <- function(x){sqrt(var(x,na.rm=TRUE)/length(na.omit(x)))}
lowsd <- function(x){return(mean(x)-stderr(x))}
highsd <- function(x){return(mean(x)+stderr(x))}

cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", 
               "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
# create a ggplot
ggplot(data=data,aes(x=method, y=proteinN, fill=method, witdh=1))+
  # Replace scale_fill_hue with scale_fill_manual (and drop c=45, l=80) if preferred
  scale_fill_manual(values=cbPalette)+
  scale_fill_hue(c=45, l=80)+

  #Second set of data#
  geom_bar(aes(x=method, y=specific, fill="light green"), width=.4) +

  # first layer is barplot with means
  stat_summary(fun.y=mean, geom="bar", position="dodge", colour='black')+

  # second layer overlays the error bars using the functions defined above
  stat_summary(fun.y=mean, fun.ymin=lowsd, fun.ymax=highsd, 
      geom="errorbar", position="dodge",color = 'black', size=.5)

What's "SDM"? Which bar do you want to be the outer one and which the inner? For that you need two categories, but you seem to have three methods, so I'm not sure how you want to lay them out. - MrFlick
...in addition, the very least you can do is read the error message in full. It is quite explicit about at least one of your problems and how to fix it. Have you at least tried to follow its advice? - joran
Sorry about that. By SDM I mean the standard deviation of the mean; I want it only for the data in the proteinN column. I did try to follow the advice from the error message, but I'm not really familiar with this language and I'm not sure how to handle it. - Jérémz

1 Answer

Maybe try something like this?

ggplot(data=data,aes(x=method, y=proteinN, fill=method, width=1))+
  scale_fill_hue(c=45, l=80) +
  stat_summary(fun.y=mean, geom="bar", position="dodge", colour='black')+
  stat_summary(fun.y=mean, fun.ymin=lowsd, fun.ymax=highsd, 
               geom="errorbar", position="dodge",color = 'black', size=.5) + 
  geom_bar(data = unique(data[,c('method','specific')]),
           aes(x = method,y = specific),
           stat = "identity",
           fill = "light green",
           width = 0.5)

A couple of notes.

You misspelled "width".

Your two scale_fill lines are pointless. ggplot will only take one fill scale, whichever one appears last. You can't "modify" the fill scale like that. You ought to have received a warning about it that explicitly said:

Scale for 'fill' is already present. Adding another scale for 'fill', which will replace the existing scale.

The error message you got said:

Mapping a variable to y and also using stat="bin"

i.e. you specified y = proteinN while also using stat = "bin" in geom_bar (the default). It went on to explain:

With stat="bin", it will attempt to set the y value to the count of cases in each group.

i.e. rather than plot the values in y, it will try to count the number of instances of, say, insol, and plot that. (Three, in this case.) A cursory examination of the examples in ?geom_bar immediately reveals that most of them only specify an x variable, until you get to this example in the help:

# When the data contains y values in a column, use stat="identity"
library(plyr)
# Calculate the mean mpg for each level of cyl
mm <- ddply(mtcars, "cyl", summarise, mmpg = mean(mpg))
ggplot(mm, aes(x = factor(cyl), y = mmpg)) + geom_bar(stat = "identity")

where it demonstrates that when you specify the precise y values you want, you have to also say stat = "identity". Conveniently, the error message also said this:

If you want y to represent values in the data, use stat="identity".
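
The difference between counting rows and plotting values can be checked without drawing anything. This is only a minimal base R sketch; the names dat, counts and means are made up for illustration, and the numbers are copied from the question's data.

```r
# Values copied from the question; `dat` is an illustrative name
dat <- data.frame(
  proteinN = c(293, 259, 274, 359, 373, 388, 373, 384, 382),
  method   = rep(c("insol", "fasp", "efasp"), each = 3)
)

# What a counting stat computes: the number of rows per method
counts <- table(dat$method)

# What the bars are meant to show: the mean of proteinN per method
means <- tapply(dat$proteinN, dat$method, mean)

print(counts)
print(means)
```

Every method has a count of three, which is what the bars would show if the y values were ignored, whereas the means are the heights you actually want.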

The final piece was knowing that since the overlaid bars only have one value per x value, we should really collapse that piece down to the minimum information needed via:

unique(data[,c('method','specific')])

or just split it off into its own data frame ahead of time.
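
For what it's worth, here is a rough sketch of the "split it off ahead of time" variant. It is not the code from the question: it assumes ggplot2 3.3.0 or later, where stat_summary takes fun, fun.min and fun.max instead of fun.y, fun.ymin and fun.ymax, and where geom_col is the shorthand for bars with stat = "identity". The names dat, se, specific_df and p are made up for illustration; se is used instead of stderr so that it does not mask base R's stderr().

```r
library(ggplot2)

# Same data as in the question; strip.white = TRUE trims the spaces after each comma
dat <- read.csv(text = "proteinN, supp, method, specific
293, protnumb, insol, 46
259, protnumb, insol, 46
274, protnumb, insol, 46
359, protnumb, fasp, 49
373, protnumb, fasp, 49
388, protnumb, fasp, 49
373, protnumb, efasp, 62
384, protnumb, efasp, 62
382, protnumb, efasp, 62", header = TRUE, strip.white = TRUE)

# Standard error of the mean (illustrative helper)
se <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

# One row per method for the overlaid bars
specific_df <- unique(dat[, c("method", "specific")])

p <- ggplot(dat, aes(x = method, y = proteinN)) +
  # wide bars: mean of proteinN per method
  stat_summary(aes(fill = method), fun = mean, geom = "bar", colour = "black") +
  # error bars: mean plus or minus one standard error
  stat_summary(fun = mean,
               fun.min = function(x) mean(x) - se(x),
               fun.max = function(x) mean(x) + se(x),
               geom = "errorbar", width = 0.3) +
  # narrow overlaid bars: the values are plotted as they are
  geom_col(data = specific_df, aes(y = specific),
           fill = "lightgreen", width = 0.5) +
  scale_fill_hue(c = 45, l = 80)

print(p)
```

Mapping fill only inside the first layer keeps the overlaid bars out of the method legend. If you want them to have a legend entry too, you would have to map their fill to something inside aes() rather than set it as a constant.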