6
votes

I want to make a very simple histogram with ggplot2. I have the following MWE:

library(ggplot2)
mydf <- data.frame(
                  Gene=c("APC","FAT4","XIRP2","TP53","CSMD3","BAI3","LRRK2","MACF1",
                  "TRIO","SETD2","AKAP9","CENPF","ERBB4","FBXW7","NF1","PDE4DIP",
                  "PTPRT","SPEN","ATM","FAT1","SDK1","SMG1","GLI3","HIF1A","ROS1",
                  "BRDT","CDH11","CNTRL","EP400","FN1","GNAS","LAMA1","PIK3CA",
                  "POLE","PRDM16","ROCK2","TRRAP","BRCA2","DCLK1","EVC2","LIFR",
                  "MAST4","NAV3"),
                  Freq=c(48,39,35,28,26,17,17,17,16,15,14,14,14,14,14,14,14,14,13,
                  13,13,13,12,12,12,11,11,11,11,11,11,11,11,11,11,11,11,10,10,10,
                  10,10,10))
mydf
ggplot(mydf, aes(x=Gene)) +
      geom_histogram(aes(y=Freq),
      stat="identity",
      binwidth=.5, alpha=.5,
      position="identity")

I have always used this simple code to produce this kind of histogram.

In fact, I have the plot for this particular example that I made some time ago...


However, when I run this exact same code now, I get the following error:

Error: Unknown parameters: binwidth, bins, pad

Why do I get this error now and not before, and what does it mean?

Thanks a lot!

Has your input data changed since the original plot? - Tim Biegeleisen
No change; I actually copied it from my old code for this MWE. - DaniCee
Have they introduced changes to ggplot2? What would be the correct way to reproduce that plot with that data now? - DaniCee
So I guess the correct way to do this now would be ggplot(mydf, aes(Gene, Freq)) + geom_bar(aes(y=Freq), stat="identity", position="identity"). - DaniCee

1 Answer

4
votes

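geom_histogram() is no longer the most appropriate way to plot counts of discrete values. The error most likely arises because binwidth, bins and pad are binning parameters: with stat="identity" no binning takes place, so newer versions of ggplot2 reject them as unknown.

Since you have pre-calculated your frequency values, use geom_col() instead and the error will disappear.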

library(ggplot2)
mydf <- data.frame(
  Gene = c("APC","FAT4","XIRP2","TP53","CSMD3","BAI3","LRRK2","MACF1",
           "TRIO","SETD2","AKAP9","CENPF","ERBB4","FBXW7","NF1","PDE4DIP",
           "PTPRT","SPEN","ATM","FAT1","SDK1","SMG1","GLI3","HIF1A","ROS1",
           "BRDT","CDH11","CNTRL","EP400","FN1","GNAS","LAMA1","PIK3CA",
           "POLE","PRDM16","ROCK2","TRRAP","BRCA2","DCLK1","EVC2","LIFR",
           "MAST4","NAV3"),
  Freq = c(48,39,35,28,26,17,17,17,16,15,14,14,14,14,14,14,14,14,13,
           13,13,13,12,12,12,11,11,11,11,11,11,11,11,11,11,11,11,10,10,10,
           10,10,10),
  stringsAsFactors = FALSE)
mydf
ggplot(mydf, aes(x = Gene, y = Freq)) +
  geom_col() +
  scale_x_discrete(limits = mydf$Gene)

NB: you also need to keep the Gene column as character rather than factor (hence stringsAsFactors = FALSE) and pass the desired order to scale_x_discrete(limits = ...), otherwise the x-axis is ordered alphabetically.
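
As an alternative to setting the limits on the scale, the ordering can be fixed in the data itself. The following is only a sketch, not part of the original answer. It assumes a shortened copy of the question's data (the first five genes) and converts Gene to a factor whose levels follow the order of appearance. The alpha = 0.5 value is carried over from the question's code.

```r
library(ggplot2)

# Shortened copy of the question's data, used here only to keep the sketch small.
mydf <- data.frame(
  Gene = c("APC", "FAT4", "XIRP2", "TP53", "CSMD3"),
  Freq = c(48, 39, 35, 28, 26),
  stringsAsFactors = FALSE
)

# Fix the factor levels to the order of appearance, so the x-axis
# follows the data order rather than alphabetical order.
mydf$Gene <- factor(mydf$Gene, levels = unique(mydf$Gene))

# geom_col() draws bar heights directly from the pre-calculated Freq values.
p <- ggplot(mydf, aes(x = Gene, y = Freq)) +
  geom_col(alpha = 0.5)
p
```

With the levels stored in the data, any later layer or facet built on mydf inherits the same ordering, whereas scale_x_discrete(limits = ...) has to be repeated for each plot.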