2
votes

I have been using R/ggplot2 to successfully generate graphs that are very close to those generated by a commercial tool used in my company. But there are a few features that I am unable to implement to my boxplots generated using R.

  1. A summary "table" at the bottom of the graph showing items like, median, count, outliers.
  2. Display the strip text such that duplicate labels are removed.

Sample R code:

library(ggplot2)
library(data.table)
library(reshape2)
library(grid)

# create dataset
dt <- data.table(mpg)

# melt the data table
dtm <- data.table(
  melt(data=dt,
       id.vars=c("manufacturer","model","displ","year","cyl","trans","drv","fl","class"),
       variable.name="mode", value.name="mpg"))
write.csv(dtm,file="dtm.csv",row.names=F)

# draw some plots
p <- ggplot(dtm, aes(x=mode,y=mpg)) +
  geom_boxplot(aes(fill=mode), varwidth=F) +
  facet_grid( ~ manufacturer + year ) +
  theme_bw() +
  theme(panel.margin=unit(0,"mm"), panel.grid=element_blank()) +
  theme(axis.text.x=element_blank(), axis.title.x=element_blank()) +
  theme(legend.position="bottom") +
  coord_cartesian(ylim=c(0,50))
p

ggsave(plot=p, filename='ddtm_r.png', dpi=72, width=16, height=8)

The output generated by R is here,

R/ggplot2 generated boxplot

The same boxplot generated by the commercial tool is here,

boxplot generated by commercial tool

As you can see the data in the graphs are idential (as expected), but the strip headers are better organized in the commercial version, and I can add the summary table at the bottom of the graph as well.

Is there any way in R to do something similar ?

Thanks & regards,

Derric

UPDATED: 7 June 2014

With the suggestions from colleagues and online help, I am able to now plot graphs that contain the boxplot with the summary table appended to the bottom of the boxplot. The main idea was to extract the panel information from the boxplot grob and then use this to generate the text table, and then redraw the plot with the two grobs. The legend of the original boxplot had to be removed in order to get correct right hand side alignment.

The modified R code is as follows,

# load the libraries
library(grid)
library(gridExtra)
library(data.table)
library(reshape2)
library(ggplot2)
library(gtable)
library(plyr)

# create dataset
dt <- data.table(mpg)

# melt the data table
dtm <- data.table(
  melt(data=dt,
        id.vars=c("manufacturer","model","displ","year","cyl","trans","drv","fl","class"),
       variable.name="mode", value.name="mpg"))
#write.csv(dtm,file="dtm.csv",row.names=F)

# draw some plots
p1 <- ggplot(dtm, aes(x=factor(year),y=mpg)) +
  geom_boxplot(aes(fill=factor(year)), varwidth=F) +
  facet_grid( ~ manufacturer + mode ) +
  theme_bw() +
  theme(panel.margin=unit(0,"lines"), 
        panel.grid=element_blank(),
        strip.text=element_text(angle=90),
        axis.text.x=element_blank(), 
        axis.title.x=element_blank(),
        axis.ticks.x=element_blank(),
        plot.margin=unit(c(0,0,0,0),"lines"),
        legend.position="right") +
  coord_cartesian(ylim=c(0,50)) +
  xlab(NULL) 

# deconstruct the plot p1
pb <- ggplot_build(p1)
# pb has three groups; data, panel and plot
pb.data <- pb$data
# pb.data[[1]] is a data.frame
pb.data.df <- pb.data[[1]]
# melt the pb.data.df
pb.data.dt <- data.table(pb.data.df)
#pb.data.dt[,':='(outliers=NULL)]
pb.data.dtm <- melt(data=pb.data.dt,
                    #id.vars=c("x","PANEL"),
                    measure.vars=c("middle","lower","upper"),
                    variable.name="mode",
                    value.name="value")

p2 <- ggplot(pb.data.dtm, aes(x=factor(x),y=factor(mode),label=format(value,nsmall=1))) +
  geom_text(size=3.0, angle=90, hjust=0.5) + facet_grid(~ PANEL) +
  theme_bw() +
  scale_y_discrete() +
  theme(panel.margin=unit(0,"lines"),
    panel.grid=element_blank(),
    panel.border=element_rect(), 
    legend.position="right",
    axis.text.x=element_blank(),
    axis.text.y=element_text(angle=0),
    axis.ticks=element_blank(),
    strip.text=element_blank(),
    strip.background=element_blank(),
    plot.margin=unit(c(0,0,0,0),"lines")
    ) +
  xlab(NULL) + ylab(NULL)

# a function to extract the legend from the grob
g_legend <- function(a.gplot) {
  tmp <- ggplotGrob(a.gplot)
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  legend
}
legend1 <- g_legend(p1)
pa <- arrangeGrob((p1 + theme(legend.position='none')), legend1,
             (p2 + theme(legend.position='none')), 
             ncol=2, nrow=2, 
             heights=c(50/60,10/60), widths=c(95/100,5/100))
ggsave(plot=pa, filename='dtm_r.png',dpi=72,height=10,width=18)

This script now gives me a graph that is shown below,

Boxplot with summary table below it

The issue I have with this graph is that the panels are slightly mis-aligned on the left hand side. Any suggestions on how to align both the left hand side and right hand side of the panels ?

Thanks,

1

1 Answers

0
votes

I think you should only use one facet level, and instead include the year in the grouping argument that is passed to 'fill':

 p <- ggplot(dtm, aes(x=mode,y=mpg)) +
  geom_boxplot(aes(fill=interaction(mode,year) ), varwidth=F) +
  facet_grid( ~ manufacturer ) +
  theme_bw() +
  theme(panel.margin=unit(0,"mm"), panel.grid=element_blank()) +
  theme(axis.text.x=element_blank(), axis.title.x=element_blank()) +
  theme(legend.position="bottom") +
  coord_cartesian(ylim=c(0,50))

 p

The divisions between brands is more distinct. Furthermore, you can unambiguously modify the theme arguments so the brand names are fully visible:

png(width=650) ; p <- p + theme(strip.text.x = element_text(size=8, angle=75)); 
print(p); dev.off()

enter image description here

If you compare these plots, I think it is clear that the one I created conveys information in a manner that is easier to appreciate. The comparison between 1999 and 2008 miles per gallons is much easier for the viewer to see within each 'manufacturer'. The color helps and the facet divisions are only at the "highest level" which allows proper within-group comparisons.