Background

After reading this beautiful answer on how to extend ggplot and the corresponding vignette, I have been trying to understand how to write my own ggplot extension.

In a nutshell

I understand how the pieces are put together, but I am missing one important piece of information: how does ggplot determine the default range for an axis?

Code

Consider the following toy example:

library(grid)
library(ggplot2)

GeomFit <- ggproto("GeomFit", GeomBar,
                   required_aes = c("x", "y"),
                   setup_data = .subset2(GeomBar, "setup_data"),
                   draw_panel = function(self, data, panel_scales, coord, width = NULL) {
                     bars <- ggproto_parent(GeomBar, self)$draw_panel(data,
                                                                      panel_scales, 
                                                                      coord)
                     coords <- coord$transform(data, panel_scales)    
                     tg <- textGrob("test", coords$x, coords$y * 2 - coords$ymin)
                     grobTree(bars, tg)
                   }
)

geom_fit <- function(mapping = NULL, data = NULL,
                     stat = "count", position = "stack",
                     ...,
                     width = NULL,
                     binwidth = NULL,
                     na.rm = FALSE,
                     show.legend = NA,
                     inherit.aes = TRUE) {

  layer(
    data = data,
    mapping = mapping,
    stat = stat,
    geom = GeomFit,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      width = width,
      na.rm = na.rm,
      ...
    )
  )
}

set.seed(1234567)
data_gd <- data.frame(x = letters[1:5], 
                      y = 1:5)

p <- ggplot(data = data_gd, aes(x = x, y = y, fill = x)) + 
  geom_fit(stat = "identity")

This produces the following plot: [image: Barplot with text]

Problem

As you can see, some of the text is not shown. I assume that ggplot somehow calculates the ranges for the axes and that it is not aware of the extra space needed for my textGrob. How can I solve that? (The desired outcome is equivalent to p + expand_limits(y = 10).)

NB. Of course I could push the problem to the end user by requiring them to add a manual scale. But ideally I would like the scales to be set up properly by the geom itself.
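For reference, the user-side workaround mentioned above can be sketched as follows. This is only an illustration: geom_col() stands in for geom_fit() so that the snippet is self-contained, and the names p_user and y_range are made up for this example.

```r
library(ggplot2)

data_gd <- data.frame(x = letters[1:5], y = 1:5)

# geom_col() is used as a stand-in for geom_fit() here.
# expand_limits() makes sure that y = 10 is included in the y scale.
p_user <- ggplot(data_gd, aes(x = x, y = y, fill = x)) +
  geom_col() +
  expand_limits(y = 10)

# Range the y scale has been trained on (before axis expansion).
y_range <- layer_scales(p_user)$y$range$range
print(y_range)
```

The trained range now reaches up to 10, which is the behaviour the geom should ideally produce on its own.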

ggplot expands the range of the data both multiplicatively and additively. The mult and add factors can be set by the user in the expand argument of a scale function. See here for an example. This often comes up when people want the plot not to be expanded at all. In recent versions, you can use expand_scale to expand the axis in one direction only. ?expand_scale is a decent place to start. - Gregor Thomas
I don't know how, internally, the initial axis range to be expanded is determined. (Which is why I'm just commenting, not answering.) - Gregor Thomas
Based on my understanding, the scale range is trained on the data range. If you check layer_data(p), the y values range from 1 to 5, so that's the plot's scale range. - Z.Lin
How does ggplot know which range it has to look at? With the same data, I can get quite different ranges depending on my mapping: d <- data.frame(x = rep(1:10, 10), y = sample(3, 100, TRUE)); p <- ggplot(d, aes(x = x)); p + geom_bar() vs. p + geom_bar(aes(y = y), stat = "identity"). So the range for the plot can only be determined once we know which mappings the geom uses. This leaves two questions: 1. Which function is responsible for determining the range? 2. From which function is this function called? - thothal
@thothal That would be the train_position function from Layout. One trick I like to use when digging into ggplot objects is to run debug on ggplot2:::ggplot_build.ggplot or ggplot2:::ggplot_gtable.ggplot_built (the two stages of ggplotGrob()) and check the output of each step to find where the phenomenon of interest (in this case scale creation) happens. Careful though, the rabbit hole can get really, really deep... - Z.Lin
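The difference between the trained scale range and the expanded axis limits discussed in these comments can be inspected directly. The following is a minimal sketch, assuming ggplot2 3.3.0 or later, where (as far as I know) expansion() is the current name for what the comment calls expand_scale. All object names are illustrative, and geom_col() again stands in for the custom geom.

```r
library(ggplot2)

data_gd <- data.frame(x = letters[1:5], y = 1:5)
p_bar <- ggplot(data_gd, aes(x = x, y = y)) + geom_col()

# Range of the layer data that the y scale is trained on
# (geom_col adds ymin = 0 and ymax = y to the layer data).
trained <- layer_scales(p_bar)$y$range$range

# Axis limits after the default expansion has been applied.
default_limits <- ggplot_build(p_bar)$layout$panel_params[[1]]$y.range

# Expand only at the top: no padding below 0, 5% above the maximum.
p_top <- p_bar + scale_y_continuous(expand = expansion(mult = c(0, 0.05)))
top_limits <- ggplot_build(p_top)$layout$panel_params[[1]]$y.range

print(trained)
print(default_limits)
print(top_limits)
```

The trained range only covers what is in the layer data, so anything a geom draws beyond that (such as the extra textGrob) is not accounted for unless the layer data says so.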

1 Answer

Eureka

I found an ugly hack, thanks to the awesome help of @Z.Lin on how to debug ggplot code. For future reference, here is how I came up with it:

How I got there

While stepping through ggplot2:::ggplot_build.ggplot with debug(), I learned that the culprit is somewhere in FacetNull$train_scales (my plot has no facets; other facets such as FacetGrid work in a similar manner). I could step into that method via debug(environment(FacetNull$train_scales)$f), a trick I picked up from an answer by Z.Lin in another thread.

Once I was able to debug ggproto objects, I could see that the scales are trained in this very function. Basically, the function looks up the aesthetics that are relevant to the specific scale (no clue where this information is set up in the first place; any ideas, anybody?) and checks which of these aesthetics are present in the layer data.

I saw that a field ymax_final (which, according to this table, is only used for stat_boxplot) is among those considered when setting up the scales. With this piece of information, it was easy to find a hack: set this field in setup_data to the appropriate value.
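The list of aesthetics a position scale responds to can be read off the scale object itself, via its aesthetics field. A minimal sketch (the variable name y_aes is only for illustration, and the exact contents of the vector may differ between ggplot2 versions):

```r
library(ggplot2)

# Every scale object carries the aesthetics it is responsible for.
y_aes <- scale_y_continuous()$aesthetics
print(y_aes)

# ymax_final is one of them, which is why setting it in the layer
# data influences the trained range of the y scale.
print("ymax_final" %in% y_aes)
```

Any column of the layer data whose name appears in this vector takes part in training the y scale, so ymax_final is just one of several fields that could be used for the hack.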

Code

GeomFit <- ggproto("GeomFit", GeomBar,
                   required_aes = c("x", "y"),
                   setup_data = function(self, data, params) {
                      data <- ggproto_parent(GeomBar, self)$setup_data(data, params)
                      ## here's the hack: add a field which is not needed in this geom, 
                      ## but which is used by Facet*$train_scales which 
                      ## eventually sets up the scales
                      data$ymax_final <- 2 * data$y
                      data
                   }, 
                   draw_panel = function(self, data, panel_scales, coord, width = NULL) {
                     bars <- ggproto_parent(GeomBar, self)$draw_panel(data,
                                                                      panel_scales, 
                                                                      coord)
                     coords <- coord$transform(data, panel_scales)    
                     tg <- textGrob("test", coords$x, coords$y * 2 - coords$ymin)
                     grobTree(bars, tg)
                   }
)

Result

[image: Barplot with labels]
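The effect of the hack can also be checked numerically, without looking at the plot. The sketch below uses a stripped-down variant of the geom that contains only the setup_data part. GeomPadded and p_padded are hypothetical names for this illustration, and the layer is built with layer() directly to keep the snippet short.

```r
library(ggplot2)

# Stripped-down variant of GeomFit: only the setup_data hack, no extra grobs.
GeomPadded <- ggproto("GeomPadded", GeomBar,
  required_aes = c("x", "y"),
  setup_data = function(self, data, params) {
    data <- ggproto_parent(GeomBar, self)$setup_data(data, params)
    # Extra column that is only there to influence scale training.
    data$ymax_final <- 2 * data$y
    data
  }
)

data_gd <- data.frame(x = letters[1:5], y = 1:5)

p_padded <- ggplot(data_gd, aes(x = x, y = y)) +
  layer(geom = GeomPadded, stat = "identity", position = "stack",
        params = list(na.rm = FALSE))

# Trained range of the y scale; it should now reach up to 2 * max(y).
padded_range <- layer_scales(p_padded)$y$range$range
print(padded_range)
```

If the hack works as described, the upper end of the trained range is 10 instead of 5, which matches the desired p + expand_limits(y = 10).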

Open Question

Where are the (axis) scales set up if you do not define them by hand? I guess that this is also where the fields relevant for scaling are defined.
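As far as I can tell, default position scales are not part of the plot object at all until the plot is built. During ggplot_build(), a default scale is looked up for every position aesthetic that has no scale yet, and for continuous y data this ends up being scale_y_continuous(). The following sketch illustrates this. It relies on the internal scales list of the plot object, which is not a documented interface and may change between versions, and the names before, after and p_plain are made up.

```r
library(ggplot2)

data_gd <- data.frame(x = letters[1:5], y = 1:5)
p_plain <- ggplot(data_gd, aes(x = x, y = y)) + geom_col()

# Before building, no y scale has been added to the plot.
before <- p_plain$scales$get_scales("y")
print(is.null(before))

# After building, a default y scale has been added to the (cloned) plot.
after <- ggplot_build(p_plain)$plot$scales$get_scales("y")
print(class(after))
print(after$aesthetics)
```

The aesthetics field of that default scale is the vector of field names that are considered during scale training, which is where ymax_final comes from.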