In geom_text(...), the default dataset is only sometimes subsetted based on facet variables. Easiest to explain with an example.

This example attempts to simulate pairs(...) with ggplot (and yes, I know about lattice, and plotmatrix, and ggpairs – the point is to understand how ggplot works).

require(data.table)
require(reshape2)     # for melt(…)
require(plyr)         # for .(…)
require(ggplot2)

Extract mpg, hp, disp, and wt from mtcars, and use cyl as the grouping factor.

xx <- data.table(mtcars)
xx <- data.table(id=rownames(mtcars),xx[,list(group=cyl, mpg, hp, disp, wt)])

Reshape so we can use ggplot facets.

yy <- melt(xx,id=1:2, variable.name="H", value.name="xval")
yy <- data.table(yy,key="id,group")
ww <- yy[,list(V=H,yval=xval), key="id,group"]
zz <- yy[ww,allow.cartesian=T]

In zz,

H: facet variable for horizontal direction
V: facet variable for vertical direction
xval: x-value for a given facet (given value of H and V)
yval: y-value for a given facet

Now, the following generates something close to pairs(…),

ggp <- ggplot(zz, aes(x=xval, y=yval))
ggp <- ggp + geom_point(subset =.(H!=V), size=3, shape=1)
ggp <- ggp + facet_grid(V~H, scales="free")
ggp <- ggp + labs(x="",y="")
ggp

In other words, the values of xval and yval used in geom_point are appropriate for each facet; they have been subsetted based on the values of H and V. However, adding the following to center the variable names in the diagonal facets:

ggp + geom_text(subset = .(H==V),aes(label=factor(H), 
                                     x=min(xval)+0.5*diff(range(xval)),
                                     y=min(yval)+0.5*diff(range(yval))), 
                                 size=10)

gives this:

It appears that H has been subsetted properly for each facet (i.e. the labels are correct), but xval and yval seem to apply to the whole dataset zz, not to the subset corresponding to H and V for each facet.

My question is: in the above, why are xval and yval treated differently from H in aes? Is there a way around this? (Note: I am much more interested in understanding why this is happening than in a workaround.)

1 Answer

One observation is that the labels are actually overplotted. The following call

ggp + geom_text(subset = .(H==V), aes(label=factor(H),
                                     x=min(xval)+0.5*diff(range(xval)) 
                                       + runif(length(xval), max=10),
                                     y=min(yval)+0.5*diff(range(yval))
                                       + runif(length(yval), max=20)), size=10)

adds some noise to the position of the labels, and you can see that one label is drawn for every observation of zz that passes the subset, rather than one per facet.
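To make the overplotting concrete without drawing anything, here is a minimal base-R sketch. It does not call any ggplot2 internals; it only mimics the behavior described here, under the assumption that the aesthetic expression is evaluated once over all rows with H == V. The names vars, diag_rows, x_all, and x_by_facet are made up for this illustration.

```r
vars <- c("mpg", "hp", "disp", "wt")

# Rows that the H == V subset hands to geom_text:
# each variable paired with itself, one row per car.
diag_rows <- do.call(rbind, lapply(vars, function(v) {
  data.frame(H = v, xval = mtcars[[v]], yval = mtcars[[v]])
}))

# Number of labels each diagonal facet receives (one per car).
labels_per_facet <- table(diag_rows$H)

# Evaluated over all rows at once, the expression from the question
# collapses to a single value spanning all four variables.
x_all <- with(diag_rows, min(xval) + 0.5 * diff(range(xval)))

# What the question intended: one midpoint per facet variable.
x_by_facet <- tapply(diag_rows$xval, diag_rows$H,
                     function(v) min(v) + 0.5 * diff(range(v)))
```

Comparing x_all with x_by_facet shows why the labels end up at the same coordinates in every diagonal facet instead of at each facet's own center.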

To your original question: From the perspective of ggplot it might be faster to evaluate all aesthetics at once and split later for faceting, which leads to the observed behavior. I'm not sure if doing the evaluation separately for each facet will ever be implemented in ggplot -- the only application I can think of is to aggregate facet-wise, and there are workarounds to achieve this easily. Also, to avoid the overplotting shown above, you'll have to build a table with four observations (one per text) anyway. Makes your code simpler, too.
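The table with four observations could look like the sketch below. This is one possible way to do it, not the only one. It rebuilds a long-format table with the same columns as zz using base R so that it runs on its own, and it passes separate data to each layer with data = instead of subset = .(…), since the subset argument is not available in current ggplot2 releases. The names vars and labs_df are assumptions introduced for this example.

```r
library(ggplot2)

vars <- c("mpg", "hp", "disp", "wt")

# Long-format pairs data: one row per (car, H, V) combination,
# with the same columns as zz in the question.
zz <- do.call(rbind, lapply(vars, function(h) {
  do.call(rbind, lapply(vars, function(v) {
    data.frame(H = h, V = v, xval = mtcars[[h]], yval = mtcars[[v]])
  }))
}))
zz$H <- factor(zz$H, levels = vars)
zz$V <- factor(zz$V, levels = vars)

# One row per diagonal facet, placed at the midpoint of that variable's range.
labs_df <- do.call(rbind, lapply(vars, function(v) {
  mid <- mean(range(mtcars[[v]]))
  data.frame(H = v, V = v, xval = mid, yval = mid, label = v)
}))
labs_df$H <- factor(labs_df$H, levels = vars)
labs_df$V <- factor(labs_df$V, levels = vars)

# Points go in the off-diagonal facets, one label in each diagonal facet.
off_diag <- subset(zz, as.character(H) != as.character(V))
ggp <- ggplot(zz, aes(x = xval, y = yval)) +
  geom_point(data = off_diag, size = 3, shape = 1) +
  geom_text(data = labs_df, aes(label = label), size = 10) +
  facet_grid(V ~ H, scales = "free") +
  labs(x = "", y = "")
ggp
```

Because the midpoints are computed per variable before plotting, each label lands in the center of its own facet, and only one label is drawn per facet.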