I'm trying to write a custom scatterplot matrix function in ggplot2 using facet_grid. My data have two categorical variables and one numeric variable.

I'd like to facet (make the scatterplot rows/cols) according to one of the categorical variables and change the plotting symbol according to the other categorical.

I do so by first constructing a larger dataset that includes all combinations (combs) of the categorical variable from which I'm creating the scatterplot panels.

My questions are:

  • How to use geom_rect to white-out the diagonal and upper panels in facet_grid (I can only make the middle ones black so far)?
  • How can you move the titles of the facets to the bottom and left hand sides respectively?
  • How does one remove tick axes and labels for the top left and bottom right facets?

Thanks in advance.

require(ggplot2)

# Data
nC <- 5
nM <- 4

dat <- data.frame(
    Control = rep(LETTERS[1:nC], nM), 
    measure = rep(letters[1:nM], each = nC), 
    value = runif(nC*nM))

# Change factors to characters
dat <- within(dat, {
    Control <- as.character(Control)
    measure <- as.character(measure)
})
# Check, lapply(dat, class)

# Define scatterplotmatrix() function
scatterplotmatrix <- function(data,...){

    controls <- with(data, unique(Control))
    measures <- with(data, unique(measure))
    combs <- expand.grid(1:length(controls), 1:length(measures), 1:length(measures))

    # Add columns for values
    combs$value1 = 1
    combs$value2 = 0

    for ( i in 1:NROW(combs)){
        combs[i, "value1"] <- subset(data, subset = Control==controls[combs[i,1]] & measure == measures[combs[i,2]], select = value)
        combs[i, "value2"] <- subset(data, subset = Control==controls[combs[i,1]] & measure == measures[combs[i,3]], select = value)
    }

    for ( i in 1:NROW(combs)){
        combs[i,"Control"] <- controls[combs[i,1]]
        combs[i,"Measure1"] <- measures[combs[i,2]]
        combs[i,"Measure2"] <- measures[combs[i,3]]
    }

    # Final pairs plot
    plt <- ggplot(combs, aes(x = value1, y = value2, shape = Control)) + 
    geom_point(size = 8, colour = "#F8766D") + 
    facet_grid(Measure2 ~ Measure1) + 
    ylab("") + 
    xlab("") + 
    scale_x_continuous(breaks = c(0,0.5,1), labels = c("0", "0.5", "1"), limits = c(-0.05, 1.05)) + 
    scale_y_continuous(breaks = c(0,0.5,1), labels = c("0", "0.5", "1"), limits = c(-0.05, 1.05)) +
    geom_rect(data = subset(combs, subset = Measure1 == Measure2), colour='white', xmin = -Inf, xmax = Inf,ymin = -Inf,ymax = Inf) 

    return(plt)
}

# Call
plt1 <- scatterplotmatrix(dat)
plt1

Have you looked at ggpairs in package GGally? It looks like most of what you want can be done fairly easily with that function, you'd just need to reshape your dataset (like dcast(dat, Control ~ measure) if you use reshape2). - aosmith

1 Answer

I'm not aware of a way to move the panel strips (the labels) to the bottom or left. Also, it's not possible to format the individual panels separately (e.g., turn off the tick marks for just one facet). So if you really need these features, you will probably have to use something other than, or in addition to, ggplot. You should really look into GGally, although I've never had much success with it myself.

As for leaving some of the panels blank, here is one way.

nC <- 5; nM <- 4
set.seed(1)     # for reproducible example
dat <- data.frame(Control = rep(LETTERS[1:nC], nM), 
                  measure = rep(letters[1:nM], each = nC),
                  value = runif(nC*nM),
                  stringsAsFactors = TRUE)  # H and V must be factors; R >= 4.0 no longer converts by default

scatterplotmatrix <- function(data,...){
  require(ggplot2)
  require(data.table)
  require(plyr)      # for .(...)
  DT <- data.table(data,key="Control")
  gg <- DT[DT,allow.cartesian=T]
  setnames(gg,c("Control","H","x","V","y"))
  fmt <- function(x) format(x,nsmall=1)
  plt <- ggplot(gg, aes(x,y,shape = Control)) + 
    geom_point(subset=.(as.numeric(H)<as.numeric(V)),size=5, colour="#F8766D") + 
    facet_grid(V ~ H) + 
    ylab("") + xlab("") + 
    scale_x_continuous(breaks=c(0,0.5,1), labels=fmt, limits=c(-0.05, 1.05)) + 
    scale_y_continuous(breaks=c(0,0.5,1), labels=fmt, limits=c(-0.05, 1.05))   
  return(plt)
}
scatterplotmatrix(dat)

The main feature of this is the use of subset=.(as.numeric(H)<as.numeric(V)) in the call to geom_point(...). This subsets the dataset so that the point layer is drawn only where the condition is met, i.e. in facets where as.numeric(H)<as.numeric(V). This works because I've left the H and V columns as factors, and as.numeric(...) applied to a factor returns the integer level codes, not the labels.
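As far as I know, newer ggplot2 releases (2.0.0 onward) no longer accept a subset argument in layers. A minimal sketch of the same idea that should still work is to filter the data yourself and hand the result to the layer via data=. The toy data frame and the name lower below are made up for illustration; they are not part of the function above.

```r
library(ggplot2)

# Hypothetical toy data: every (H, V) pairing of three factor levels.
toy <- expand.grid(H = factor(c("a", "b", "c")),
                   V = factor(c("a", "b", "c")))
toy$x <- seq(0.1, 0.9, length.out = nrow(toy))
toy$y <- rev(toy$x)

# as.numeric() on a factor gives the integer codes (a = 1, b = 2, c = 3),
# so this keeps only the rows that belong to panels below the diagonal.
lower <- subset(toy, as.numeric(H) < as.numeric(V))

# The full data set still defines the facet grid; only the point layer
# receives the filtered rows, so the other panels stay empty.
plt <- ggplot(toy, aes(x, y)) +
  geom_point(data = lower, size = 5, colour = "#F8766D") +
  facet_grid(V ~ H)
```

Printing plt should give a 3 x 3 grid with points only in the panels where the column level sorts before the row level.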

The rest is just a more compact (and much faster) way of creating what you called combs.
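If you would rather not pull in data.table, here is a rough base R sketch of the same self-join using merge(). The name pairs_df and the suffixes are my own choices for illustration, and I have not compared its speed with the data.table version.

```r
set.seed(1)  # for a reproducible example
dat <- data.frame(Control = rep(LETTERS[1:5], 4),
                  measure = rep(letters[1:4], each = 5),
                  value   = runif(20),
                  stringsAsFactors = TRUE)

# Self-join on Control: each row is paired with every row that shares its
# Control, which yields all measure-by-measure combinations per Control.
pairs_df <- merge(dat, dat, by = "Control", suffixes = c(".h", ".v"))

# merge() returns the key first, then the columns of the first table,
# then those of the second; rename them to match the plotting code.
names(pairs_df) <- c("Control", "H", "x", "V", "y")
```

With 5 controls and 4 measures this gives 5 * 4 * 4 = 80 rows, one per point in the full grid of panels.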