4
votes

I have a melted data frame with your standard id, variable and value columns. variable has 4 levels.

I want to use ggplot to draw scatter plots of the values in value for one level of variable against the values for another level.

To illustrate:

data.frame(id= gl(4,1,labels=paste("id",1:4,sep="")), variable=gl(4,4,labels=LETTERS[1:4]),value=rnorm(16))

        id variable        value
1  id1        A -0.494270766
2  id2        A  0.189400188
3  id3        A -0.550961030
4  id4        A -1.046945450
5  id1        B -0.525552660
6  id2        B -0.293601677
7  id3        B  0.009664513
8  id4        B -0.214687215
9  id1        C  1.253551926
10 id2        C -1.241847326
11 id3        C -0.307036508
12 id4        C -0.228632605
13 id1        D -1.683798512
14 id2        D -0.419295267
15 id3        D -0.154469178
16 id4        D -0.763460558

I want to produce ggplot scatter plots for each pair of variable A vs B, A vs C, A vs D, B vs C, and so on, and then ass smoothers to them afterwards.

Cheers, Davy

3
What exactly do you mean by making a plot for each pair of variables? Are you talking about scatterplots? There isn't a clear way to identify which points in A would go along with which points in B and so forth. - Dason
Sorry, you're right. A scatterplot. I said line as I plan to add a smoother and forgot. Thank you. - Davy Kavanagh
Once again though, you don't provide any detail for how to match these points. I'm guessing the first in A is supposed to go with the first in B. But if that's the case you really should have an id vector or something showing how these points correspond to each other. - Dason
It sounds like you want to do something similar to this: stackoverflow.com/questions/3735286/pairs-equivalent-in-ggplot2 I would recommend trying out the ggpairs function in the GGally package. - Dason
I probably should, but I just can't bring myself to edit out the "ass smoothers". - joran

3 Answers

4
votes

Here's a slightly modified version of plotmatrix in ggplot2 that does this:

dat <- data.frame(id= gl(4,1,labels=paste("id",1:4,sep="")), variable=gl(4,4,labels=LETTERS[1:4]),value=rnorm(16))

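library(ggplot2)
library(reshape2)
library(plyr)  # provides defaults(), which plotmatrix() below relies on

# Cast from long to wide: one row per id, one column per level of variable
dat <- dcast(dat, id ~ variable, value.var = "value")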

plotmatrix <- function (data, mapping = aes(), colour = "black") 
{
    grid <- expand.grid(x = 1:ncol(data), y = 1:ncol(data))
    grid <- subset(grid, x != y)
    all <- do.call("rbind", lapply(1:nrow(grid), function(i) {
        xcol <- grid[i, "x"]
        ycol <- grid[i, "y"]
        data.frame(xvar = names(data)[ycol], yvar = names(data)[xcol], 
            x = data[, xcol], y = data[, ycol], data)
    }))
    all$xvar <- factor(all$xvar, levels = names(data))
    all$yvar <- factor(all$yvar, levels = names(data))
    densities <- do.call("rbind", lapply(1:ncol(data), function(i) {
        data.frame(xvar = names(data)[i], yvar = names(data)[i], 
            x = data[, i])
    }))
    densities$xvar <- factor(densities$xvar, levels = names(data))
    densities$yvar <- factor(densities$yvar, levels = names(data))
    mapping <- defaults(mapping, aes_string(x = "x", y = "y"))
    class(mapping) <- "uneval"
    ggplot(all, mapping) + 
        facet_grid(xvar ~ yvar, scales = "free") + 
        geom_point(colour = colour, na.rm = TRUE) + 
        stat_density(aes(x = x,y = ..scaled.. * diff(range(x)) + min(x)), 
            data = densities,position = "identity", colour = "grey20", geom = "line") + 
        geom_smooth(se = FALSE,method = "lm",colour = "blue")
}

plotmatrix(dat[,-1])
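
The reshaping step is what answers the question raised in the comments about how points are matched: after dcast(), every row holds the values of A, B, C and D for one id, so any two columns can be plotted against each other. A minimal sketch of just that step, assuming reshape2 is installed (the names dat_long and dat_wide and the seed are only for illustration):

```r
library(reshape2)

set.seed(1)  # illustrative only, so the example is reproducible

# Melted (long) data in the same layout as the question
dat_long <- data.frame(id = gl(4, 1, labels = paste0("id", 1:4)),
                       variable = gl(4, 4, labels = LETTERS[1:4]),
                       value = rnorm(16))

# One row per id, one column per level of 'variable'
dat_wide <- dcast(dat_long, id ~ variable, value.var = "value")

dim(dat_wide)    # 4 rows (ids) by 5 columns
names(dat_wide)  # "id" "A" "B" "C" "D"
```

Dropping the id column, as in plotmatrix(dat[,-1]), then leaves only the numeric columns to be paired up.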


3
votes

Following @Dason's suggestion to try the GGally package and using @baptise's reshaping code...

    library(ggplot2)
    library(reshape2)
    library(plyr)
    library(GGally)
    #
    n <- 100   # number of observations
    i <- 4     # number of variables, cannot exceed 26 since letters are used as labels
    #
    # create data, following @Davy
    d <- data.frame(id = gl(n, 1, labels = paste("id", 1:n, sep = "")),
                    variable = gl(i, n, labels = LETTERS[1:i]), value = rnorm(n * i))
    #
    # reshape for plotting, from @baptise
    group <- unique(d$variable)
    m <- dcast(d, ...~variable, subset=.(variable %in% group))
    #
    # make scatterplot matrix using GGally package
    # as suggested by @Dason
    ggpairs(m[,2:ncol(m)], 
           lower = list(continuous = "smooth"),
           axisLabels="show")
    # done!

The result is a bit busy, with grid lines in the boxes above the diagonal (no doubt they can be turned off), and some other finishing touches are needed before this is ready for prime time.


But it's generally true to the ggplot2 approach (the smoother can be removed, if required). The GGally code is available on github.
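
As for the grid lines, one possible way to switch them off is to add a ggplot2 theme to the object returned by ggpairs(). This is a sketch that assumes a GGally version in which a ggmatrix accepts `+ theme(...)`; the data frame m here is made-up stand-in data, not the data used above, and which panels show grid lines by default may differ between versions:

```r
library(ggplot2)
library(GGally)

set.seed(1)  # illustrative only
# Stand-in wide data: four numeric columns, as after the dcast() step
m <- data.frame(A = rnorm(100), B = rnorm(100), C = rnorm(100), D = rnorm(100))

p <- ggpairs(m,
             lower = list(continuous = "smooth"),
             axisLabels = "show")

# Assumption: theme elements added to the matrix are applied to every panel
p_clean <- p + theme(panel.grid.major = element_blank(),
                     panel.grid.minor = element_blank())
p_clean
```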

It's also worth noting that there are examples (including code) of a fantastic variety of scatterplot matrices that can be done in R at Romain François' R Graph Gallery. This one is quite similar to the one above.

3
votes

Try this,

library(ggplot2)
library(reshape2)
library(plyr)

d <- data.frame(id= gl(4,1,labels=paste("id",1:4,sep="")), variable=gl(4,4,labels=LETTERS[1:4]),value=rnorm(16))


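# Scatter plot with a smoother for one pair of variables, matched by id
plot_pair <- function(pair = c("A", "B"), d) {
  # dcast() comes from reshape2 (cast() belongs to the older reshape package);
  # .() from plyr quotes the subset expression
  m <- dcast(d, ... ~ variable, value.var = "value",
             subset = .(variable %in% pair))
  ggplot(m, aes_string(x = pair[1], y = pair[2])) +
    geom_point() +
    geom_smooth()
}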

pdf("allpairs.pdf")
a_ply(combn(levels(d$variable), 2), 2, plot_pair, d=d, .print=TRUE)
dev.off()
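
The a_ply() call iterates over the columns of the matrix returned by combn(), so each unordered pair of levels gets exactly one page in the PDF. A small base R sketch of that enumeration (the names vars, pairs and pair_labels are only for illustration):

```r
# Levels of 'variable', as in the question
vars <- LETTERS[1:4]

# combn() returns a 2 x choose(4, 2) matrix: one column per unordered pair
pairs <- combn(vars, 2)
ncol(pairs)  # 6

# Collapse each column into a readable label
pair_labels <- apply(pairs, 2, paste, collapse = " vs ")
pair_labels
# "A vs B" "A vs C" "A vs D" "B vs C" "B vs D" "C vs D"
```

Because the pairs are unordered, B vs A is not plotted separately; swap pair[1] and pair[2] inside plot_pair if the axes should be the other way round.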