
Say I have five groups which are not disjoint (i.e. they are overlapping). I would like to make a scatter plot of Var1 vs. Var2 for each of the classes.

More specifically, consider a data frame which has two columns, Var1 and Var2, followed by five columns taking values 0 and 1 that indicate each row's membership in each of the five classes. If these classes were disjoint, I would simply use facet_grid() on a variable taking values 1 to 5, and the problem would be solved. But because they overlap, I'm not sure how to make such a plot.

Thank you for the help!

Why not include a reproducible example with sample input data to make it clearer exactly what you have and what you want? This isn't a place for general plotting advice. Make sure you ask a specific programming question. - MrFlick

1 Answer


This is easy using the tidyr package, in particular the gather() function from that package.

First I create a data frame that I think has the properties you want. Note that I use dplyr and its awesome pipe syntax (that's the %>% stuff below).

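# packages we need
library(dplyr)    # %>%, mutate(), filter(), select()
library(tidyr)    # gather()
library(ggplot2)  # ggplot(), geom_text(), facet_grid()
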
# an example data frame
df <- 
    data.frame(var1 = rnorm(30), 
               var2 = rnorm(30), 
               A = sample(c(TRUE, FALSE), 30, replace = TRUE),
               B = sample(c(TRUE, FALSE), 30, replace = TRUE),
               C = sample(c(TRUE, FALSE), 30, replace = TRUE),
               D = sample(c(TRUE, FALSE), 30, replace = TRUE),
               E = sample(c(TRUE, FALSE), 30, replace = TRUE))

The key step is to reshape the data frame using tidyr::gather() so that each data point (var1, var2) is replicated five-fold, i.e. once for each column that is gathered. In addition to replicating the data in the columns not gathered, the gather() function also creates two new columns. The first of these I call class; it takes the values A, B, C, D, or E. The second I call is_in; it is either TRUE or FALSE, depending on whether the corresponding data point is in the class named in the class column.

# reshape the data frame using tidyr and dplyr
df.reshaped <- 
    df %>% 
        mutate(index = row_number()) %>%  # number the data points
        gather(class, is_in, A:E) %>%     # repeat all (var1, var2) points 5x
        filter(is_in == TRUE) %>%         # keep only points you want
        select(-is_in)                    # the is_in column is now superfluous
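
As a side note, gather() is superseded in tidyr 1.0.0 and later; pivot_longer() is the recommended replacement. A minimal sketch of the same reshape with pivot_longer(), run on a tiny hand-made data frame so the replication is easy to see (the names toy and toy_long and all the values are made up for illustration):

```r
library(dplyr)
library(tidyr)

# hypothetical toy data: point 2 belongs to both class A and class B
toy <- data.frame(var1 = c(0.1, 0.5, 0.9),
                  var2 = c(1, 2, 3),
                  A = c(TRUE, TRUE, FALSE),
                  B = c(FALSE, TRUE, TRUE))

toy_long <-
    toy %>%
        mutate(index = row_number()) %>%       # number the data points
        pivot_longer(A:B,
                     names_to = "class",       # former column name (A or B)
                     values_to = "is_in") %>%  # former cell value (TRUE/FALSE)
        filter(is_in) %>%                      # keep only actual memberships
        select(-is_in)

# one row per (point, class) membership
print(toy_long)
```

Here point 2 ends up in two rows, one per class it belongs to, which is exactly what lets it show up in two facets.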

The data is now ready for plotting. Just to verify that our plot will show the same original data point in multiple facets, I put in a mutate() call above to number all the original (i.e. before gathering) data points by row number. I'll plot using geom_text() and thus if we see the same number in different facets, the objective is achieved.

# plot the graph
df.reshaped %>% 
    ggplot(aes(x = var1, y = var2, label = index)) +
        geom_text() +  
        facet_grid(. ~ class)

ggsave('SO_39820087.png', width = 10, height = 4)

The resulting plot looks like this on my machine.

[Image: Successful facet plot of non-disjoint classes]