R - row subsets of data frame based on all combinations of factor levels

Question

I'm trying to figure out the simplest way to do the following. I have a data frame df with colnames(df) <- c("A", "B", "C", "D", "E") where all the variables are encoded as factors. Given some arbitrary subset of variables, I want to generate all possible subsets of rows of df that can be generated based on all the possible combinations of factor levels of these variables.

So basically, I'm looking for a function allSubsets that takes a vector of column names as its argument (let's say c("A", "E")) and returns a list of data frames. Let's say levels(df$A) are a1, a2 and levels(df$E) are e1, e2, e3; then I want the function to generate a list of data frames (of length 6) whose elements correspond to:

df[df$A == 'a1' & df$E == 'e1',]

df[df$A == 'a2' & df$E == 'e1',]

df[df$A == 'a1' & df$E == 'e2',]

df[df$A == 'a2' & df$E == 'e2',]

df[df$A == 'a1' & df$E == 'e3',]

df[df$A == 'a2' & df$E == 'e3',]

I know of expand.grid but I'm not sure if that's the best way of doing this.

akrun · Accepted Answer · 2019-10-08T18:13:41

We can use split to get a list of data.frames, one per combination of the factor levels:

lst1 <- split(df, df[c("A", "E")], drop = TRUE)
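Note that drop = TRUE discards level combinations that never occur in the data. If every combination should appear in the result (with zero-row data frames for the empty ones), drop = FALSE is probably what is wanted. Below is a minimal sketch on made-up toy data; the wrapper name allSubsets comes from the question, but its drop argument and the example values are assumptions for illustration only.

```r
# Toy data, invented for illustration; note that level "e3" never occurs.
df <- data.frame(
  A = factor(c("a1", "a2", "a1", "a2"), levels = c("a1", "a2")),
  E = factor(c("e1", "e1", "e2", "e2"), levels = c("e1", "e2", "e3")),
  x = 1:4
)

# Hypothetical wrapper around split(); `drop` is an assumed extra argument.
# drop = FALSE keeps every level combination, including empty ones.
allSubsets <- function(data, vars, drop = FALSE) {
  split(data, data[vars], drop = drop)
}

res <- allSubsets(df, c("A", "E"))

length(res)          # one element per combination of levels of A and E
names(res)           # names are built as "<A level>.<E level>"
nrow(res[["a1.e3"]]) # unobserved combination gives a zero-row data frame
```

Individual subsets can then be pulled out by name, e.g. res[["a2.e1"]], which should match df[df$A == 'a2' & df$E == 'e1',].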
