4
votes

I have two data frames with three columns each, and each holds a different data type: df1 has continuous data (column names with the suffix "con") and df2 has categorical data (column names with the suffix "cat").

My data:

df1 <- data.frame(t1_con=c(1:5), t2_con=c(6:10), t3_con=c(11:15))
df2 <- data.frame(t1_cat=letters[1:5], t2_cat=letters[6:10], t3_cat=letters[11:15])

I would like to get all combinations of the column names, e.g. t1_con, t2_con, t3_cat. I have tried this code:

library(reshape2)  # provides melt() for lists
df3 <- cbind(df1, df2)
results <- combn(names(df3),3,simplify=FALSE)
trait_combinations <- melt(results)

This gives me combinations like t1_con, t2_con, t1_cat, which contains t1 twice. I don't want any duplicates of t1, t2 or t3 within a group. E.g. group 1 is good, as it contains t1, t2 and t3, but group 2 has a duplicate of t1:

head(trait_combinations)

   value L1
1 t1_con  1
2 t2_con  1
3 t3_con  1
4 t1_con  2
5 t2_con  2
6 t1_cat  2

Is there a way to prevent duplicates from happening in combn, or to post-hoc remove duplicated strings? I could remove the suffixes but I need to know which columns are continuous and categorical for further analysis.

Thanks for your help.

2
Well, a lazy and inefficient way would be to generate all the combinations and then use the unique function. - James Curran

Unique doesn't work when I have the suffixes on the column names, but if I take them off it does. I've managed a long-winded way of getting what I want, but ideally there is a quicker way (as this will be repeated 1000s of times). - LHordley

This is the code: - LHordley

    trait_combinations2 <- trait_combinations
    trait_combinations2$value <- sub("_[^_]+$", "", trait_combinations2$value)
    ## keep first values
    trait_combinations2 <- unique(trait_combinations2)
    trait_combinations2 <- trait_combinations2 %>% group_by(L1) %>% filter(n() >= ncol(trait_temp2))
    trait_combinations2 <- trait_combinations2[,-1]
    trait_combinations3 <- match_df(trait_combinations, trait_combinations2, on = NULL)

I agree it's inefficient (that's what I said :-)), but I am puzzled why it doesn't work. This works fine for me: - James Curran

    > unique(c("t1_con", "t1_con", "t1_cat"))
    [1] "t1_con" "t1_cat"

2 Answers

1
votes

You can try the following:

do.call(expand.grid,
        data.frame(rbind(names(df1),names(df2))))

which gives

      X1     X2     X3
1 t1_con t2_con t3_con
2 t1_cat t2_con t3_con
3 t1_con t2_cat t3_con
4 t1_cat t2_cat t3_con
5 t1_con t2_con t3_cat
6 t1_cat t2_con t3_cat
7 t1_con t2_cat t3_cat
8 t1_cat t2_cat t3_cat
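
As a rough sketch of how these rows might be used for further analysis, each row can be turned into a subset of the combined data frame. The names `df3`, `combos` and `subsets` are illustrative assumptions, not part of the answer above, and the conversion to character is there because expand.grid() returns factors by default.

```r
# Data from the question
df1 <- data.frame(t1_con = 1:5, t2_con = 6:10, t3_con = 11:15)
df2 <- data.frame(t1_cat = letters[1:5], t2_cat = letters[6:10],
                  t3_cat = letters[11:15])
df3 <- cbind(df1, df2)

# One row per combination, one column per trait (t1, t2, t3)
combos <- do.call(expand.grid,
                  data.frame(rbind(names(df1), names(df2))))

# expand.grid() returns factors by default; convert to plain strings
combos[] <- lapply(combos, as.character)

# Hypothetical next step: one data frame per combination of columns
subsets <- lapply(seq_len(nrow(combos)), function(i) {
  df3[, unlist(combos[i, ]), drop = FALSE]
})

names(subsets[[2]])
```

The suffix of each column name is preserved, so it is still possible to tell which columns in a subset are continuous and which are categorical.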
1
votes

You can use expand.grid() to generate all 8 combinations.

expand.grid(Map(c, names(df1), names(df2), USE.NAMES = FALSE))

#     Var1   Var2   Var3
# 1 t1_con t2_con t3_con
# 2 t1_cat t2_con t3_con
# 3 t1_con t2_cat t3_con
# 4 t1_cat t2_cat t3_con
# 5 t1_con t2_con t3_cat
# 6 t1_cat t2_con t3_cat
# 7 t1_con t2_cat t3_cat
# 8 t1_cat t2_cat t3_cat

Description

First, use Map to create a list indicating 3 groups of candidate variables:

Map(c, names(df1), names(df2), USE.NAMES = FALSE)

[[1]]
[1] "t1_con" "t1_cat"

[[2]]
[1] "t2_con" "t2_cat"

[[3]]
[1] "t3_con" "t3_cat"

Then, expand.grid() will select one variable from each group, and consequently generate all 8 combinations.
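
If the combn() output from the question is already at hand, a post-hoc filter is also possible. The following is only a sketch, assuming every column name has the form prefix_suffix: it strips the suffix with the same regular expression used in the comments and keeps the groups whose prefixes are all distinct. The names `all_combos`, `keep` and `valid` are made up for illustration.

```r
# Data from the question
df1 <- data.frame(t1_con = 1:5, t2_con = 6:10, t3_con = 11:15)
df2 <- data.frame(t1_cat = letters[1:5], t2_cat = letters[6:10],
                  t3_cat = letters[11:15])

# All 20 unordered groups of 3 column names, as in the question
all_combos <- combn(c(names(df1), names(df2)), 3, simplify = FALSE)

# Keep a group only if no trait prefix (t1, t2, t3) occurs twice
keep <- vapply(all_combos, function(x) {
  prefixes <- sub("_[^_]+$", "", x)  # drop "_con" / "_cat"
  !anyDuplicated(prefixes)
}, logical(1))

valid <- all_combos[keep]
length(valid)
```

This generates all 20 groups before discarding 12 of them, so the expand.grid() approach is likely the better fit when the step is repeated many times.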