0
votes

I have the following dataframe structured in terms of two variables, i.e Country and Product, within each single group. The thing I would like to achieve is to to find all combinations per group.

# Group  Country  Product
# 1     A Country1 Product1
# 2     A Country2 Product2
# 3     B Country3 Product3
# 4     B Country4 Product4

structure(list(Group = c("A", "A", "B", "B"), Country = c("Country1", 
"Country2", "Country3", "Country4"), Product = c("Product1", 
"Product2", "Product3", "Product4")), class = "data.frame", row.names = c(NA, 
-4L)) -> df

This is the approach for one group:

df %>% 
  dplyr::filter(Group == 'A') -> df_A

tidyr::crossing(Country = unique(df_A$Country),
         Product = unique(df_A$Product)) %>% #or expand.grid
  dplyr::mutate(Group = 'A') 

I can make a function to iterate the same approach for each group and append the results into a single dataframe. However, for the sake of efficiency, I was wondering if there are any other options...

The expected output for each group would be:

# > output
# Group  Country  Product
# 1     A Country1 Product1
# 2     A Country1 Product2
# 3     A Country2 Product1
# 4     A Country2 Product2
# 5     B Country3 Product3
# 6     B Country3 Product4
# 7     B Country4 Product3
# 8     B Country4 Product4

structure(list(Group = c("A", "A", "A", "A", "B", "B", "B", "B"
), Country = c("Country1", "Country1", "Country2", "Country2", 
"Country3", "Country3", "Country4", "Country4"), Product = c("Product1", 
"Product2", "Product1", "Product2", "Product3", "Product4", "Product3", 
"Product4")), class = "data.frame", row.names = c(NA, -8L)) -> output

Thank you.

EDIT:

Got the following solution:

df %>% 
  group_by(Group) %>%
  group_split() -> x

purrr::map_dfr(1:length(x), 

function(i) {crossing(Group = unique(x[[i]]$Group),
                      Country = unique(x[[i]]$Country),
                      Product = unique(x[[i]]$Product))
}
)
4

4 Answers

1
votes

I think you are looking at cross join:

library(data.table)
setDT(df)
df <- df[, CJ(.SD$Country, .SD$Product), .(Group)]

print(df)

  Group       V1       V2
1:     A Country1 Product1
2:     A Country1 Product2
3:     A Country2 Product1
4:     A Country2 Product2
5:     B Country3 Product3
6:     B Country3 Product4
7:     B Country4 Product3
8:     B Country4 Product4

For tidyverse, it's not that straightforward, you can do:

f <- df %>% 
  group_by(Group) %>% 
  bind_rows(data.table::CJ(df$Group, df$Country, df$Product))


f <- data.frame(t(apply(f, 1, function(x) x[!is.na(x)])))
colnames(f) <- colnames(df)
0
votes

Here is a base R solution suing expand.grid

dfout <- do.call(rbind,
                 c(lapply(split(df,df$Group), 
                          function(v) setNames(with(v,cbind(unique(Group),expand.grid(Country,Product))),names(v))),
                   make.row.names = FALSE)
                 )

such that

> dfout
  Group  Country  Product
1     A Country1 Product1
2     A Country2 Product1
3     A Country1 Product2
4     A Country2 Product2
5     B Country3 Product3
6     B Country4 Product3
7     B Country3 Product4
8     B Country4 Product4
0
votes

Got this solution, maybe someone can benefit from it.

df %>% 
  group_by(Group) %>%
  group_split() -> x

purrr::map_dfr(1:length(x), 

function(i) {crossing(Group = unique(x[[i]]$Group),
                      Country = unique(x[[i]]$Country),
                      Product = unique(x[[i]]$Product))
}
)
0
votes

I think this is a use-case for complete instead of crossing :

library(dplyr)
df %>% group_by(Group) %>% tidyr::complete(Country, Product)

#  Group Country  Product 
#  <chr> <chr>    <chr>   
#1 A     Country1 Product1
#2 A     Country1 Product2
#3 A     Country2 Product1
#4 A     Country2 Product2
#5 B     Country3 Product3
#6 B     Country3 Product4
#7 B     Country4 Product3
#8 B     Country4 Product4