purrr loop: Error: Problem with `mutate()` input `combined_data`. x `x` and `y` must share the same src, set `copy` = TRUE (may be slow)

Question

I tried to create a reproducible example, but frustratingly, it actually works:

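library(tidyverse)  # loads dplyr, tidyr, purrr and tibble

# One row per value of vs, with the other columns nested in `data`
my_mtcars <- mtcars %>% 
  rownames_to_column('car') %>% 
  group_by(vs) %>% 
  nest()

# Split each nested tibble in two, then join the pieces back together
my_mtcars <- my_mtcars %>% 
  mutate(lhs = map(.x = data, ~ .x %>% select(car:drat))) %>% 
  mutate(rhs = map(.x = data, ~ .x %>% select(car, wt:carb) %>% rename(model = car))) %>% 
  mutate(together_again = map2(.x = lhs, .y = rhs, ~ inner_join(.x, .y, by = c('car' = 'model'))))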

The above works and shows, in a nutshell, what I'm trying to do with my real data. My actual data frame, which includes list columns, fails when I mutate with an inner join. I'm hoping that by describing it and showing some anonymised data here, someone may be able to advise.

My data frame pdata:

pdata
# A tibble: 104 x 7
   MONETIZATION_WEEK_COHORT data                   cut_off clv_obj          model            prediction       training_period_metrics
   <date>                   <list>                   <int> <list>           <list>           <list>           <list>                 
 1 2020-03-30               <tibble [214,509 × 9]>       7 <named list [2]> <named list [2]> <named list [2]> <tibble [7,328 × 3]>   
 2 2020-03-30               <tibble [214,509 × 9]>       8 <named list [2]> <named list [2]> <named list [2]> <tibble [7,328 × 3]>   
 3 2020-04-06               <tibble [496,626 × 9]>       7 <named list [2]> <named list [2]> <named list [2]> <tibble [20,060 × 3]>  
 4 2020-04-06               <tibble [496,626 × 9]>       8 <named list [2]> <named list [2]> <named list [2]> <tibble [20,060 × 3]>  
 5 2020-04-13               <tibble [595,775 × 9]>       7 <named list [2]> <named list [2]> <named list [2]> <tibble [25,816 × 3]>  
 6 2020-04-13               <tibble [595,775 × 9]>       8 <named list [2]> <named list [2]> <named list [2]> <tibble [25,816 × 3]>  
 7 2020-04-20               <tibble [548,436 × 9]>       7 <named list [2]> <named list [2]> <named list [2]> <tibble [22,161 × 3]>  
 8 2020-04-20               <tibble [548,436 × 9]>       8 <named list [2]> <named list [2]> <named list [2]> <tibble [22,161 × 3]>  
 9 2020-04-27               <tibble [529,507 × 9]>       7 <named list [2]> <named list [2]> <named list [2]> <tibble [21,113 × 3]>  
10 2020-04-27               <tibble [529,507 × 9]>       8 <named list [2]> <named list [2]> <named list [2]> <tibble [21,113 × 3]>

I'm trying to join `prediction$result` with `training_period_metrics` for each row. Here is a sample of each; both are data frames:

The .y field in map2 below:

 pdata$prediction[[1]]$result %>% head(2) %>% glimpse
Rows: 2
Columns: 11
$ Id                      <chr> "123abc", "def456"
$ period.first            <date> 2020-05-21, 2020-05-21
$ period.last             <date> 2020-08-26, 2020-08-26
$ period.length           <int> 14, 14
$ actual.x                <int> 0, 0
$ actual.total.spending   <dbl> 0, 0
$ PAlive                  <dbl> 0.72933712, 0.05683547
$ CET                     <dbl> 19.2692978, 0.1285307
$ DERT                    <dbl> 13.37550762, 0.08921192
$ predicted.mean.spending <dbl> 839.648, 1017.683
$ predicted.CLV           <dbl> 11230.71800, 90.78944

The .x field in map2 below:

pdata$training_period_metrics[[1]] %>% head(2) %>% glimpse
Rows: 2
Columns: 3
$ S              <chr> "abc123", "def456"
$ Transactions   <int> 40, 3
$ Total_Spending <dbl> 14660, 1797

I'm trying to join these and store the result as a new list column:

pdata %>%
  mutate(combined_data = map2(.x = training_period_metrics, .y = prediction,
                              ~ inner_join(.x, .y$result, by = c('S' = 'Id'))))
Error: Problem with `mutate()` input `combined_data`.
x `x` and `y` must share the same src, set `copy` = TRUE (may be slow).
ℹ Input `combined_data` is `map2(...)`.

How can I join prediction$result with training_period_metrics within my purrr loop?
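In dplyr 1.0.x this particular message typically appears when the second argument to a join is not a data frame (for example NULL), so a reasonable first step is to find the rows where `.y$result` is missing. A minimal sketch on a hypothetical two-row stand-in for pdata; the values and the `toy`/`bad_rows` names are made up, and only the column names follow the question:

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical toy data shaped like pdata: each `prediction` cell is a named
# list whose `result` element should be a data frame, but row 2 has none
toy <- tibble(
  cut_off = c(7L, 8L),
  training_period_metrics = list(
    tibble(S = c("a", "b"), Transactions = c(40L, 3L)),
    tibble(S = c("a", "b"), Transactions = c(40L, 3L))
  ),
  prediction = list(
    list(result = tibble(Id = c("a", "c"), PAlive = c(0.7, 0.1))),
    list(result = NULL)
  )
)

# Flag rows where either input to the planned inner_join() is not a data frame
bad_rows <- toy %>%
  mutate(
    lhs_ok = map_lgl(training_period_metrics, is.data.frame),
    rhs_ok = map_lgl(prediction, ~ is.data.frame(.x$result))
  ) %>%
  filter(!lhs_ok | !rhs_ok)

bad_rows$cut_off
```

On the real data, the same `mutate()`/`filter()` step applied to pdata would list the rows that make the join fail.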

Please check whether all the elements in rhs or lhs have data. For example, if I do `my_mtcars$rhs[[2]] <- NULL; my_mtcars %>% mutate(together_again = map2(.x = lhs, .y = rhs, ~ inner_join(.x, .y, by = c('car' = 'model'))))`, I get `Error: Problem with mutate() input together_again. ✖ x and y must share the same src, set copy = TRUE (may be slow).` — akrun
If you handle those elements by skipping them, it should work. It is not clear what you want to happen in those cases. — akrun
In the case of a NULL, I'd like the new data frame to be NULL or NA (I'm not sure which is best here); otherwise I'd like to do the join. — user14328853
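The diagnosis in the comments can be reproduced on the toy data by storing NULL in one rhs element. A sketch, assuming dplyr, tidyr, purrr and tibble are installed; `res` is a hypothetical name. Note that storing NULL in a list element uses `[2] <- list(NULL)`, because `[[2]] <- NULL` on a plain list removes the element instead:

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

# Rebuild the toy example from the question (ungrouped for simplicity)
my_mtcars <- mtcars %>%
  rownames_to_column("car") %>%
  group_by(vs) %>%
  nest() %>%
  ungroup() %>%
  mutate(lhs = map(data, ~ select(.x, car:drat)),
         rhs = map(data, ~ .x %>% select(car, wt:carb) %>% rename(model = car)))

# Keep the second rhs element but make it NULL
my_mtcars$rhs[2] <- list(NULL)

# The join on the NULL row now fails inside mutate(); capture the error
res <- tryCatch(
  my_mtcars %>%
    mutate(together_again = map2(lhs, rhs,
                                 ~ inner_join(.x, .y, by = c("car" = "model")))),
  error = function(e) e
)

inherits(res, "error")
```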

akrun · Accepted Answer · 2021-03-24T19:45:06

We can use a condition to do the join only if neither .x nor .y is NULL, and otherwise return list(NULL):

my_mtcars %>%
  mutate(together_again = map2(.x = lhs, .y = rhs,
    # skip the join (returning list(NULL)) when either side is NULL
    ~ if (is.null(unlist(.x)) || is.null(unlist(.y))) list(NULL) else
        inner_join(.x, .y, by = c('car' = 'model'))))
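Applied to the question's own structure, the same idea needs `.y$result` rather than `.y`. A sketch on a hypothetical two-row stand-in for pdata (`toy` and `combined` are made-up names); it tests `is.data.frame()` instead of `is.null(unlist())`, a swapped-in check that also skips cells that are present but are not data frames:

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical stand-in for pdata: row 2 has no `result` data frame
toy <- tibble(
  cut_off = c(7L, 8L),
  training_period_metrics = list(
    tibble(S = c("a", "b"), Transactions = c(40L, 3L)),
    tibble(S = c("a", "b"), Transactions = c(40L, 3L))
  ),
  prediction = list(
    list(result = tibble(Id = c("a", "c"), PAlive = c(0.7, 0.1))),
    list(result = NULL)
  )
)

# Join only when both sides are data frames; otherwise leave NULL in the cell
combined <- toy %>%
  mutate(combined_data = map2(
    training_period_metrics, prediction,
    ~ if (is.data.frame(.x) && is.data.frame(.y$result)) {
        inner_join(.x, .y$result, by = c("S" = "Id"))
      } else {
        NULL
      }
  ))
```

Returning NULL rather than NA keeps combined_data a column of data frames with gaps, which is probably easier to work with later; for example, `filter(!map_lgl(combined_data, is.null))` drops the skipped rows.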
