
I am trying to use add_row() on grouped data without using do().

library(dplyr)
library(tidyr)
library(purrr)
library(tibble)


my.data <- data.frame(

  supplier = c("a","a","a","a","a","a","b","b","b","b","b","b"),
  date = rep(c("2017-06-01","2017-03-01","2017-02-01","2017-01-12",
               "2017-05-01","2017-04-01"), 2), 
  order = c(1,0,0,1,1,0,0,1,0,0,1,0)

)

Solution with do

my.data %>%
  group_by(supplier) %>% 
  do(add_row(.,.before=0))

which gives

# A tibble: 14 x 3
# Groups:   supplier [3]
   supplier       date order
      <chr>      <chr> <dbl>
 1     <NA>       <NA>    NA
 2        a 2017-06-01     1
 3        a 2017-03-01     0
 4        a 2017-02-01     0
 5        a 2017-01-12     1
 6        a 2017-05-01     1
 7        a 2017-04-01     0
 8     <NA>       <NA>    NA
 9        b 2017-06-01     0
10        b 2017-03-01     1
11        b 2017-02-01     0
12        b 2017-01-12     0
13        b 2017-05-01     1
14        b 2017-04-01     0

Attempt with nest and mutate or purrr::map

my.data %>%
  group_by(supplier) %>%
  nest() %>%
  mutate(extra.row = add_row(data, .before = 0))

Error in mutate_impl(.data, dots) : Evaluation error: Unsupported index type: NULL.

Any suggestions? do() is very slow when scaled up.

You want the entire row to be NA, even the grouping var(s)? - Frank
The intent is to add a row whose date is 30 days before the minimum date in each group, e.g. add_row(data, date = (min(.$date) - 30), .before=0) - iboboboru
OK, a join should do it, I guess. I don't use the tidyverse, so I can't write an answer. You should probably clarify that the "solution with do" in the question isn't really a solution (since supplier and date are NA). - Frank
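Frank's comment suggests a join. A minimal sketch of one way to read that in dplyr, assuming date has already been converted with as.Date(): build one extra row per supplier, then full_join() it onto the data. The names extra and joined are illustrative only, not from the question.

```r
library(dplyr)

# Same data as in the question, with date stored as a Date
my.data <- data.frame(
  supplier = rep(c("a", "b"), each = 6),
  date = as.Date(rep(c("2017-06-01", "2017-03-01", "2017-02-01", "2017-01-12",
                       "2017-05-01", "2017-04-01"), 2)),
  order = c(1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper table: one row per supplier, dated 30 days
# before that supplier's earliest date
extra <- my.data %>%
  group_by(supplier) %>%
  summarize(date = min(date) - 30)

# full_join keeps every original row and adds the new (supplier, date) pairs;
# their `order` is NA because `extra` has no such column
joined <- my.data %>%
  full_join(extra, by = c("supplier", "date")) %>%
  arrange(supplier, date)

print(joined)
```

Unlike a plain row bind, the join would not duplicate a row if the computed date already existed for that supplier.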

1 Answer


You could bind a summarized dataset onto the original using bind_rows.

You may also be able to use complete(), although right now your dates are the same in every group, and it might not work as written when groups have different dates: inside the pipe, .$date refers to the whole dataset rather than the current group. Also, I believe complete() tends to be slow when you scale up.

Both solutions rely on date being an actual Date variable in the original dataset, so convert it first:

my.data = mutate(my.data, date = as.Date(date) )
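The conversion matters because min() on character strings returns a string, and subtracting 30 from a string is an error; on a Date the same expression counts back 30 days. A small check using a few of the example dates (dates_chr is just an illustrative name):

```r
dates_chr <- c("2017-06-01", "2017-03-01", "2017-02-01", "2017-01-12")

# Character version: min() works (ISO strings sort lexically),
# but arithmetic on the result fails
res <- tryCatch(min(dates_chr) - 30, error = function(e) conditionMessage(e))
print(res)  # "non-numeric argument to binary operator"

# Date version: subtraction is measured in days
dates <- as.Date(dates_chr)
first_minus_30 <- min(dates) - 30
print(first_minus_30)  # "2016-12-13"
```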

Summarizing with summarize() and binding with bind_rows(). The arrange() step only puts the rows in order and may well not be needed in the real case.

my.data %>%
    group_by(supplier) %>%
    summarize(date = min(date) - 30) %>%
    bind_rows(., my.data) %>%  
    arrange(supplier, date)
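Because the minimum is computed per group by group_by() plus summarize(), this approach does not need the groups to share the same dates. A quick check with a hypothetical dataset toy in which suppliers a and b have different dates; note also that summarize() drops the single grouping level, so the result comes back ungrouped.

```r
library(dplyr)

# Hypothetical data: each supplier has its own set of dates
toy <- data.frame(
  supplier = c("a", "a", "b", "b"),
  date = as.Date(c("2017-01-01", "2017-02-01", "2017-03-01", "2017-04-01")),
  order = c(1, 0, 0, 1),
  stringsAsFactors = FALSE
)

res <- toy %>%
  group_by(supplier) %>%
  summarize(date = min(date) - 30) %>%  # one row per supplier, ungrouped
  bind_rows(., toy) %>%                 # `order` is NA for the new rows
  arrange(supplier, date)

print(res)
# Each supplier gets its own extra row: a on 2016-12-02, b on 2017-01-30
```

If the grouping is needed downstream, add group_by(supplier) after arrange().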

Using complete(), if the dates are the same across groups:

my.data %>%
    group_by(supplier) %>%
    complete(date = c(min(.$date) - 30, .$date ) )
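The restriction comes from .$date: in the pipe, . is the whole grouped data frame, so min(.$date) is the overall minimum and .$date holds every date in the data, not just the current group's. A sketch with the same hypothetical toy data as above, where the suppliers have different dates; each supplier ends up with the dates of both suppliers, and both get the same extra date.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: each supplier has its own set of dates
toy <- data.frame(
  supplier = c("a", "a", "b", "b"),
  date = as.Date(c("2017-01-01", "2017-02-01", "2017-03-01", "2017-04-01")),
  order = c(1, 0, 0, 1),
  stringsAsFactors = FALSE
)

res <- toy %>%
  group_by(supplier) %>%
  # `.` is the full data here, so both groups receive all four dates
  # plus the global minimum minus 30 days (2016-12-02)
  complete(date = c(min(.$date) - 30, .$date))

print(res)  # 5 rows per supplier instead of 3
```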

Result for both (the bind_rows() version comes back ungrouped, so it has no # Groups line):

# A tibble: 14 x 3
# Groups:   supplier [2]
   supplier       date order
     <fctr>     <date> <dbl>
 1        a 2016-12-13    NA
 2        a 2017-01-12     1
 3        a 2017-02-01     0
 4        a 2017-03-01     0
 5        a 2017-04-01     0
 6        a 2017-05-01     1
 7        a 2017-06-01     1
 8        b 2016-12-13    NA
 9        b 2017-01-12     0
10        b 2017-02-01     0
11        b 2017-03-01     1
12        b 2017-04-01     0
13        b 2017-05-01     1
14        b 2017-06-01     0