1
votes

Sample data

dat <- data.table(yr = c(2013,2013,2013,2013,2013,2013,2013,2013,2013,2013,2012,2012,2012,2012,2012,2012,2012,2012,2012,2012,2012),                  
                         location = c("Bh","Bh","Bh","Bh","Bh","Go","Go","Go","Go","Go","Bh","Bh","Bh","Bh","Bh","Bh","Go","Go","Go","Go","Go"),
                          time.period = c("t4","t5","t6","t7","t8","t3","t4","t5","t6","t7","t3","t4","t5","t6","t7","t8","t3","t4","t5","t6","t7"),
                          period = c(20,21,22,23,24,19,20,21,22,23,19,20,21,22,23,24,19,20,21,22,23),
                          value = c(runif(21)))

key <- data.table(time.period = c("t1","t2","t3","t4","t5","t6","t7","t8","t9","t10"),
                           period = c(17,18,19,20,21,22,23,24,25,26))

key provide for each time.period the associated period

In the data table dat, for each location and yr, I want to insert additional rows if a pair of time.period and period is missing

For eg. for location Bh and yr 2013

        dat[location == "Bh" & yr == 2013,]

            yr    location    time.period  period      value
        1: 2013       Bh          t4       20      0.7167561
        2: 2013       Bh          t5       21      0.5659722
        3: 2013       Bh          t6       22      0.8549229
        4: 2013       Bh          t7       23      0.1046213
        5: 2013       Bh          t8       24      0.8144670

I want to do:

            yr    location    time.period  period      value
        1: 2013       Bh          t1        17       0
        1: 2013       Bh          t2        18       0
        1: 2013       Bh          t3        19       0
        1: 2013       Bh          t4        20       0.7167561
        2: 2013       Bh          t5        21       0.5659722
        3: 2013       Bh          t6        22       0.8549229
        4: 2013       Bh          t7        23       0.1046213
        5: 2013       Bh          t8        24       0.8144670
        1: 2013       Bh          t9        25       0
        1: 2013       Bh          t10       26       0

I tried this:

   dat %>% group_by(location,yr) %>% complete(period = seq(17, max(26), 1L))

   A tibble: 40 x 5
   Groups:   location, yr [4]
          location    yr period time.period      value
             <chr>   <dbl>  <dbl>       <chr>      <dbl>
     1       Bh      2012     17        <NA>         NA
     2       Bh      2012     18        <NA>         NA
     3       Bh      2012     19          t3 0.46757583
     4       Bh      2012     20          t4 0.07041745
     5       Bh      2012     21          t5 0.58707367
     6       Bh      2012     22          t6 0.83271673
     7       Bh      2012     23          t7 0.76918731
     8       Bh      2012     24          t8 0.25368225
     9       Bh      2012     25        <NA>         NA
    10       Bh      2012     26        <NA>         NA
    # ... with 30 more rows

As you can see, the time.period is not fill. How do I fill that column as well?

3
Seems somewhat related to Fastest way to add rows for missing values in a data.frame? and Filling missing dates by group, and 'Linked' posts therein.Henrik
Thank you. I have tried those solutions. Still I have one issue which I have edited the question for.89_Simple

3 Answers

3
votes

tidyr::complete can be used to find a solution.

library(dplyr)
library(tidyr)
dat %>% complete(yr, location, key, fill = list(value = 0)) )

# # A tibble: 40 x 5
#    yr  location time.period period  value
#   <dbl> <chr>    <chr>        <dbl> <dbl>
# 1  2012 Bh       t1            17.0 0    
# 2  2012 Bh       t2            18.0 0    
# 3  2012 Bh       t3            19.0 0.177
# 4  2012 Bh       t4            20.0 0.687
# 5  2012 Bh       t5            21.0 0.384
# 6  2012 Bh       t6            22.0 0.770
# 7  2012 Bh       t7            23.0 0.498
# 8  2012 Bh       t8            24.0 0.718
# 9  2012 Bh       t9            25.0 0    
# 10  2012 Bh       t10           26.0 0    
# # ... with 30 more rows

Data

dat <- data.table(yr = c(2013,2013,2013,2013,2013,2013,2013,2013,2013,2013,2012,2012,2012,2012,2012,2012,2012,2012,2012,2012,2012),                  
                         location = c("Bh","Bh","Bh","Bh","Bh","Go","Go","Go","Go","Go","Bh","Bh","Bh","Bh","Bh","Bh","Go","Go","Go","Go","Go"),
                          time.period = c("t4","t5","t6","t7","t8","t3","t4","t5","t6","t7","t3","t4","t5","t6","t7","t8","t3","t4","t5","t6","t7"),
                          period = c(20,21,22,23,24,19,20,21,22,23,19,20,21,22,23,24,19,20,21,22,23),
                          value = c(runif(21)))

key <- data.table(time.period = c("t1","t2","t3","t4","t5","t6","t7","t8","t9","t10"),
                           period = c(17,18,19,20,21,22,23,24,25,26))
2
votes

Since you are using data.table, you can do the following:

dat_new <- dat[,.SD[key, on='time.period'],.(location, yr)]
dat_new[, period := i.period][, i.period := NULL]
dat_new[is.na(value), value := 0]

print(head(dat_new), 10)

    location   yr time.period period     value
 1:       Bh 2013          t1     17 0.0000000
 2:       Bh 2013          t2     18 0.0000000
 3:       Bh 2013          t3     19 0.0000000
 4:       Bh 2013          t4     20 0.9255600
 5:       Bh 2013          t5     21 0.3816035
 6:       Bh 2013          t6     22 0.5202268
 7:       Bh 2013          t7     23 0.5326466
 8:       Bh 2013          t8     24 0.5091590
 9:       Bh 2013          t9     25 0.0000000
10:       Bh 2013         t10     26 0.0000000

Explanation:

1.First, we join the key dataframe with each group of .(location, yr) in dat.
2. This adds the column key dataframe as i.period.
3. Finally, we set NA as 0 and drop i.period column after we set period := i.period.

0
votes

Do you need something like this?

x <- merge(dat, key, by = "time.period", all.y = T)
x[is.na(x)] <- 0