2
votes

Say that I have dataset like this:

   Object       date date_data
 1:      N       <NA>          
 2:      A       <NA>          
 3:      A       <NA>          
 4:      A 2020-01-01 something
 5:      B       <NA>          
 6:      B       <NA>          
 7:      B 2020-01-01 something
 8:      C       <NA>          
 9:      C       <NA>          
10:      C 2020-01-01 something

Generated by

example <- data.table(Object = rep(LETTERS[1:3], each=3), date = as.Date(rep(c(NA,NA,"2020-01-01"),3)), date_data = rep(c("","","something")))
example <- rbind(list(Object="N", date=as.Date(NA), date_data = ""), example)

I know that I can fill in the missing dates for each group using nafill:

example[, date:= nafill(date,"nocb"), by=Object]
    Object       date date_data
 1:      N       <NA>          
 2:      A 2020-01-01          
 3:      A 2020-01-01          
 4:      A 2020-01-01 something
 5:      B 2020-01-01          
 6:      B 2020-01-01          
 7:      B 2020-01-01 something
 8:      C 2020-01-01          
 9:      C 2020-01-01          
10:      C 2020-01-01 something

What I would like to do is to fill in the date_data column in the same way as the date column. I cannot do that with the datatable's nafill option as it doesn't work with character values.

What would be a simple workaround? I have seen similar questions, but didn't find any that refer to the same problem

Edit

@Ronak Shah's data.table works for the original example. But if I change it to

example <- data.table(Object = c(rep("A",6), rep("B",3)), date = as.Date(rep(c(NA,NA,"2020-01-01"),3)), date_data = rep(c("","","something")))
example <- rbind(list(Object="N", date=as.Date(NA), date_data = ""), example)


   Object       date date_data
 1:      N       <NA>          
 2:      A       <NA>          
 3:      A       <NA>          
 4:      A 2020-01-01 something
 5:      A       <NA>          
 6:      A       <NA>          
 7:      A 2020-01-01 something
 8:      B       <NA>          
 9:      B       <NA>          
10:      B 2020-01-01 something

na.fill does not longer work, I get the error message: Supplied 4 items to be assigned to group 2 of size 6 in column 'date_data'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.

2
You can have a look at tidyr::fill(),zoo::na.locf() - Frank Zhang
you can try to convert to factor and then use data.table nafill function - jangorecki

2 Answers

2
votes

You can replace empty values with NA and use zoo::na.locf.

library(data.table)

example[, date_data := zoo::na.locf(replace(date_data, date_data == "", NA)), Object]
example

#   Object       date date_data
# 1:      N       <NA>          
# 2:      A 2020-01-01 something
# 3:      A 2020-01-01 something
# 4:      A 2020-01-01 something
# 5:      B 2020-01-01 something
# 6:      B 2020-01-01 something
# 7:      B 2020-01-01 something
# 8:      C 2020-01-01 something
# 9:      C 2020-01-01 something
#10:      C 2020-01-01 something

and similarly using tidyr's fill :

library(dplyr)

example %>%
  mutate(date_data = replace(date_data, date_data == "", NA)) %>%
  group_by(Object) %>%
  tidyr::fill(date_data, .direction = "up")
1
votes

Another option using data.table applying to all types of columns at once:

#change empty string to NAs if there are more columns, use set
example[date_data=="", date_data := NA_character_]

cols <- c("date", "date_data")
example[, (cols) := lapply(.SD, 
      function(x) x[nafill(replace(seq.int(.N), is.na(x), NA_integer_), "nocb")]), 
  Object, .SDcols=cols]