0
votes

I am trying to take a column of data that has a mix of values and NAs, and replace values with the minimum values in a group. The challenge is I haven't figured out a way to use group_by in this case because there is not a unique grouping in the data set.

What I am looking to do is to say: if the value in the value column is a number, then use that number, unless the previous value is less than the current value. If it is not a number, then leave the value as NA.

I tried group_by but realized that wouldn't work as described above. Then I tried an if_else, but I think this fails because is.numeric is not vectorized.

The final data frame is what I am trying to achieve.

Example data

library(dplyr)

# Initial
initial <- structure(list(dates = structure(c(17532, 17539, 17546, 17553, 
                                             17560, 17567, 17574, 17581, 17588, 17595, 17602, 17609, 17616, 
                                             17623, 17630, 17637, 17644, 17651, 17658, 17665, 17672, 17679
), class = "Date"), values = c(10, 10, 10, 11, NA, NA, NA, NA, 
                               NA, 20, 20, 21, 22, NA, NA, NA, NA, NA, 30, 30, 31, NA)), class = "data.frame", row.names = c(NA, 
                                                                                                                             -22L))
# Final
final <- structure(list(dates = structure(c(17532, 17539, 17546, 17553, 
                                              17560, 17567, 17574, 17581, 17588, 17595, 17602, 17609, 17616, 
                                              17623, 17630, 17637, 17644, 17651, 17658, 17665, 17672, 17679
), class = "Date"), values = c(10, 10, 10, 11, NA, NA, NA, NA, 
                               NA, 20, 20, 21, 22, NA, NA, NA, NA, NA, 30, 30, 31, NA), desired = c(10, 
                                                                                                    10, 10, 10, NA, NA, NA, NA, NA, 20, 20, 20, 20, NA, NA, NA, NA, 
                                                                                                    NA, 30, 30, 30, NA)), class = "data.frame", row.names = c(NA, 
                                                                                                                                                              -22L))

This, and other attempts, did not work. I suspect because is.numeric is not vectorized. I tried some mutate_at versions as well but could not get that to work either.

# Did not work
initial %>%
  mutate(desired = ifelse(is.numeric(values), ifelse(is.numeric(lag(values)), lag(values), values), values))
1

1 Answers

0
votes

We can do use data.table::rleid for creating groups and select min value from each.

library(dplyr)

initial %>%
  group_by(group = data.table::rleid(is.na(values))) %>%
  mutate(ans = min(values)) %>% 
  ungroup() %>%
  select(-group)

# A tibble: 22 x 3
#   dates      values   ans
#   <date>      <dbl> <dbl>
# 1 2018-01-01     10    10
# 2 2018-01-08     10    10
# 3 2018-01-15     10    10
# 4 2018-01-22     11    10
# 5 2018-01-29     NA    NA
# 6 2018-02-05     NA    NA
# 7 2018-02-12     NA    NA
# 8 2018-02-19     NA    NA
# 9 2018-02-26     NA    NA
#10 2018-03-05     20    20
# … with 12 more rows

For a purely dplyr solution, we can replace group_by statement with

group_by(group = cumsum(is.na(values) != lag(is.na(values), default = FALSE))) %>%