2
votes

I'm running into unexpected behavior with dplyr:

library(dplyr)

df <- structure(list(date = c("2016-05-02", "2016-05-02", "2016-05-02", 
      "2016-05-02", "2016-05-02", "2016-05-02", "2016-05-02", "2016-05-02", 
      "2016-05-02", "2016-05-02", "2016-05-02", "2016-05-02", "2016-05-02", 
      "2016-05-02", "2016-05-02", "2016-05-02", "2016-05-02", "2016-05-02", 
      "2016-05-02", "2016-05-02", "2016-05-02", "2016-05-02", "2016-05-02", 
      "2016-05-02", "2016-05-02", "2016-05-02", "2016-05-02", "2016-05-02", 
      "2016-05-02", "2016-05-02", "2016-05-02", "2016-05-02", "2016-05-02", 
      "2016-05-02", "2016-05-02", "2016-05-02", "2016-05-02", "2016-05-02", 
      "2016-05-02", "2016-05-02", "2016-05-02", "2016-05-02", "2016-05-02", 
      "2016-05-02", "2016-05-02", "2016-05-02", "2016-05-02", "2016-05-02", 
      "2016-05-03", "2016-05-03", "2016-05-03", "2016-05-03", "2016-05-03", 
      "2016-05-03", "2016-05-03", "2016-05-03", "2016-05-03", "2016-05-03", 
      "2016-05-03", "2016-05-03", "2016-05-03", "2016-05-03", "2016-05-03", 
      "2016-05-03", "2016-05-03", "2016-05-03", "2016-05-03", "2016-05-03", 
      "2016-05-03", "2016-05-03", "2016-05-03", "2016-05-03", "2016-05-03", 
      "2016-05-03", "2016-05-03", "2016-05-03", "2016-05-03", "2016-05-03", 
      "2016-05-03", "2016-05-03", "2016-05-03", "2016-05-03", "2016-05-03", 
      "2016-05-03", "2016-05-03", "2016-05-03", "2016-05-03", "2016-05-03", 
      "2016-05-03", "2016-05-03", "2016-05-03", "2016-05-03", "2016-05-03", 
      "2016-05-03", "2016-05-03", "2016-05-03", "2016-05-04", "2016-05-04", 
      "2016-05-04", "2016-05-04", "2016-05-04", "2016-05-04", "2016-05-04", 
      "2016-05-04", "2016-05-04", "2016-05-04", "2016-05-04", "2016-05-04", 
      "2016-05-04", "2016-05-04", "2016-05-04", "2016-05-04", "2016-05-04", 
      "2016-05-04", "2016-05-04", "2016-05-04", "2016-05-04", "2016-05-04", 
      "2016-05-04", "2016-05-04", "2016-05-04", "2016-05-04", "2016-05-04", 
      "2016-05-04", "2016-05-04", "2016-05-04", "2016-05-04", "2016-05-04", 
      "2016-05-04", "2016-05-04", "2016-05-04", "2016-05-04", "2016-05-04", 
      "2016-05-04", "2016-05-04", "2016-05-04", "2016-05-04", "2016-05-04", 
      "2016-05-04", "2016-05-04", "2016-05-04", "2016-05-04", "2016-05-04", 
      "2016-05-04", "2016-05-05", "2016-05-05", "2016-05-05", "2016-05-05", 
      "2016-05-05", "2016-05-05", "2016-05-05", "2016-05-05", "2016-05-05", 
      "2016-05-05", "2016-05-05", "2016-05-05", "2016-05-05", "2016-05-05", 
      "2016-05-05", "2016-05-05", "2016-05-05", "2016-05-05", "2016-05-05", 
      "2016-05-05", "2016-05-05", "2016-05-05", "2016-05-05", "2016-05-05", 
      "2016-05-05", "2016-05-05", "2016-05-05", "2016-05-05", "2016-05-05", 
      "2016-05-05", "2016-05-05", "2016-05-05", "2016-05-05", "2016-05-05", 
      "2016-05-05", "2016-05-05", "2016-05-05", "2016-05-05", "2016-05-05", 
      "2016-05-05", "2016-05-05", "2016-05-05", "2016-05-05", "2016-05-05", 
      "2016-05-05", "2016-05-05", "2016-05-05", "2016-05-05", "2016-05-06", 
      "2016-05-06", "2016-05-06", "2016-05-06", "2016-05-06", "2016-05-06", 
      "2016-05-06", "2016-05-06", "2016-05-06", "2016-05-06", "2016-05-06", 
      "2016-05-06", "2016-05-06", "2016-05-06", "2016-05-06", "2016-05-06", 
      "2016-05-06", "2016-05-06", "2016-05-06", "2016-05-06", "2016-05-06", 
      "2016-05-06", "2016-05-06", "2016-05-06", "2016-05-06", "2016-05-06", 
      "2016-05-06", "2016-05-06", "2016-05-06", "2016-05-06", "2016-05-06", 
      "2016-05-06", "2016-05-06", "2016-05-06", "2016-05-06", "2016-05-06", 
      "2016-05-06", "2016-05-06", "2016-05-06", "2016-05-06", "2016-05-06", 
      "2016-05-06", "2016-05-06", "2016-05-06", "2016-05-06", "2016-05-06", 
      "2016-05-06", "2016-05-06"), abc = c(NA, NA, NA, NA, NA, NA, 
         NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
         NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
         NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, 0, 0, 0, 0, 0, 
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 20, 20, 16, 
         14, 9, 8, 6, 5, 5, 6, 7, 13, 24, 52, 65, 68, 66, 65, 58, 47, 
         21, 6, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
         1, 1, 0, 0, 0, 0, 0, 10, 19, 19, 15, 11, 8, 8, 5, 4, 4, 4, 5, 
         9, 17, 31, 43, 49, 52, 52, 47, 32, 21, 6, 2, 1, 1, 1, 1, 1, 1, 
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 5, 14, 
         14, 14, 15, 18, 18, 14, 14, 14, 15, 19, 29, 46, 58, 62, 69, 71, 
         67, 56, 40, 25, 8, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
         2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 10, 18, 18, 14, 12, 9, 7, 5, 
         4, 5, 5, 7, 9, 17, 30, 36, 49, 52, 54, 54, 42, 32, 15, 5, 1)), 
     class = "data.frame", row.names = c(NA, -240L), .Names = c("date", "abc"))


df %>%
  group_by(date) %>%
  mutate(peak_max_index = as.numeric(which.max(as.numeric(abc))))

I would expect peak_max_index to be 41 for all rows where date is 2016-05-04, but it is NA instead. Even more strangely, if you drop all rows where date is 2016-05-03 before running the dplyr commands, the result is entirely correct. Is this a bug?

1
Did you try df %>% group_by(date) %>% mutate(peak_max_index = as.numeric(which.max(as.numeric(abc)))) %>% filter(date == '2016-05-04')? This shows that the first part is doing the right thing. What does packageVersion('dplyr') show? - Gopala
That gives me the same result. Package version is 0.4.3. - RoyalTS
So, what part of the result is weird? I saved the result of your command into df, subset it with df[df$date == '2016-05-04', ], and still get 41 for all rows. - Gopala
BTW, there are some bugs in dplyr 0.4.3 (unrelated to this issue), so I use the dev version 0.4.3.9001. - Gopala
For the same subset, I get NA for all rows. I may have to try the dev version. - RoyalTS

1 Answer

-1
votes

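The NA values are the problem. Every abc value in the 2016-05-02 group is NA, so which.max() returns a zero-length result (integer(0)) for that group instead of an index, and that is most likely what upsets mutate() in dplyr 0.4.3. Dropping the NA values with !is.na() is not enough by itself: it shifts the positions, and the 2016-05-02 group would still be empty. Referring to df$abc inside mutate() would also ignore the grouping. Instead, take the first element of the result, which is NA when nothing was found:

df %>%
    group_by(date) %>%
    mutate(peak_max_index = as.numeric(which.max(abc)[1]))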
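
To see how which.max() treats missing values in isolation, here is a minimal base-R sketch. The vectors and the names all_na, mixed, grp, and val are made-up stand-ins for illustration, not the question's data, and ave() is used only to mimic a per-group computation without dplyr.

```r
# Base R only; all data below is hypothetical and kept tiny on purpose.
all_na <- c(NA_real_, NA_real_, NA_real_)   # like a group with no observations
mixed  <- c(NA, 0, 10, 52, 52, 47)          # NA values plus a tied maximum

idx_all_na <- which.max(all_na)   # zero-length integer: there is no maximum
idx_mixed  <- which.max(mixed)    # position in the original vector; first tie wins

# Taking the first element turns the zero-length result into a single NA,
# so every group yields exactly one value.
safe_all_na <- which.max(all_na)[1]
safe_mixed  <- which.max(mixed)[1]

# Dropping NA values first shifts the positions, so the index no longer
# refers to the original rows.
shifted <- which.max(mixed[!is.na(mixed)])

# Per-group version: group "a" is all NA, group "b" peaks at its 2nd row.
grp  <- c("a", "a", "b", "b", "b")
val  <- c(NA, NA, 1, 5, 2)
peak <- ave(val, grp, FUN = function(x) which.max(x)[1])

print(length(idx_all_na))
print(c(idx_mixed, shifted))
print(safe_all_na)
print(peak)
```

The guarded form keeps the index aligned with the original row order inside each group, which is what the question's expected value of 41 relies on.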