I have the following problem:
When I use dplyr to mutate a numeric column after group_by(), the mutate() call fails if a group contains only one value and that value is NaN (or NA; note that sd() of a single value returns NA).
If every group's result is numeric, the new column is correctly typed as dbl, but as soon as one group contains nothing but a NaN, dplyr types that group's result as lgl while all the other groups are dbl, and the groups can no longer be combined into one column.
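The type split can be reproduced in base R without dplyr. The sketch below is only an illustration, not dplyr's internals: it mimics the grouped pipeline of the MWE with `ave()` and `split()`, and the names `sd_by_group`, `per_group` and `group_types` are made up for the example. Group C has a single row, so its sd is `NA`, and `ifelse()` on that group returns a logical vector while groups A and B return doubles.

```r
# Same data as the MWE below
a <- c(rep(LETTERS[1:2], 4), "C")
x <- c(7, 8, 3, 5, 9, 2, 4, 7, 8)

# Replace each value by its group's standard deviation; sd() of one value is NA
sd_by_group <- ave(x, a, FUN = function(v) sd(v, na.rm = TRUE))

# Apply the capping ifelse() separately to each group, as a grouped mutate would
per_group <- lapply(split(sd_by_group, a), function(v) ifelse(v > 2, 2, v))

# Storage type of each group's result
group_types <- vapply(per_group, typeof, character(1))
print(group_types)
```

A grouped `mutate()` has to stitch these per-group vectors back into one column, which is where the logical result of group C clashes with the double results of A and B.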
My first (and more general) question is: is there a way to tell dplyr, when using group_by(), to always give a column a certain type?
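One way to get a fixed type is to pin it inside the expression rather than in group_by(). A minimal base R sketch, assuming explicit coercion is acceptable: wrapping the `ifelse()` in `as.numeric()` makes every group return a double, including a group whose only value is `NA` or `NaN`. The helper name `cap_dbl` is hypothetical. (dplyr's `if_else()`, where available, is stricter in a similar spirit: it checks that `true` and `false` have the same type, so the result type does not depend on the condition.)

```r
# Hypothetical helper: cap values at `cap` and always return a double vector
cap_dbl <- function(v, cap) {
  # ifelse() alone returns logical NA when the test is all NA; coerce back to double
  as.numeric(ifelse(v > cap, cap, v))
}

types <- c(
  single_nan = typeof(cap_dbl(NaN, 3)),
  single_na  = typeof(cap_dbl(NA_real_, 3)),
  normal     = typeof(cap_dbl(c(1, 5), 3))
)
print(types)
```

In the pipeline this would read `mutate(Winsorise = as.numeric(ifelse(x > 2, 2, x)))`.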
Secondly, can someone help me with a workaround for the problem shown in the MWE below?
```r
library(dplyr)
# ERROR: this reproduces the column-type error described above
df <- data_frame(a = c(rep(LETTERS[1:2], 4), "C"),
                 g = rep(LETTERS[5:7], 3),
                 x = c(7, 8, 3, 5, 9, 2, 4, 7, 8))
# x becomes the per-group sd; group "C" has one row, so its sd is NA
df <- df %>% group_by(a) %>% mutate_each(funs(sd(., na.rm = TRUE)), x)
df <- df %>% mutate(Winsorise = ifelse(x > 2, 2, x))
```
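For this MWE specifically, a possible workaround is to avoid `ifelse()` altogether. The step `ifelse(x > 2, 2, x)` only caps values from above, which `pmin(x, 2)` does as well, and `pmin()` keeps the double type of `x` even when a group's only value is `NA`. The sketch below checks this in base R only; `x_group` is an illustrative stand-in for one column of per-group sd values.

```r
# Illustrative per-group sd values; the last one is NA (single-row group)
x_group <- c(2.75, 2.65, NA)

capped_ifelse <- ifelse(x_group > 2, 2, x_group)
capped_pmin   <- pmin(x_group, 2)
print(capped_pmin)

# On a lone NA, ifelse() falls back to logical while pmin() stays double
lone_ifelse <- ifelse(NA_real_ > 2, 2, NA_real_)
lone_pmin   <- pmin(NA_real_, 2)
print(c(ifelse = typeof(lone_ifelse), pmin = typeof(lone_pmin)))
```

In the pipeline this would read `df %>% mutate(Winsorise = pmin(x, 2))`. This only replaces the upper cap used here; a two-sided winsorisation would need `pmax()` as well.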
```r
# NO ERROR once no group consists of a single NA row:
df2 <- data_frame(a = c(rep(LETTERS[1:2], 4), "C"),
                  g = rep(LETTERS[5:7], 3),
                  x = c(7, 8, 3, 5, 9, 2, 4, 7, 8))
df2 <- df2 %>% group_by(a) %>% mutate_each(funs(sd(., na.rm = TRUE)), x)
# Move the row with the NA (row 9) into group "A" - this works
df2[9, 1] <- "A"
df2 <- df2 %>% mutate(Winsorise = ifelse(x > 3, 3, x))
```
```r
# REASON FOR ERROR: a group whose only member is NaN gets an lgl Winsorise
# column, although we want it to be dbl:
df3 <- data_frame(g = "A", x = NaN)
df3 <- df3 %>% mutate(Winsorise = ifelse(x > 3, 3, x))
```
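The df3 case reduces to a single scalar. `NaN > 3` is a logical `NA`, and `ifelse()` builds its result from the logical test, converting it only when values from `yes` or `no` are written in. With an all-`NA` test nothing is written, so the result stays a logical `NA`, and the original `NaN` is not carried over either. A small base R check (the names `cmp` and `res` are illustrative):

```r
x <- NaN
cmp <- x > 3                 # comparison with NaN gives a logical NA
res <- ifelse(x > 3, 3, x)   # nothing is taken from yes/no, so no coercion

cat("typeof(cmp):", typeof(cmp), "\n")
cat("typeof(res):", typeof(res), "\n")
cat("is.nan(res):", is.nan(res), "\n")
```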