full = data.frame(group = c('a', 'a', 'a', 'a', 'a', 'b', 'c'), values = c(1, 2, 2, 3, 5, 3, 4), year = c(2001, 2002, 2003, 2002, 2003, 2003, 2002))
max = data.frame(group = c('a', 'b', 'c'), year = c(2002, 2003, 2002))
## my attempt:
full = full %>% group_by(group) %>% mutate(mean = mean(values[year != max$year[match(full$group, max$group)]], na.rm = TRUE))
I'm expecting a data.frame with a new column mean that, for each group, excludes the years listed for that group in max. But this is the output:
  group values year     mean
1     a      1 2001 2.666667
2     a      2 2002 2.666667
3     a      2 2003 2.666667
4     a      3 2002 2.666667
5     a      5 2003 2.666667
6     b      3 2003 3.000000
7     c      4 2002      NaN
Why is there a mean for b (6th row)? Its only year, 2003, is the one listed for b in max, so I expected NaN there. How can I change the code so the mean reflects that filter properly? I imagine it has something to do with this warning:
Warning message:
In year != max$year[match(full$group, max$group)] :
  longer object length is not a multiple of shorter object length
full$group is a vector from the full table, not from the group_by subset. Maybe .$group or just group instead. - Frank
groups? - Rafael
x within a group_by statement refers to only the subset associated with that group_by. res = full %>% group_by(group) %>% mutate(mean = mean(values[year != max$year[match(first(group), max$group)]], na.rm = TRUE)) seems to work. I'm not sure if the first() wrapper is necessary. - Frank
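To make the comments above concrete, here is a minimal sketch, assuming dplyr is loaded and using the same full and max data as in the question. The names lookup_all, res_join and max_year are made up for illustration. It first shows that full$group inside mutate() still has all 7 elements, which is where the recycling warning comes from, then runs the first(group) version from the comment, and finally a join-based variant that is one possible alternative and not something proposed in the thread.

```r
library(dplyr)

full <- data.frame(
  group  = c('a', 'a', 'a', 'a', 'a', 'b', 'c'),
  values = c(1, 2, 2, 3, 5, 3, 4),
  year   = c(2001, 2002, 2003, 2002, 2003, 2003, 2002)
)
max <- data.frame(group = c('a', 'b', 'c'), year = c(2002, 2003, 2002))

# full$group is the whole column, not the per-group subset, so this
# lookup has 7 elements no matter which group mutate() is working on.
# Comparing it with a 5-element (or 1-element) year vector recycles values.
lookup_all <- max$year[match(full$group, max$group)]
length(lookup_all)

# Variant from the comment: look up the excluded year once per group.
res <- full %>%
  group_by(group) %>%
  mutate(mean = mean(values[year != max$year[match(first(group), max$group)]],
                     na.rm = TRUE)) %>%
  ungroup()

# Hypothetical alternative: join the excluded year onto each row first,
# then compare row by row within each group.
res_join <- full %>%
  left_join(rename(max, max_year = year), by = "group") %>%
  group_by(group) %>%
  mutate(mean = mean(values[year != max_year], na.rm = TRUE)) %>%
  ungroup() %>%
  select(-max_year)
```

With this data both variants give 2.666667 for group a (the mean of 1, 2 and 5) and NaN for b and c, since every row of those two groups falls in the excluded year. As for the first() wrapper: inside group_by, a bare group has the same length as year, so match(group, max$group) should also line up element by element; first() just makes the lookup a single value.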