I want to find the lead() and lag() elements within each group, but I am getting wrong results.
For example, my data looks like this:
library(dplyr)
df = data.frame(name = rep(c('Al', 'Jen'), 3),
                score = rep(c(100, 80, 60), 2))
df
Data:
  name score
1   Al   100
2  Jen    80
3   Al    60
4  Jen   100
5   Al    80
6  Jen    60
Now I want the lead() and lag() scores for each person. If I sort the data with arrange() first, I get the correct answer:
df %>%
  arrange(name) %>%
  group_by(name) %>%
  mutate(next.score = lead(score),
         before.score = lag(score))
OUTPUT1:
Source: local data frame [6 x 4]
Groups: name
  name score next.score before.score
1   Al   100         60           NA
2   Al    60         80          100
3   Al    80         NA           60
4  Jen    80        100           NA
5  Jen   100         60           80
6  Jen    60         NA          100
Without arrange(), the result is wrong:
df %>%
  group_by(name) %>%
  mutate(next.score = lead(score),
         before.score = lag(score))
OUTPUT2:
Source: local data frame [6 x 4]
Groups: name
  name score next.score before.score
1   Al   100         80           NA
2  Jen    80         60           NA
3   Al    60        100           80
4  Jen   100         80           60
5   Al    80         NA          100
6  Jen    60         NA           80
E.g., in the 1st row, Al's next.score should be 60 (his score in the 3rd row), not 80.
Does anybody know why this happens? Why does arrange() affect the result (the values themselves, not just their order)? Thanks~
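For reference, here is a minimal base-R sketch of the values I would expect for the unsorted data. It does not use dplyr at all; shift_lead and shift_lag are hypothetical helper names made up for this illustration, and ave() is only one possible way to apply them per group while keeping the original row order.

```r
# Same data as above.
df <- data.frame(name = rep(c("Al", "Jen"), 3),
                 score = rep(c(100, 80, 60), 2))

# Hypothetical helpers (not part of dplyr): shift a vector by one position,
# padding with NA at the end (lead) or at the start (lag).
shift_lead <- function(x) c(x[-1], NA)
shift_lag  <- function(x) c(NA, x[-length(x)])

# ave() applies the function within each group of `name` and writes the
# results back to the original row positions, so no sorting is needed.
df$next.score   <- ave(df$score, df$name, FUN = shift_lead)
df$before.score <- ave(df$score, df$name, FUN = shift_lag)
df
```

With this, the 1st row (Al) gets next.score 60, which matches the sorted dplyr result in OUTPUT1 row for row once the rows are matched up by person.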
[...] 1 Al 100 60 NA with R 3.1.2 on Windows 7 – Panagiotis Kanavos
[...] 0.4.1.9000). I think (after a quick, groggy-eyed glance at the source of the series of function calls) it's because the underlying code is going by actual overall row-index instead of the relative row-index. That might explain lead (I think pmin is the place of the weirdness), but not sure what's going on with lag (didn't look there). – hrbrmstr
[...] dplyr and was already reported here – alex23lemm