I'm trying to create a window function with dplyr that returns a new vector containing the difference between each value and the first value of its group. For example, given this dataset:
dummy <- data.frame(userId=rep(1,6),
libId=rep(999,6),
curatorId=c(1:2,1:2,1:2),
iterationNum=c(0,0,1,1,2,2),
rf=c(5,10,0,15,30,40)
)
That creates this dataset:
userId libId curatorId iterationNum rf
1 1 999 1 0 5
2 1 999 2 0 10
3 1 999 1 1 0
4 1 999 2 1 15
5 1 999 1 2 30
6 1 999 2 2 40
Given this grouping:
dummy <- group_by(dummy, libId, userId, curatorId)
the expected result is:
userId libId curatorId iterationNum rf rf.diff
1 1 999 1 0 5 0
2 1 999 2 0 10 0
3 1 999 1 1 0 -5
4 1 999 2 1 15 5
5 1 999 1 2 30 25
6 1 999 2 2 40 30
That is, within each group of userId, libId and curatorId, rf.diff is the row's rf value minus the group's rf value at iterationNum == 0.
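To pin down the expected numbers, here is a minimal sketch of that computation. It uses base R's ave() instead of dplyr, so it only illustrates the arithmetic and is not the dplyr window function being asked about. It assumes every group has exactly one row with iterationNum == 0.

```r
# Example data from the question
dummy <- data.frame(userId = rep(1, 6),
                    libId = rep(999, 6),
                    curatorId = c(1:2, 1:2, 1:2),
                    iterationNum = c(0, 0, 1, 1, 2, 2),
                    rf = c(5, 10, 0, 15, 30, 40))

# ave() splits its first argument by the grouping vectors, applies FUN to each
# piece and writes the results back in the original row order. Passing row
# indices lets FUN look up both rf and iterationNum for the current group.
dummy$rf.diff <- ave(
  seq_len(nrow(dummy)),
  dummy$userId, dummy$libId, dummy$curatorId,
  FUN = function(i) {
    group_rf <- dummy$rf[i]
    baseline <- group_rf[dummy$iterationNum[i] == 0]  # assumes exactly one match
    group_rf - baseline
  }
)

print(dummy)  # rf.diff: 0 0 -5 5 25 30
```

If a group had no iterationNum == 0 row, or more than one, the baseline would be empty or recycled, so the assumption above matters.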
I tried playing with the first() function, the rank() function and others, but couldn't find a way to nail it.
---EDIT---
This is what I tried:
dummy %>%
group_by(userId,libId,curatorId) %>%
mutate(rf.diff = rf - subset(dummy,iterationNum==0)[['rf']])
And:
dummy %>%
group_by(userId,libId,curatorId) %>%
mutate(rf.diff = rf - first(x = rf,order_by=iterationNum))
This crashes R and returns this error message:
pure virtual method called
terminate called after throwing an instance of 'Rcpp::exception'
  what():  incompatible size (%d), expecting %d (the group size) or 1
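The "incompatible size" part of the message fits the first attempt. Inside mutate(), the name dummy still refers to the full, ungrouped data frame. subset() therefore returns the iterationNum == 0 values of every group at once, which is two values, while each group has three rows. mutate() accepts only a result of length 1 or the group size. The base R sketch below just reproduces those two lengths and does not call dplyr.

```r
dummy <- data.frame(userId = rep(1, 6),
                    libId = rep(999, 6),
                    curatorId = c(1:2, 1:2, 1:2),
                    iterationNum = c(0, 0, 1, 1, 2, 2),
                    rf = c(5, 10, 0, 15, 30, 40))

# subset() sees the whole data frame, so it returns both groups' baselines
baseline <- subset(dummy, iterationNum == 0)[["rf"]]
print(baseline)      # 5 10

# each (userId, libId, curatorId) group has 3 rows, not 2 and not 1
group_sizes <- as.vector(table(interaction(dummy$userId, dummy$libId, dummy$curatorId)))
print(group_sizes)   # 3 3
```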
Use rf - rf[iterationNum == 0] inside the mutate instead. The other option is to arrange the data using arrange(iterationNum) as a separate step in the pipe and then use rf - first(rf) in the mutate, if you are sure that each group has a 0 in iterationNum and no lower values. – talat
rf - first(rf, iterationNum) – hadley
When using mutate(rf.diff = rf - first(rf, order_by = iterationNum)), my R session crashed with this message: pure virtual method called – Omri374
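The three suggestions from these comments can be compared side by side. The following is a minimal sketch that assumes a dplyr version in which first(..., order_by = ...) does not trigger the crash reported above. It also assumes each group has exactly one iterationNum == 0 row. The example data is already sorted by iterationNum, and arrange() keeps ties in their original order, so all three results line up row by row.

```r
suppressPackageStartupMessages(library(dplyr))

dummy <- data.frame(userId = rep(1, 6),
                    libId = rep(999, 6),
                    curatorId = c(1:2, 1:2, 1:2),
                    iterationNum = c(0, 0, 1, 1, 2, 2),
                    rf = c(5, 10, 0, 15, 30, 40))

# talat: index the group's own rf by its iterationNum == 0 row
via_index <- dummy %>%
  group_by(userId, libId, curatorId) %>%
  mutate(rf.diff = rf - rf[iterationNum == 0]) %>%
  ungroup()

# talat, alternative: sort by iterationNum first, then take the first rf per group
via_arrange <- dummy %>%
  arrange(iterationNum) %>%
  group_by(userId, libId, curatorId) %>%
  mutate(rf.diff = rf - first(rf)) %>%
  ungroup()

# hadley: let first() pick the rf with the smallest iterationNum in each group
via_order_by <- dummy %>%
  group_by(userId, libId, curatorId) %>%
  mutate(rf.diff = rf - first(rf, order_by = iterationNum)) %>%
  ungroup()

print(via_index$rf.diff)  # 0 0 -5 5 25 30
```

The indexing version does not depend on row order. The arrange() version quietly gives wrong answers if some group's smallest iterationNum is not 0.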