10
votes

While using dplyr, I'm having trouble changing the last value in each group of my data frame. I want to group by user_id and tag and set Time to 0 for the last row in each group.

     user_id     tag   Time
1  268096674       1    3
2  268096674       1    10
3  268096674       1    1
4  268096674       1    0
5  268096674       1    9999
6  268096674       2    0
7  268096674       2    9
8  268096674       2    500
9  268096674       3    0
10 268096674       3    1
...

Desired output:

     user_id     tag   Time
1  268096674       1    3
2  268096674       1    10
3  268096674       1    1
4  268096674       1    0
5  268096674       1    0
6  268096674       2    0
7  268096674       2    9
8  268096674       2    0
9  268096674       3    0
10 268096674       3    1
...

I've tried something like this, among other things, and can't figure it out:

df %>%
  group_by(user_id,tag) %>%
  mutate(tail(Time) <- 0)

I tried adding a row number as well, but couldn't quite put it all together. Any help would be appreciated.

2 Answers

11
votes

Here's an option:

library(dplyr)

df %>%
  group_by(user_id, tag) %>%
  # drop the last element of each group (index n()) and append 0 in its place
  mutate(Time = c(Time[-n()], 0))
#Source: local data frame [10 x 3]
#Groups: user_id, tag
#
#     user_id tag Time
#1  268096674   1    3
#2  268096674   1   10
#3  268096674   1    1
#4  268096674   1    0
#5  268096674   1    0
#6  268096674   2    0
#7  268096674   2    9
#8  268096674   2    0
#9  268096674   3    0
#10 268096674   3    0

What I did here: take the existing column "Time" without its last element in the group (the one at index n()), then append 0 as the new last element using c() for concatenation.

Note that in my output the Time value in row 10 is also changed to 0, because in the 10-row sample it is the last entry of its group.
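The same idea can also be written with base R's replace(), which returns a copy of a vector with the given positions overwritten. This is only a sketch: the data frame below is rebuilt from the first 10 rows shown in the question, and the name `result` is made up for illustration.

```r
library(dplyr)

# Sample data rebuilt from the question (first 10 rows only; an assumption,
# since the original data continues past row 10)
df <- data.frame(
  user_id = rep(268096674, 10),
  tag     = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3),
  Time    = c(3, 10, 1, 0, 9999, 0, 9, 500, 0, 1)
)

# Inside a grouped mutate(), n() is the size of the current group,
# so position n() is the last row of that group.
result <- df %>%
  group_by(user_id, tag) %>%
  mutate(Time = replace(Time, n(), 0)) %>%
  ungroup()

result$Time
# 3 10 1 0 0 0 9 0 0 0
```

Like c(Time[-n()], 0), this builds a new Time vector for each group rather than modifying the column in place.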

6
votes

I would like to offer an alternative approach that avoids copying the whole column (which both Time[-n()] and replace do) and modifies the data in place:

library(data.table)
indx <- setDT(df)[, .I[.N], by = .(user_id, tag)]$V1 # row index of the last row in each group
df[indx, Time := 0L] # modify those rows in place
df
#       user_id tag Time
#  1: 268096674   1    3
#  2: 268096674   1   10
#  3: 268096674   1    1
#  4: 268096674   1    0
#  5: 268096674   1    0
#  6: 268096674   2    0
#  7: 268096674   2    9
#  8: 268096674   2    0
#  9: 268096674   3    0
# 10: 268096674   3    0
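To see what the index step produces, here is a small self-contained sketch. It assumes the 10 sample rows from the question with an integer Time column, and the name `dt` is made up for illustration. Inside `[.data.table`, .I holds the row numbers of the current group within the whole table and .N is the group size, so .I[.N] picks the last row of each group.

```r
library(data.table)

# Sample data rebuilt from the question (first 10 rows only; an assumption)
dt <- data.table(
  user_id = rep(268096674, 10),
  tag     = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Time    = c(3L, 10L, 1L, 0L, 9999L, 0L, 9L, 500L, 0L, 1L)
)

# One row number per (user_id, tag) group: the last row of that group
indx <- dt[, .I[.N], by = .(user_id, tag)]$V1
indx
# 5 8 10

# := assigns by reference, so only the selected rows of Time are touched
dt[indx, Time := 0L]
dt$Time
# 3 10 1 0 0 0 9 0 0 0
```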