0
votes

I have a dataset like this in R:

Date       | ID  | Age |
2019-11-22 | 1   | 5   |
2018-12-21 | 1   | 4   |
2018-05-09 | 1   | 4   |
2018-05-01 | 2   | 5   |
2017-10-10 | 2   | 4   |
2017-07-21 | 1   | 3   |

How do I change the Age values of each group of ID to the most recent Age record?

Results should look like this:

Date       | ID  | Age |
2019-11-22 | 1   | 5   |
2018-12-21 | 1   | 5   |
2018-05-09 | 1   | 5   |
2018-05-01 | 2   | 5   |
2017-10-10 | 2   | 5   |
2017-07-21 | 1   | 5   |

I tried group_by(ID)%>% mutate(Age = max(Date, Age)) but it seems to be giving strange huge numbers for certain cases when I try it on a v huge dataset. What could be going wrong?

2
just order on your date and replace all values with the first (or last depending how you will sort). Also the most recent for group 2 is 5...Sotos
why is Age 4 for ID 2?Erfan

2 Answers

1
votes

Try sorting first,

df %>% 
 arrange(as.Date(Date)) %>% 
 group_by(ID) %>% 
 mutate(Age = last(Age))

which gives,

# A tibble: 6 x 3
# Groups:   ID [2]
  Date          ID   Age
  <fct>      <int> <int>
1 2017-07-21     1     5
2 2017-10-10     2     5
3 2018-05-01     2     5
4 2018-05-09     1     5
5 2018-12-21     1     5
6 2019-11-22     1     5
0
votes

I think the issue is in your mutate function:

Try this:

    group_by(ID) %>% 
arrange(as.date(Date) %>%
mutate(Age = max(Age))