summarize in dplyr with the maximum value of the date - R

Question

I have the following data,

data

date           ID       value1        value2
2016-04-03     1          0              1
2016-04-10     1          6              2
2016-04-17     1          7              3
2016-04-24     1          2              4
2016-04-03     2          1              5
2016-04-10     2          5              6
2016-04-17     2          9              7
2016-04-24     2          4              8

Now I want to group by ID and find the mean of value2 and latest value of value1. Latest value in the sense, I would like to get the value of latest date i.e. here I would like to get the value1 for corresponding value of 2016-04-24 for each IDs. My output should be like,

ID       max_value1      mean_value2
1             2              2.5
2             4              6.5

The following is the command I am using,

data %>% group_by(ID) %>% summarize(mean_value2 = mean(value2))

But I am not sure how to do the first one. Can anybody help me in getting the latest value of value1 while summarizing in dplyr?

jazzurro jazzurro · Accepted Answer · 2016-08-03T00:22:40

One way would be the following. My assumption here is that date is a date object. You want to arrange the order of date for each ID using arrange. Then, you group the data by ID. In summarize, you can use last() to take the last value1 for each ID.

arrange(data,ID,date) %>%
group_by(ID) %>%
summarize(mean_value2 = mean(value2), max_value1 = last(value1))

#     ID mean_value2 max_value1
#  <int>       <dbl>      <int>
#1     1         2.5          2
#2     2         6.5          4

DATA

data <- structure(list(date = structure(c(16894, 16901, 16908, 16915, 
16894, 16901, 16908, 16915), class = "Date"), ID = c(1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L), value1 = c(0L, 6L, 7L, 2L, 1L, 5L, 9L, 
4L), value2 = 1:8), .Names = c("date", "ID", "value1", "value2"
), row.names = c(NA, -8L), class = "data.frame")

summarize in dplyr with the maximum value of the date - R

3 Answers