4
votes

I have the following data,

data

date           ID       value1        value2
2016-04-03     1          0              1
2016-04-10     1          6              2
2016-04-17     1          7              3
2016-04-24     1          2              4
2016-04-03     2          1              5
2016-04-10     2          5              6
2016-04-17     2          9              7
2016-04-24     2          4              8

Now I want to group by ID and find the mean of value2 and latest value of value1. Latest value in the sense, I would like to get the value of latest date i.e. here I would like to get the value1 for corresponding value of 2016-04-24 for each IDs. My output should be like,

ID       max_value1      mean_value2
1             2              2.5
2             4              6.5 

The following is the command I am using,

data %>% group_by(ID) %>% summarize(mean_value2 = mean(value2))

But I am not sure how to do the first one. Can anybody help me in getting the latest value of value1 while summarizing in dplyr?

3

3 Answers

5
votes

One way would be the following. My assumption here is that date is a date object. You want to arrange the order of date for each ID using arrange. Then, you group the data by ID. In summarize, you can use last() to take the last value1 for each ID.

arrange(data,ID,date) %>%
group_by(ID) %>%
summarize(mean_value2 = mean(value2), max_value1 = last(value1))

#     ID mean_value2 max_value1
#  <int>       <dbl>      <int>
#1     1         2.5          2
#2     2         6.5          4

DATA

data <- structure(list(date = structure(c(16894, 16901, 16908, 16915, 
16894, 16901, 16908, 16915), class = "Date"), ID = c(1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L), value1 = c(0L, 6L, 7L, 2L, 1L, 5L, 9L, 
4L), value2 = 1:8), .Names = c("date", "ID", "value1", "value2"
), row.names = c(NA, -8L), class = "data.frame")
2
votes

Here is an option with data.table

library(data.table)
setDT(data)[,  .(max_value1 = value1[which.max(date)], 
                        mean_value2 = mean(value2)) , by = ID]
 #   ID max_value1 mean_value2
 #1:  1          2         2.5
 #2:  2          4         6.5
1
votes

You can do this using the function nth in dplyr which finds the nth value of a vector.

data %>% group_by(ID) %>% 
summarize(max_value1 = nth(value1, n = length(value1)), mean_value2 = mean(value2))

This is based on the assumption that the data is ordered by date as in the example; otherwise use arrange as discussed above.