Add column to aggregate from data frame in R

Question

I have some data in a data frame, df. Its head looks like this:

  site year       date  value
1  MLO 1969 1969-08-20 323.95
2  MLO 1969 1969-08-27 324.58
3  MLO 1969 1969-09-02 321.61
4  MLO 1969 1969-09-12 321.15
5  MLO 1969 1969-09-24 321.15
6  MLO 1969 1969-10-03 320.54

I am using aggregate() to find the max value by year:

ag <- aggregate(df$value ~ df$year, data=df, max)

This works well, and the head of ag looks like this:

       df$year      df$value
1         1969        324.58
2         1970        331.16
3         1971        325.89
4         1974        336.75
5         1976        333.87
6         1977        338.63

However, I'd like to plot the original data and then layer the yearly maxima on top. To do that, the aggregate needs a column with the full date, taken from the row that holds each year's maximum value. In other words, I'd like each row of the aggregate to look like:

          df$date df$year  df$value
1      1969-08-27    1969    324.58

and so on, so that I can add them with geom_point like so:

library(ggplot2)

sp <- ggplot(df, aes(x=date, y=value)) +
  labs(x="Year", y="Value")
sp + geom_point(colour="grey60", size=1) +
     geom_point(data=ag, aes(x=`df$date`,
                             y=`df$value`))
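For reference, here is a minimal sketch of the finished plot, assuming made-up data in the question's shape and assuming the date column is converted to Date so ggplot draws a continuous time axis. Here ag is built with base R's ave(), which broadcasts each year's maximum back onto every row so the matching rows can be kept whole (this is one alternative technique, not something from the question). The red highlight colour is likewise just an illustrative choice.

```r
library(ggplot2)

# Made-up data standing in for the question's df (assumption, not the real data)
df <- data.frame(
  year  = c(1969, 1969, 1969, 1970, 1970, 1970),
  date  = c("1969-08-20", "1969-08-27", "1969-09-02",
            "1970-03-01", "1970-05-15", "1970-07-04"),
  value = c(323.95, 324.58, 321.61, 325.10, 331.16, 328.40),
  stringsAsFactors = FALSE
)
# Store dates as Date so the x axis is a continuous time scale
df$date <- as.Date(df$date)

# ave() returns each row's yearly maximum; keep rows equal to it
# (ties within a year would keep every tied row)
ag <- df[df$value == ave(df$value, df$year, FUN = max), ]

sp <- ggplot(df, aes(x = date, y = value)) +
  labs(x = "Year", y = "Value")
p <- sp +
  geom_point(colour = "grey60", size = 1) +
  geom_point(data = ag, colour = "red")  # inherits x = date, y = value
p
```

Because ag keeps the plain column names date and value, the second layer can simply inherit the aesthetics from ggplot() instead of needing backticked names.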

Is this possible with aggregate? That is, can I compute the maximum values by year and also carry along the date from the matching row of the data frame?

Thank you!!

Just a comment - aggregate(df$value ~ df$year, data=df, max) is shorter and cleaner as aggregate(value ~ year, data=df, max) as you will avoid the funky column names like `df$date` — thelatemail
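On the aggregate part of the question: aggregate() returns only the grouping column and the aggregated value, but one way to recover the date is to merge the yearly maxima back onto the original rows by year and value. A minimal base-R sketch, assuming made-up data in the question's shape (the numbers are not the real data):

```r
# Made-up data in the shape of the question's df (assumption)
df <- data.frame(
  site  = "MLO",
  year  = c(1969, 1969, 1969, 1970, 1970),
  date  = as.Date(c("1969-08-20", "1969-08-27", "1969-09-02",
                    "1970-03-01", "1970-05-15")),
  value = c(323.95, 324.58, 321.61, 325.10, 331.16)
)

# Yearly maxima; the bare formula gives plain column names "year" and "value"
ag <- aggregate(value ~ year, data = df, FUN = max)

# Join back on (year, value) to pick up the date (and site) of each maximum.
# If a year has tied maxima, every tied row is kept.
ag_dates <- merge(ag, df, by = c("year", "value"))
ag_dates
```

Matching on value is safe here because max() returns one of the original values unchanged, so the join compares identical numbers.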

astrofunkswag · Accepted Answer · 2018-04-11T00:11:40

Solution using dplyr and made-up data:

library(dplyr)
df <- data.frame(year = c(1969, 1969, 1969, 1970, 1970),
                 date = c("1969-08-20", "1969-08-21", "1969-08-22", "1970-08-20", "1970-08-21"),
                 value = c(1, 3, 2, 10, 8),
                 stringsAsFactors = FALSE)

# For each year: the maximum value and the date on the row where it occurs
df %>% group_by(year) %>% summarise(max_val = max(value),
                                    max_date = date[which.max(value)])
# A tibble: 2 x 3
   year max_val max_date  
  <dbl>   <dbl> <chr>     
1 1969.      3. 1969-08-21
2 1970.     10. 1970-08-20
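A variation on the same idea, sketched under the same made-up data: rather than summarising column by column, filter() on the grouped data keeps the entire row(s) holding each year's maximum, so every column (date included) comes along automatically. Unlike which.max(), which returns only the first maximum, this keeps all tied rows.

```r
library(dplyr)

# Same made-up data as the answer above (dates kept as character here)
df <- data.frame(year  = c(1969, 1969, 1969, 1970, 1970),
                 date  = c("1969-08-20", "1969-08-21", "1969-08-22",
                           "1970-08-20", "1970-08-21"),
                 value = c(1, 3, 2, 10, 8),
                 stringsAsFactors = FALSE)

# Keep whole rows whose value equals their year's maximum
max_rows <- df %>%
  group_by(year) %>%
  filter(value == max(value)) %>%
  ungroup()
max_rows
```

For plotting, converting date with as.Date() first lets max_rows be passed straight to geom_point(data = max_rows) on a date axis.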



Overview