I have some data in a data frame, df. Its head looks like this:
site year date value
1 MLO 1969 1969-08-20 323.95
2 MLO 1969 1969-08-27 324.58
3 MLO 1969 1969-09-02 321.61
4 MLO 1969 1969-09-12 321.15
5 MLO 1969 1969-09-24 321.15
6 MLO 1969 1969-10-03 320.54
I am using aggregate() to find the max value by year:
ag <- aggregate(df$value ~ df$year, data=df, max)
This works great, and I have the following (head) in ag:
df$year df$value
1 1969 324.58
2 1970 331.16
3 1971 325.89
4 1974 336.75
5 1976 333.87
6 1977 338.63
However, I'd like to plot the original data and then layer the aggregated maxima on top. To do that, I need a column in the aggregate holding the full date (the one from the row that contains the maximum value). In other words, I'd need each row in the aggregate to look like:
df$date df$year df$value
1 1969-08-27 1969 324.58
and so on, so I can use geom_point() like so:
sp <- ggplot(df, aes(x=date, y=value)) +
labs(x="Year", y="Value")
sp + geom_point(colour="grey60", size=1) +
geom_point(data=ag, aes(x=`df$date`,
y=`df$value`))
Is this possible with aggregate? That is, can I compute the max aggregate values using year, but then have it add on the date field from the matching row in the data frame?
Thank you!!
`aggregate(df$value ~ df$year, data=df, max)` is shorter and cleaner as `aggregate(value ~ year, data=df, max)`, as you will avoid the funky column names like `df$date`
– thelatemail
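One possible way to get the matching date, sketched here rather than offered as the definitive answer: aggregate() itself only returns the grouping column and the summarised value, so the date can be recovered by merging the aggregate back onto df on both year and value. The sample rows below are copied from the head of df shown in the question; the assumption is that the full data has the same columns and that date is (or can be converted to) a Date.

```r
# Sample rows copied from head(df) in the question (assumption: the
# full data frame has the same four columns).
df <- data.frame(
  site  = "MLO",
  year  = 1969,
  date  = as.Date(c("1969-08-20", "1969-08-27", "1969-09-02",
                    "1969-09-12", "1969-09-24", "1969-10-03")),
  value = c(323.95, 324.58, 321.61, 321.15, 321.15, 320.54)
)

# Max value per year; using bare column names gives plain
# result columns called year and value.
ag <- aggregate(value ~ year, data = df, FUN = max)

# Join back to df on year and value to pull in the rest of the
# matching row, including date. If several rows tie for a year's
# maximum, every tied row is kept.
ag <- merge(ag, df, by = c("year", "value"))
```

With the columns named this way, the second layer in the plot could be written as geom_point(data=ag, aes(x=date, y=value)) without backticks.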