It's probably a character flaw, but I sometimes resist picking up new packages. The "base R" functions can often do the job. In this case I think the value of the dplyr package shows through, since I stumbled in creating a good solution: the ave function returned a character value for a logical test, which I still don't understand. So I think dplyr is a real gem. And if I could, I'd like to insist that any upvotes be preceded by an upvote to akrun's answer. (It's hard to believe this hasn't already been asked and answered on SO.)
Anyway:
> df[ as.logical(
ave(df$date, df$ID, FUN=function(d) as.Date(d , '%m/%d/%Y') ==
max(as.Date(d, '%m/%d/%Y'))))
, ]
ID date
2 1 03/14/2001
6 2 02/01/2008
7 3 08/22/2011
I thought this should work without the as.logical wrapper, but it fails:
> df[ ave(df$date, df$ID, FUN=function(d) as.Date(d , '%m/%d/%Y') ==max(as.Date(d, '%m/%d/%Y'))) , ]
ID date
NA NA <NA>
NA.1 NA <NA>
NA.2 NA <NA>
NA.3 NA <NA>
NA.4 NA <NA>
NA.5 NA <NA>
NA.6 NA <NA>
NA.7 NA <NA>
NA.8 NA <NA>
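A possible explanation for the character result, sketched on made-up data (the df below is hypothetical, not the data from the question): ave assigns whatever FUN returns back into a copy of its first argument, so when that argument is a character vector the logical results get coerced to the strings "TRUE" and "FALSE". Indexing rows with those strings then looks for row names called "TRUE" and "FALSE", which would account for the all-NA rows. Handing ave a numeric vector instead seems to sidestep the problem:

```r
# Hypothetical toy data; dates stored as character, as the question's output suggests
df <- data.frame(
  ID   = c(1, 1, 2, 2, 3),
  date = c("01/05/2000", "03/14/2001", "02/01/2008", "06/30/2007", "08/22/2011"),
  stringsAsFactors = FALSE
)

# First argument is character, so the logical test results are coerced to character
flag <- ave(df$date, df$ID, FUN = function(d) {
  dd <- as.Date(d, "%m/%d/%Y")
  dd == max(dd)
})
class(flag)  # "character"

# First argument is numeric, so the logical results come back as 1/0
d_num <- as.numeric(as.Date(df$date, "%m/%d/%Y"))
keep  <- ave(d_num, df$ID, FUN = function(d) d == max(d)) == 1
df[keep, ]   # rows 2, 3 and 5: the latest date within each ID
```

Comparing against 1 turns the numeric flags back into a logical index, so no as.logical call on strings is needed.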
Here's another base R solution that worked the first time with no surprises:
> do.call( rbind, by(df, df$ID, function(d) d[ which.max(as.Date(d$date, '%m/%d/%Y')), ] ) )
ID date
1 1 03/14/2001
2 2 02/01/2008
3 3 08/22/2011
Here's one inspired by @rawr's notion of taking the last one from an ordered subset:
> do.call( rbind, by(df, df$ID, function(d) tail( d[ order(as.Date(d$date, '%m/%d/%Y')), ] ,1)) )
ID date
1 1 03/14/2001
2 2 02/01/2008
3 3 08/22/2011
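One difference worth keeping in mind, shown on a small made-up example (df_tie is hypothetical): which.max and tail(..., 1) return a single row per ID, while a comparison against the group maximum keeps every row that ties for the latest date. Which behaviour is wanted depends on the question being asked.

```r
# Hypothetical data with two rows tied for the latest date in ID 1
df_tie <- data.frame(
  ID   = c(1, 1, 2),
  date = c("03/14/2001", "03/14/2001", "02/01/2008"),
  stringsAsFactors = FALSE
)
dd <- as.numeric(as.Date(df_tie$date, "%m/%d/%Y"))

# which.max returns only the first maximum within each group
one_per_id <- do.call(rbind, by(df_tie, df_tie$ID, function(d)
  d[which.max(as.Date(d$date, "%m/%d/%Y")), ]))

# Comparing each date with its group maximum keeps all tied rows
all_ties <- df_tie[ave(dd, df_tie$ID, FUN = max) == dd, ]

nrow(one_per_id)  # 2
nrow(all_ties)    # 3
```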