Using the iris
data set with the addition of a Year
column as created below, I want to group by Species
and Year
to select the Year
with the maximum number of observations for each species
.
iris$Year <- c(rep(c("2007", "2008"), each = 25),
rep(c("2007", "2008"), times = c(10, 40)),
rep(c("2007", "2008"), times = c(40, 10)))
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Year
1 5.1 3.5 1.4 0.2 setosa 2007
2 4.9 3.0 1.4 0.2 setosa 2007
3 4.7 3.2 1.3 0.2 setosa 2007
4 4.6 3.1 1.5 0.2 setosa 2007
5 5.0 3.6 1.4 0.2 setosa 2007
6 5.4 3.9 1.7 0.4 setosa 2007
As seen in the summary table below, both 2007
and 2008
have 25 observations (rows) for setosa
. When there are ties in the number of observations, I want to select the minimum Year
.
In short, I am trying to select only setosa 2007, versicolor 2008, and virginica 2007 from the reproducible data set above, prefably using dplyr
EDITNote, rather than create a summary table, I want to select the respective species-year combinations from the iris
data created above. In other words, I want to subset iris
to include only setosa 2007, versicolor 2008, and virginica 2007 (i.e. the minimum year with the max number of observations). Sorry for the confusion. End Edit
library(dplyr)
iris %>%
group_by(Species, Year) %>%
summarise(N = n()) %>%
as.data.frame()
Species Year N
1 setosa 2007 25
2 setosa 2008 25
3 versicolor 2007 10
4 versicolor 2008 40
5 virginica 2007 40
6 virginica 2008 10