This should probably be easy, but I'm unable to find the answer on my own.

Take the iris dataset, for example. I want to get the maximum petal width for each species, using this query:

iris %>% group_by(Species) %>% summarise(max(Petal.Width))

Which returns the following result:

# A tibble: 3 x 2
  Species    `max(Petal.Width)`
  <fct>                   <dbl>
1 setosa                    0.6
2 versicolor                1.8
3 virginica                 2.5

Now I want to know the petal length for each of the rows with the maximum petal width.[1] How would I go about doing this? Adding a select does not work:

iris %>% group_by(Species) %>% select(Species,Petal.Length,Petal.Width) %>% summarise(max_val = max(Petal.Width))

The Petal.Length column is still missing from this result.

1: For example, the row where the petal width is 1.8 and the species is versicolor, the petal length is 4.8 - and I would like to have this info with the result.

1 Answer

Think of summarize as aggregating rows - combining them. It works with functions like mean, max, n_distinct, that give a single number summary of a column.

max is sort of a special case: it is a single number summary function, but it doesn't actually combine values. In this case, you don't want to find the maximum, you want to find the row with the maximum value. Keeping a particular row is a filter operation, not a summarize operation, so we can do it a few different ways.

## keep the row(s) with the maximum Petal Width
iris %>% 
  group_by(Species) %>%
  filter(Petal.Width == max(Petal.Width))  

## sort by petal width and keep top row in each group
iris %>% 
  group_by(Species) %>%
  ## order the data by descending `Petal.Width`
  arrange(desc(Petal.Width)) %>%
  ## keep the top row
  slice(1)

## use the built-in function for this particular case!
iris %>%
  group_by(Species) %>%
  slice_max(Petal.Width)

These are basically the same; they differ only when several rows in a group tie for the maximum petal width. The slice(1) method keeps only one row per group, the Petal.Width == max(Petal.Width) filter keeps all the rows tied for the max, and slice_max lets you choose with the with_ties argument (which defaults to keeping all tied rows).
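
To see the tie behaviour for yourself, here is a minimal sketch that runs slice_max both ways and compares row counts. It assumes dplyr 1.0.0 or later (where slice_max is available); the names all_ties and one_each are just illustrative.

```r
library(dplyr)

## default behaviour: every row tied for the max in a group is kept
all_ties <- iris %>%
  group_by(Species) %>%
  slice_max(Petal.Width, with_ties = TRUE) %>%
  ungroup()

## with_ties = FALSE: exactly one row per group, even when rows tie
one_each <- iris %>%
  group_by(Species) %>%
  slice_max(Petal.Width, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  ## keep only the columns asked about in the question
  select(Species, Petal.Length, Petal.Width)

## one_each always has one row per species;
## all_ties has more rows whenever a species has ties
nrow(all_ties)
nrow(one_each)
```

If you need exactly one row per species, with_ties = FALSE is probably the safest choice; if you want to know about every flower that reaches the maximum, keep the default.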