How to use dplyr's summarize and which() to look up min/max values

Question

I have the following data:

Name <- c("Sam", "Sarah", "Jim", "Fred", "James", "Sally", "Andrew", "John", "Mairin", "Kate", "Sasha", "Ray", "Ed")
Age <- c(22,12,31,35,58,82,17,34,12,24,44,67,43)
Group <- c("A", "B", "B", "B", "B", "C", "C", "D", "D", "D", "D", "D", "D") 
data <- data.frame(Name, Age, Group)

And I'd like to use dplyr to

(1) group the data by "Group"
(2) show the min and max Age within each Group
(3) show the Name of the person with the min and max ages

The following code does this:

data %>% group_by(Group) %>%
     summarize(minAge = min(Age), minAgeName = Name[which(Age == min(Age))], 
               maxAge = max(Age), maxAgeName = Name[which(Age == max(Age))])

Which works well:

  Group minAge minAgeName maxAge maxAgeName
1     A     22        Sam     22        Sam
2     B     12      Sarah     58      James
3     C     17     Andrew     82      Sally
4     D     12     Mairin     67        Ray

However, I have a problem if there are multiple min or max values:

Name <- c("Sam", "Sarah", "Jim", "Fred", "James", "Sally", "Andrew", "John", "Mairin", "Kate", "Sasha", "Ray", "Ed")
Age <- c(22,31,31,35,58,82,17,34,12,24,44,67,43)
Group <- c("A", "B", "B", "B", "B", "C", "C", "D", "D", "D", "D", "D", "D") 
data <- data.frame(Name, Age, Group)

> data %>% group_by(Group) %>%
+   summarize(minAge = min(Age), minAgeName = Name[which(Age == min(Age))], 
+             maxAge = max(Age), maxAgeName = Name[which(Age == max(Age))])
Error: expecting a single value

I'm looking for two solutions:

(1) where it doesn't matter which min or max name is shown, just that one is shown (i.e., the first value found)
(2) where, if there are "ties", all minimum values and maximum values are shown

Please let me know if this isn't clear and thanks in advance!

Using data.table. setDT(data)[, c(.SD[which.min(Age)], .SD[which.max(Age)]), Group] and change the names accordingly — akrun
@akrun one .SD: setDT(data)[, .SD[c(which.min(Age),which.max(Age))], Group]. Similarly, the selecting rows thing works here again: setDT(data)[data[, .I[c(which.min(Age),which.max(Age))], Group]$V1]. Still, that's only the first half of the question (in case someone's making an answer of it). — Frank
@Frank That was a good one using c, but your solution is in the long form isn't it. — akrun

shadow · Accepted Answer · 2015-05-12T16:27:39

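You can use which.min and which.max to get the first value. Each returns the index of the first minimum or maximum only, so exactly one name comes back per group even when there are ties.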

data %>% group_by(Group) %>%
  summarize(minAge = min(Age), minAgeName = Name[which.min(Age)], 
            maxAge = max(Age), maxAgeName = Name[which.max(Age)])
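
To see why the original call fails on ties while this one does not, here is a small base R sketch. The vector `ages` is only an illustration: it holds group B's ages from the second data set in the question, and the variable names are made up for this example.

```r
# Ages of group B in the second data set; the minimum (31) occurs twice
ages <- c(31, 31, 35, 58)

first_min <- which.min(ages)            # index of the first minimum only
all_min   <- which(ages == min(ages))   # indices of every tied minimum

print(first_min)  # 1
print(all_min)    # 1 2
```

Indexing Name with `all_min` yields two names for that group, which is more than the single value summarize expected; indexing with `first_min` always yields exactly one.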

To get all tied values, collapse them into a single string per group, e.g. using paste with an appropriate collapse argument.

data %>% group_by(Group) %>%
  summarize(minAge = min(Age), minAgeName = paste(Name[which(Age == min(Age))], collapse = ", "), 
            maxAge = max(Age), maxAgeName = paste(Name[which(Age == max(Age))], collapse = ", "))
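
For anyone who wants to run both variants side by side, here is a self-contained sketch using the tied data from the question. The result names `first_only` and `all_ties` are just labels chosen for this example, and `stringsAsFactors = FALSE` is an addition so that Name stays a character column on older R versions.

```r
library(dplyr)

# Second data set from the question: Sarah and Jim tie for the minimum in group B
data <- data.frame(
  Name  = c("Sam", "Sarah", "Jim", "Fred", "James", "Sally", "Andrew",
            "John", "Mairin", "Kate", "Sasha", "Ray", "Ed"),
  Age   = c(22, 31, 31, 35, 58, 82, 17, 34, 12, 24, 44, 67, 43),
  Group = c("A", "B", "B", "B", "B", "C", "C", "D", "D", "D", "D", "D", "D"),
  stringsAsFactors = FALSE
)

# Variant 1: keep only the first name found at the min/max
first_only <- data %>%
  group_by(Group) %>%
  summarize(minAge = min(Age), minAgeName = Name[which.min(Age)],
            maxAge = max(Age), maxAgeName = Name[which.max(Age)])

# Variant 2: keep every tied name, collapsed into one string per group
all_ties <- data %>%
  group_by(Group) %>%
  summarize(minAge = min(Age),
            minAgeName = paste(Name[which(Age == min(Age))], collapse = ", "),
            maxAge = max(Age),
            maxAgeName = paste(Name[which(Age == max(Age))], collapse = ", "))

print(first_only)
print(all_ties)
```

For group B, the first variant reports "Sarah" as minAgeName and the second reports "Sarah, Jim"; the other groups have no ties, so both variants agree there.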
