dplyr filter: Get rows with minimum of variable, but only the first if multiple minima

Question

I want to do a grouped filter using dplyr so that, within each group, only the row with the minimum value of variable x is returned.

My problem is: as expected, when a group has multiple minima, all rows with the minimum value are returned. But in my case, I only want the first of those rows.

Here's an example:

df <- data.frame(
A=c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
x=c(1, 1, 2, 2, 3, 4, 5, 5, 5),
y=rnorm(9)
)

library(dplyr)
df.g <- group_by(df, A)
filter(df.g, x == min(x))

As expected, all minima are returned:

Source: local data frame [6 x 3]
Groups: A

  A x           y
1 A 1 -1.04584335
2 A 1  0.97949399
3 B 2  0.79600971
4 C 5 -0.08655151
5 C 5  0.16649962
6 C 5 -0.05948012

With ddply, I would have approached the task this way:

library(plyr)
ddply(df, .(A), function(z) {
    z[z$x == min(z$x), ][1, ]
})

... which works:

  A x           y
1 A 1 -1.04584335
2 B 2  0.79600971
3 C 5 -0.08655151

Q: Is there a way to approach this in dplyr? (For speed reasons)

@hadley, 1) I don't think min_rank helps here. He needs the first min value (look at plyr solution). 2) In whatever programming language you write, the algorithmic complexity of rank (ties=min, max, first etc..) will be bigger than just computing min. — Arun
@Arun: True, only rank(x, ties.method="first")==1 works, as min and min_rank do not differentiate between multiple minima. — Felix S
@hadley, I still don't see how that makes you consider which.min to be premature optimisation. AFAIK it's a natural choice, reads well, easy to understand, fast as it happens to be O(n) too. — Arun
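
The distinction drawn in these comments can be checked directly. The following is only a minimal sketch, assuming dplyr is installed. It reuses the A and x columns from the question, and the id column is a hypothetical addition so the surviving rows can be identified.

```r
library(dplyr)

x <- c(1, 1, 2)  # values of x in group A from the question

min_rank(x)                     # ties share the lowest rank: 1 1 3
rank(x, ties.method = "first")  # ties broken by position:    1 2 3
which.min(x)                    # index of the first minimum: 1

# Same A and x as the question; id is a hypothetical row id
df <- data.frame(
  A  = rep(c("A", "B", "C"), each = 3),
  x  = c(1, 1, 2, 2, 3, 4, 5, 5, 5),
  id = 1:9
)

# Only the "first" tie-breaking leaves exactly one row per group
res_rank <- df %>%
  group_by(A) %>%
  filter(rank(x, ties.method = "first") == 1) %>%
  ungroup()

# min_rank() gives every tied minimum rank 1, so all six minima survive
res_min_rank <- df %>%
  group_by(A) %>%
  filter(min_rank(x) == 1) %>%
  ungroup()
```

Here res_rank has three rows and res_min_rank has six, which matches the behaviour described in the comments.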

talat · Accepted Answer · 2014-05-20T11:42:49

Update

With dplyr >= 0.3 you can use the slice function in combination with which.min, which would be my favorite approach for this task:

df %>% group_by(A) %>% slice(which.min(x))
#Source: local data frame [3 x 3]
#Groups: A
#
#  A x          y
#1 A 1  0.2979772
#2 B 2 -1.1265265
#3 C 5 -1.1952004
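
This works because which.min returns a single index, the position of the first minimum, so slice keeps exactly one row per group. A reproducible sketch of the same call follows. It assumes dplyr >= 0.3 as stated above, and replaces the random y column with a hypothetical id column so the result does not depend on rnorm.

```r
library(dplyr)

# Same grouping and x as the question; id is a hypothetical row id that
# stands in for y so the output is reproducible
df <- data.frame(
  A  = rep(c("A", "B", "C"), each = 3),
  x  = c(1, 1, 2, 2, 3, 4, 5, 5, 5),
  id = 1:9
)

first_min <- df %>%
  group_by(A) %>%
  slice(which.min(x)) %>%
  ungroup()

first_min$id  # 1 4 7: the first row in each group that attains the minimum
```

Note that which.min skips NA values, so a group whose x is entirely NA would contribute no row.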

Original answer

For the sample data, it is also possible to chain two filter calls, the first keeping all minima and the second keeping only the first row of each group:

group_by(df, A) %>% 
  filter(x == min(x)) %>% 
  filter(1:n() == 1)
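
As a cross-check, the two-step filter can be compared with a one-step alternative. This is only a sketch under two assumptions that are not part of the original answer: that row_number() can stand in for 1:n(), and that dplyr >= 1.0.0 is installed, where slice_min() with with_ties = FALSE is available. The id column is again a hypothetical row id.

```r
library(dplyr)

df <- data.frame(
  A  = rep(c("A", "B", "C"), each = 3),
  x  = c(1, 1, 2, 2, 3, 4, 5, 5, 5),
  id = 1:9  # hypothetical row id, added to make the kept rows visible
)

# Two-step filter from the answer, with row_number() in place of 1:n()
two_step <- df %>%
  group_by(A) %>%
  filter(x == min(x)) %>%
  filter(row_number() == 1) %>%
  ungroup()

# Assumes dplyr >= 1.0.0: with_ties = FALSE keeps one row even if minima tie
one_step <- df %>%
  group_by(A) %>%
  slice_min(x, n = 1, with_ties = FALSE) %>%
  ungroup()

identical(two_step$id, one_step$id)
```

Both variants should keep the same row per group as slice(which.min(x)).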
