
UPDATE: dplyr has been updated since this question was asked and now behaves as the OP wanted.

I'm trying to get rows 2 through 7 of a data.frame using dplyr.

I'm doing this:

library(dplyr)
df <- data.frame(id = 1:10, var = runif(10))
df <- df %>% filter(row_number() <= 7, row_number() >= 2)

But this throws an error.

Error in rank(x, ties.method = "first") : 
  argument "x" is missing, with no default

I know I could easily do:

df <- df %>% mutate(rn = row_number()) %>% filter(rn <= 7, rn >= 2)

But I would like to understand why my first try is not working.
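Per the update at the top, current dplyr evaluates the original call as intended. A minimal check, assuming a dplyr release recent enough to include that fix (the exact version isn't stated here), compares it with the mutate() workaround:

```r
library(dplyr)

set.seed(1)  # fixed seed so the random column is reproducible
df <- data.frame(id = 1:10, var = runif(10))

# The original attempt: in recent dplyr both row_number() calls see all 10 rows
orig <- df %>% filter(row_number() <= 7, row_number() >= 2)

# The workaround: store the row number as a column first, then drop it
alt <- df %>%
  mutate(rn = row_number()) %>%
  filter(rn <= 7, rn >= 2) %>%
  select(-rn)

orig$id                        # 2 3 4 5 6 7
identical(orig$var, alt$var)   # same rows selected by both approaches
```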

df %>% filter(row_number() %in% 2:7) - akrun
I could do that too, but why is df <- df %>% filter(row_number() <= 7, row_number() >= 2) wrong? - Daniel Falbel
I don't know the real reason behind that. A double filter appears to work. - akrun
It's a bug. Please file an issue on github.com/hadley/dplyr/issues - hadley
I think it is useful to keep this question around as long as it is clear that it is now out of date; that way, people (like me) looking for help can see that this is no longer a problem. I edited the post for clarity. - Jonno Bourne
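The two alternatives from the comments can be sketched like this, assuming a recent dplyr. Reading akrun's "double filter" as two chained filter() calls is an assumption, not something the thread spells out:

```r
library(dplyr)

df <- data.frame(id = 1:10, var = runif(10))

# akrun's %in% version: a single row_number() call tested against a set
by_in <- df %>% filter(row_number() %in% 2:7)

# Assumed reading of "double filter": two separate filter() calls.
# The second row_number() is computed on the 7 rows left by the first.
by_two <- df %>%
  filter(row_number() <= 7) %>%
  filter(row_number() >= 2)

by_in$id   # 2 3 4 5 6 7
by_two$id  # 2 3 4 5 6 7
```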

3 Answers


Actually, dplyr's slice() function is made for exactly this kind of positional subsetting:

df %>% slice(2:7)

(I'm a little late to the party, but thought I'd add this for future readers.)
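slice() works on row positions, not on the values in any column. A quick sketch that makes the difference visible by reversing the rows first (the reversed copy, rev_df, exists only for illustration):

```r
library(dplyr)

df <- data.frame(id = 1:10, var = runif(10))

# Rows in positions 2 through 7
df %>% slice(2:7)

# Positions, not id values: after reversing the rows,
# slice(2:7) returns whichever ids now sit in positions 2..7
rev_df <- df[10:1, ]
rev_df %>% slice(2:7) %>% pull(id)  # 9 8 7 6 5 4
```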


The row_number() function does not simply return the row number of each element, so it can't be used the way you want. Its documentation says:

• ‘row_number’: equivalent to ‘rank(ties.method = "first")’

You're not actually saying what you want the row_number of. In your case:

df %>% filter(row_number(id) <= 7, row_number(id) >= 2)

works because id is sorted, so row_number(id) is 1:10. I don't know what row_number() evaluates to in this context, but when it is called a second time, dplyr has run out of things to feed it and you get the equivalent of:

> row_number()
Error in rank(x, ties.method = "first") : 
  argument "x" is missing, with no default

That's your error right there.
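The documented equivalence with rank(), and the reason the sorted id column hides the problem, can be checked directly. The vectors x and shuffled below are made up for illustration:

```r
library(dplyr)

x <- c(30, 10, 20, 10)

# row_number(x) ranks the values, breaking ties by position
row_number(x)                   # 4 1 3 2
rank(x, ties.method = "first")  # 4 1 3 2

# For an already sorted column such as id = 1:10, ranks equal positions
id <- 1:10
row_number(id)                  # 1 2 3 ... 10

# For an unsorted column they do not, so row_number(shuffled) <= 7
# would not select rows by position
shuffled <- c(3L, 1L, 2L)
row_number(shuffled)            # 3 1 2
```

This is why filtering on row_number(id) only picks rows 2 to 7 when id is already in row order.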

Anyway, that's not the way to select rows.

You simply need to subscript, as in df[2:7, ], or, if you insist on pipes everywhere:

> df %>% "["(.,2:7,)
  id        var
2  2 0.52352994
3  3 0.02994982
4  4 0.90074801
5  5 0.68935493
6  6 0.57012344
7  7 0.01489950
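Plain indexing needs no packages at all. A minimal sketch of what df[2:7, ] returns, including the original row names that show up as the left-hand labels in the output above:

```r
df <- data.frame(id = 1:10, var = runif(10))

# Row positions 2 to 7; the empty slot after the comma keeps every column
rows <- df[2:7, ]

rows$id         # 2 3 4 5 6 7
rownames(rows)  # "2" "3" "4" "5" "6" "7": the original row names are kept
```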

Here is another way to do row-number-based filtering in a pipeline.

    df <- data.frame(id = 1:10, var = runif(10))

    df %>% .[2:7,]

      id     var
    2  2 0.28817
    3  3 0.56672
    4  4 0.96610
    5  5 0.74772
    6  6 0.75091
    7  7 0.05165
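The . here is the placeholder of the %>% pipe (from magrittr, which dplyr re-exports); R's native |> pipe does not use it. A minimal check, assuming magrittr is installed, that the piped form gives the same result as indexing directly:

```r
library(magrittr)  # provides %>% and its `.` placeholder

df <- data.frame(id = 1:10, var = runif(10))

# `.` is replaced by the left-hand side, so this evaluates df[2:7, ]
out <- df %>% .[2:7, ]

identical(out, df[2:7, ])  # TRUE
```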