Remove rows out of a specific year range, without using for-loop in R

Question

I am looking for a way to omit the rows which are not between two specific values, without using for loop. All rows in year column are between 1999 and 2002, however some of them do not include all years between these two dates. You can see the initial data as follows:

a <- data.frame(year = c(2000:2002,1999:2002,1999:2002,1999:2001), 
                id=c(4,6,2,1,3,5,7,4,2,0,-1,-3,4,3))

   year id
1  2000  4
2  2001  6
3  2002  2
4  1999  1
5  2000  3
6  2001  5
7  2002  7
8  1999  4
9  2000  2
10 2001  0
11 2002 -1
12 1999 -3
13 2000  4
14 2001  3

Processed dataset should only include consecutive rows between 1999:2002. The following data.frame is exactly what I need:

When I execute the following for loop, I get previous data.frame without any problem:

for(i in 1:which(a$year == 2002)[length(which(a$year == 2002))]){
  if(a[i,1] == 1999 & a[i+3,1] == 2002){
    b <- a[i:(i+3),]
  }else{next}

  if(!exists("d")){
    d <- b
  }else{
    d <- rbind(d,b)
  }
}

However, I have more than 1 million rows and I need to do this process without using for loop. Is there any faster way for that?

Are there other columns in the data that ensure the row order? It seems like your data happen to be in order based on some type of grouping and that that grouping is important. — hmhensen

Phoenix Mu Phoenix Mu · Accepted Answer · 2018-08-31T21:47:33

There is a function that can do this automatically.

First, install the package called dplyr or tidyverse with command install.packages("dplyr") or install.packages("tidyverse").

Then, load the package with library(dplyr).

Then, use the filter function: a_filtered = filter(a, year >=1999 & year < 2002).

This should be fast even there are many rows.

Remove rows out of a specific year range, without using for-loop in R

3 Answers