2
votes

I have a data frame called "data.set.y", and I would like to extract a subset of it, which I call "data.set.y.p1". The subset should contain all rows in which the column "Entity" contains one of the years 1990 through 1999 (1990, 1991, ..., 1999).

I got the right subset with the following code:

data.set.y.p1 <- subset(data.set.y, substring(data.set.y$Entity, 13,16) == 1990 | substring(data.set.y$Entity, 13,16) == 1991 |
                                    substring(data.set.y$Entity, 13,16) == 1992 | substring(data.set.y$Entity, 13,16) == 1993 |
                                    substring(data.set.y$Entity, 13,16) == 1994 | substring(data.set.y$Entity, 13,16) == 1995 |
                                    substring(data.set.y$Entity, 13,16) == 1996 | substring(data.set.y$Entity, 13,16) == 1997 |
                                    substring(data.set.y$Entity, 13,16) == 1998 | substring(data.set.y$Entity, 13,16) == 1999)

Now I would like to replace this long code with something more elegant. I have already tried the following:

years <- c(1990:1999)
data.set.y.p1 <- subset(data.set.y, substring(data.set.y$Entity, 13,16) == years)

But it does not work.

Does anybody have an idea how to get rid of all these individual conditions, one for each year?

1 Answer

3
votes

I believe the %in% operator is what you're looking for:

data.set.y.p1 <- subset(data.set.y, substring(data.set.y$Entity, 13,16) %in% years)
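For context on why the == attempt fails: == compares two vectors element by element and recycles the shorter one, so each row is checked against just one of the years rather than against all of them. %in% instead asks, for each row, whether its value appears anywhere in years. Here is a minimal sketch of the difference. The toy data frame and its Entity strings are made up for illustration; the only assumption carried over from the question is that the year sits at characters 13 to 16.

```r
# Hypothetical toy data: the year occupies characters 13-16 of Entity,
# matching the substring() positions used in the question.
data.set.y <- data.frame(
  Entity = c("Company_ABC_1991_Q1", "Company_ABC_1990_Q1",
             "Company_ABC_1995_Q1", "Company_ABC_2001_Q1"),
  value = 1:4,
  stringsAsFactors = FALSE
)

years <- 1990:1999
yr <- substring(data.set.y$Entity, 13, 16)

# == pairs the elements up position by position (with recycling), so
# "1991" is compared only with 1990, "1990" only with 1991, and so on.
# R also warns here because the lengths (4 and 10) do not divide evenly.
elementwise <- suppressWarnings(yr == years)

# %in% tests each extracted year for membership in the whole years vector.
membership <- yr %in% years

# Inside subset(), columns can be referred to by name directly.
data.set.y.p1 <- subset(data.set.y, substring(Entity, 13, 16) %in% years)
print(data.set.y.p1)
```

With this toy data the element-wise comparison finds no matches at all, while the %in% version keeps the three rows from the 1990s.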