Succinct subsetting across multiple columns in R

Question

Say I have a massive data frame in which multiple columns hold values from a very large set of unique codes, and I want to use these codes to select rows from the original data frame. There are around 1000 codes, and the ones I want are consecutive. For example, about 30 columns contain codes, and I only want rows that have a code from 100 to 120 in ANY of these columns.

There's a long way to do this which is something like

new_dat <- df[which(df$codes==100 | df$codes==101 | df$codes1==100

and I repeat this for every possible code in every one of the columns that can contain these codes. Is there a more convenient way to do this?

I want to try solving this with dplyr's select function, but I'm having trouble seeing whether it works for my case out of the box.

Take the iris dataset.

Say I wanted all rows that contain a value from 4.0 to 5.0 in any column whose name contains the word Sepal.

# this only matches exactly 4.0 (the outer select() is dropped: with no columns named it returns none)

brand_new_df <- filter(iris, Sepal.Length == 4.0 | Sepal.Width == 4.0)

but what I want is something like

brand_new_df <- select(filter(iris, contains(Sepal) == 4.0:5.0))

Is there a dplyr way to do this?

Ronak Shah · Accepted Answer · 2020-06-13T05:16:49

You can use filter_at with any_vars, which keeps a row when the condition holds in at least one of the selected columns:

library(dplyr)
iris %>%  filter_at(vars(contains('Sepal')), any_vars(between(., 4, 5)))

#   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#1           4.9         3.0          1.4         0.2     setosa
#2           4.7         3.2          1.3         0.2     setosa
#3           4.6         3.1          1.5         0.2     setosa
#4           5.0         3.6          1.4         0.2     setosa
#5           4.6         3.4          1.4         0.3     setosa
#6           5.0         3.4          1.5         0.2     setosa
#7           4.4         2.9          1.4         0.2     setosa
#....
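As a side note, filter_at() and any_vars() are superseded in dplyr 1.0.0 and later. A minimal sketch of the same idea with if_any() (available from dplyr 1.0.4) follows, together with how it might carry over to the codes problem from the question. The data frame df, its column names (codes, codes1, codes2) and its values are made up for illustration; only the 100 to 120 range comes from the question.

```r
library(dplyr)

# Same filter as above, written with if_any() (needs dplyr >= 1.0.4)
sepal_rows <- iris %>%
  filter(if_any(contains("Sepal"), ~ between(.x, 4, 5)))

# Hypothetical stand-in for the data in the question:
# three code columns instead of about 30
df <- data.frame(
  id     = 1:5,
  codes  = c(100, 250, 300, 119, 90),
  codes1 = c(400, 101, 305, 500, 95),
  codes2 = c(NA, 700, 310, 120, 99)
)

wanted <- 100:120  # the codes to keep

# Keep rows where ANY column whose name starts with "codes" holds a wanted code.
# %in% returns FALSE for NA, so missing codes never match.
new_dat <- df %>%
  filter(if_any(starts_with("codes"), ~ .x %in% wanted))

# Base R version of the same filter, without dplyr
code_cols <- grep("^codes", names(df), value = TRUE)
hit <- rowSums(sapply(df[code_cols], function(col) col %in% wanted)) > 0
new_dat_base <- df[hit, ]
```

%in% suits a set of discrete codes, while between() suits a continuous range such as the Sepal measurements. If the code columns do not share a name prefix, a character vector of column names passed through all_of() should work in place of starts_with().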
