0
votes

I want to filter a dataframe using dplyr contains() and filter. Must be simple, right? The examples I've seen use base R grepl which sort of defeats the object. Here's a simple dataframe:

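library(dplyr)
# row_id was omitted from the original post; these values are assumed from the comment below.
row_id <- paste0('id', 1:6)
site_type <- c('Urban','Rural','Rural Background','Urban Background','Roadside','Kerbside')
df <- data.frame(row_id, site_type)
df <- as_tibble(df)
df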

Now I want to filter the dataframe down to all rows where site_type contains the string 'background', regardless of case. I can match a value directly if I know the unique values of site_type:

filtered_df <- filter(df, site_type == 'Urban Background')

But I want to do something like:

filtered_df <- filter(df, site_type(contains('background', match_case = False)))

Any ideas how to do that? Can the dplyr helper contains() only be used with columns and not rows?

1
Omitted the row_id in the df by mistake: row_id = c(id1, id2, id3 ...). You get the picture :-) - LucieCBurgess

1 Answer

1
votes

The contains function in dplyr is a select helper. Its purpose is to help when using the select function, and the select function is focused on selecting columns, not rows. See documentation here.
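
To make the column-versus-row distinction concrete, here is a minimal sketch of contains() doing the job it was designed for. It rebuilds the data frame from the question; the row_id values are an assumption, since the question left them out.

```r
library(dplyr)

# Rebuild the question's data; the row_id values are assumed for illustration.
df <- tibble(
  row_id = paste0("id", 1:6),
  site_type = c("Urban", "Rural", "Rural Background",
                "Urban Background", "Roadside", "Kerbside")
)

# contains() matches against column NAMES, so it belongs inside select().
type_cols <- select(df, contains("type"))
names(type_cols)
```

Here contains("type") keeps only the site_type column, because that is the only column whose name contains "type". It never looks at the values stored in the rows.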

filter is the intended mechanism for selecting rows. The function you are probably looking for is grepl which does pattern matching for text.

So the solution you are looking for is probably:

filtered_df <- filter(df, grepl("background", site_type, ignore.case = TRUE))
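
As a rough, self-contained check of the line above, the sketch below runs it against the question's data (row_id values assumed). It also shows one possible alternative for staying within the tidyverse: str_detect() from the stringr package. That alternative is an addition for illustration, not part of the original answer, and it assumes stringr is installed.

```r
library(dplyr)
library(stringr)

# Rebuild the question's data; the row_id values are assumed for illustration.
df <- tibble(
  row_id = paste0("id", 1:6),
  site_type = c("Urban", "Rural", "Rural Background",
                "Urban Background", "Roadside", "Kerbside")
)

# Base R: grepl() returns one TRUE/FALSE per row, which is what filter() expects.
with_grepl <- filter(df, grepl("background", site_type, ignore.case = TRUE))

# stringr alternative (not from the answer): regex() carries the
# case-insensitivity flag.
with_stringr <- filter(
  df,
  str_detect(site_type, regex("background", ignore_case = TRUE))
)

with_grepl$site_type
```

Both calls should keep the same two rows, 'Rural Background' and 'Urban Background'. Which one to use is mostly a matter of taste; grepl() avoids the extra dependency.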

I suspect that contains is mostly a wrapper applying grepl to the column names. So the logic is very similar.
