
I'm trying to create a dplyr pipeline that filters on a string pattern, with one exception.

Imagine a data frame jobs, where I want to filter out the most senior positions from the title column:

title

Chief Executive Officer
Chief Financial Officer
Chief Technical Officer
Manager
Product Manager
Programmer
Scientist
Marketer
Lawyer
Secretary

R code for filtering them out (up to 'Manager') would be...

jobs %>%
  filter(!str_detect(title, 'Chief')) %>%
  filter(!str_detect(title, 'Manager')) ...

but I still want to keep "Product Manager" in the final result, producing a new data frame with all of the "lower level jobs", like:

Product Manager
Programmer
Scientist
Marketer
Lawyer
Secretary

Is there a way to specify the str_detect() filter on a given value EXCEPT for one particular string?

Assume that the data frame's column has 1000s of roles, with various string combinations including "Manager," but there will always be a filter on a specific exception.

Why not simply anchor "Manager", like so: ... %>% filter(!stringr::str_detect(title, "^Chief|^Manager$")). The ^ anchor makes the regex match only at the start of the string, and the $ anchor requires the string to also end with "Manager", so only the exact title "Manager" is matched. - JdeMello
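
To make the anchored pattern from this comment concrete, here is a minimal sketch. The jobs data frame is rebuilt by hand from the titles listed in the question, and the name lower_level is only illustrative.

```r
library(dplyr)
library(stringr)

# Sample data rebuilt from the question (assumed to be a one-column data frame)
jobs <- data.frame(
  title = c("Chief Executive Officer", "Chief Financial Officer",
            "Chief Technical Officer", "Manager", "Product Manager",
            "Programmer", "Scientist", "Marketer", "Lawyer", "Secretary"),
  stringsAsFactors = FALSE
)

# Drop titles that start with "Chief" and titles that are exactly "Manager"
lower_level <- jobs %>%
  filter(!str_detect(title, "^Chief|^Manager$"))

lower_level$title
```

Note that this keeps every title that merely contains "Manager" (for example a hypothetical "Account Manager"), not only "Product Manager", so it fits best when the bare title "Manager" is the only one to drop.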

1 Answer


Or you could add a separate condition that keeps "Product Manager":

library(tidyverse)

jobs %>%
  filter(!str_detect(title, "Chief|Manager") | str_detect(title, "Product Manager"))


#            title
#1 Product Manager
#2      Programmer
#3       Scientist
#4        Marketer
#5          Lawyer
#6       Secretary
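
Another option, sketched here as an alternative rather than as part of the answer above, is to fold the exception into the pattern itself with a negative lookbehind, which stringr's ICU regex engine supports. The extra "Account Manager" row is hypothetical and added only to show that other manager titles are still removed.

```r
library(dplyr)
library(stringr)

# Sample data from the question, plus one hypothetical row ("Account Manager")
jobs <- data.frame(
  title = c("Chief Executive Officer", "Chief Financial Officer",
            "Chief Technical Officer", "Manager", "Product Manager",
            "Account Manager", "Programmer", "Scientist", "Marketer",
            "Lawyer", "Secretary"),
  stringsAsFactors = FALSE
)

# Match "Chief", or "Manager" unless it is directly preceded by "Product "
senior_pattern <- "Chief|(?<!Product )Manager"

lower_level <- jobs %>%
  filter(!str_detect(title, senior_pattern))

lower_level$title
```

One difference in behavior: a title containing both "Chief" and "Product Manager" would be dropped by this pattern but kept by the two-condition filter, so which one fits depends on how the exception should interact with the other rules.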

The same thing can also be written in base R using grep:

jobs[c(grep("Product Manager", jobs$title),
       grep("Chief|Manager", jobs$title, invert = TRUE)), , drop = FALSE]
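
Because the grep version concatenates two index vectors, the "Product Manager" rows always come first regardless of where they sit in the data. If the original row order matters, a logical mask built with grepl is one way to keep it. This is a sketch with a hand-built jobs data frame, and the name keep is illustrative.

```r
# Sample data, with "Product Manager" deliberately placed last
jobs <- data.frame(
  title = c("Chief Executive Officer", "Manager", "Programmer",
            "Scientist", "Product Manager"),
  stringsAsFactors = FALSE
)

# TRUE for rows that are neither Chief nor Manager, or that are the exception
keep <- !grepl("Chief|Manager", jobs$title) |
  grepl("Product Manager", jobs$title, fixed = TRUE)

lower_level <- jobs[keep, , drop = FALSE]
lower_level$title
```

Here fixed = TRUE treats the exception as a literal string, which avoids surprises if the exception ever contains regex metacharacters.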