
I have a data set of colleges with names such as "x College", "x University", and "x Community College", and I need to group them by classification: college, community college, or university.

Then I want to organize them by state. There are 5 columns: Name, Location, two types of tuition (in-state and out-of-state), and Type (private or public).

I have tried this:

typeSchool <- c("College", "University", "Community College")
filter(tibble, str_detect(words, paste(typeSchool)))

But it has not worked. Looking for suggestions.

Should I try using mutate() to add a separate variable for each classification and then group_by(classification)?

Also, would it be possible to use a form of grep for this?

Sample rows:


structure(list(Name = structure(c(5L, 1L, 6L, 4L, 3L, 2L), .Label = c("Bard College", 
"Brown University", "Connecticut College", "Dartmouth College", "Landmark College", 
"St. John's College"), class = "factor"), Location = structure(c(5L, 1L, 6L, 2L, 3L, 4L), 
.Label = c("ANNANDALE-ON-HUDSON, NY", "HANOVER, NH", "NEW LONDON, CT", "PROVIDENCE, RI", 
"PUTNEY, VT", "SANTA FE, NM"), class = "factor"), In.State.Tuition = c(50080L, 49906L, 
49644L, 49506L, 49350L, 49346L), Out.of.State.Tuition = c(50080L, 49906L, 49644L, 49506L, 
49350L, 49346L), Type = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Private", 
class = "factor")), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

The issue is that 'words' is not a column name in your dataset, and I'm not sure which column you intended. 'Name' is the column that contains the 'College' substring, so instead of 'words' you should use 'Name'. - akrun

1 Answer


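The pattern passed to str_detect should be a single string, but paste(typeSchool) without collapse returns a vector of three separate strings. We can collapse the vector with | (regex alternation) to create one pattern that matches any of the types, and use the Name column instead of words: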

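library(stringr)
library(dplyr)
typeSchool <- c("College", "University", "Community College")
# keep the rows whose Name contains any of the three types
filter(tibble, str_detect(Name, paste(typeSchool, collapse = "|")))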
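The difference between the two paste calls is easy to see by inspecting the patterns they build. A minimal check in base R, reusing typeSchool from the question:

```r
typeSchool <- c("College", "University", "Community College")

# Without collapse, paste() returns one string per element (a vector of length 3)
p_vec <- paste(typeSchool)

# With collapse = "|", the elements are joined into a single regex alternation:
# "College|University|Community College"
p_one <- paste(typeSchool, collapse = "|")

length(p_vec)  # 3
length(p_one)  # 1
```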

To make sure the types are matched only as whole words, and not as parts of longer words, we can wrap the alternation in word boundaries (\\b):

filter(tibble, str_detect(Name, paste0("\\b(", paste(typeSchool, collapse = "|"), ")\\b")))
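To illustrate what the word boundary changes, here is a small sketch using base R's grepl (with perl = TRUE for PCRE regex) instead of str_detect, which also shows that a grep-style function works for this. "Collegeville Academy" is a made-up name included only to trigger a substring match; the other two names come from the sample data.

```r
typeSchool <- c("College", "University", "Community College")
pat_plain <- paste(typeSchool, collapse = "|")
pat_word  <- paste0("\\b(", pat_plain, ")\\b")

# "Collegeville Academy" is hypothetical, used only to show a substring match
nm <- c("Bard College", "Brown University", "Collegeville Academy")

# Plain alternation: "College" is found inside "Collegeville" -> TRUE TRUE TRUE
hits_plain <- grepl(pat_plain, nm, perl = TRUE)

# With \b on both sides the match must end at a word edge -> TRUE TRUE FALSE
hits_word <- grepl(pat_word, nm, perl = TRUE)
```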

As we are already using stringr, another option is str_c. It is helpful when there are missing values: str_c returns NA if any element is NA, whereas paste inserts the literal string "NA" into the pattern.

filter(tibble, str_detect(Name,  str_c(typeSchool, collapse = "|")))
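The paste side of that difference can be checked in base R alone. The NA element below is contrived to show the effect; it is not in the question's typeSchool vector.

```r
# Contrived vector with a missing value in the middle
types_with_na <- c("College", NA, "University")

# paste() turns NA into the text "NA", giving "College|NA|University",
# so the resulting pattern would also match any name containing "NA"
pat <- paste(types_with_na, collapse = "|")
```

With str_c(types_with_na, collapse = "|") the result would be NA instead, which makes the problem visible rather than silently widening the pattern.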

Update

If the intention is to split the data into multiple datasets, one per element of 'typeSchool', we can map over 'typeSchool' and filter the rows that match each substring, which returns a list of tibbles:

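library(purrr)
# one tibble per element of typeSchool; note that the "College" pattern
# also matches names that contain "Community College"
lst1 <-  map(typeSchool, ~  tibble %>% 
             filter(str_detect(Name, .x)))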
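For the approach raised in the question (add a classification column, then group and organize by state), a possible sketch in base R with grepl is below. classify_school is a hypothetical helper, and the data frame is retyped by hand from the sample rows (Name and Location only). It tests "Community College" first because such names also contain "College", and it takes the state as the text after the last comma in Location, assuming every Location follows the "CITY, ST" format shown in the sample.

```r
# Hypothetical helper: label each name with the most specific type it contains.
# "Community College" is checked first because it also contains "College".
classify_school <- function(name) {
  ifelse(grepl("\\bCommunity College\\b", name, perl = TRUE), "Community College",
  ifelse(grepl("\\bUniversity\\b", name, perl = TRUE), "University",
  ifelse(grepl("\\bCollege\\b", name, perl = TRUE), "College", NA_character_)))
}

# Rows retyped from the sample in the question
schools <- data.frame(
  Name = c("Landmark College", "Bard College", "St. John's College",
           "Dartmouth College", "Connecticut College", "Brown University"),
  Location = c("PUTNEY, VT", "ANNANDALE-ON-HUDSON, NY", "SANTA FE, NM",
               "HANOVER, NH", "NEW LONDON, CT", "PROVIDENCE, RI"),
  stringsAsFactors = FALSE
)

schools$Classification <- classify_school(schools$Name)

# Assumes "CITY, ST": drop everything up to and including the last comma and spaces
schools$State <- sub(".*, *", "", schools$Location)

# Order by state, then split into one data frame per classification
schools <- schools[order(schools$State, schools$Name), ]
by_type <- split(schools, schools$Classification)
```

With dplyr, the same nested test could be written as mutate() with case_when(), followed by group_by(Classification) and arrange(State).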