Reducing Number of Factors in R based on multiple combinations of possible factors

Question

I have some census data where people were allowed to list their race as either one race or a combination of several. They could select from these choices:

American Indian

East Asian

Pacific Islander

Black or African American

White or Caucasian

Hispanic or Latino/a

South Asian

Middle Eastern

Other

The resulting data is quite messy if you want to make contingency tables of the race of people because the data output, which I've provided a sample of below, has one person listed as many different races.

structure(list(Race = structure(c(3L, 2L, 3L, 9L, 9L, 11L, 
5L, 11L, 3L, 3L, 3L, 3L, 7L, 3L, 11L, 5L, 9L, 10L, 9L, 10L, 2L, 
3L, 2L, 6L, 9L, 10L, 3L, 10L, 8L, 3L, 5L, 1L, 2L, 9L, 4L, 3L), .Label = c("Black or African American", 
"Black or African American,White or Caucasian", "East Asian", 
"East Asian,Pacific Islander", "Hispanic or Latino/a", "Other", 
"Pacific Islander", "South Asian", "White or Caucasian", "White or Caucasian,Hispanic or Latino/a", 
"White or Caucasian,Middle Eastern"), class = "factor")), class = "data.frame", row.names = c(NA, 
-36L))

To reduce the number of factor levels, I'd like to turn any cell that lists multiple races into just "Mixed". For example, a cell that says "White or Caucasian,Middle Eastern" should become "Mixed". My actual dataset is massive, with many different combinations of races, so using something like gsub() and typing in every combination to replace with "Mixed" doesn't seem feasible to me.

Have you tried: data %>% mutate(Race = as.character(Race), Race2 = replace(Race, grepl(",", Race), "Mixed")) — Eugene

Eugene · Accepted Answer · 2019-03-03T07:00:37

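This uses dplyr for convenience, but you can do the same thing in base R. Every multi-race entry contains a comma, so grepl(",", Race) finds them all without listing the combinations: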

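library(dplyr)

# Convert the factor to character, then replace any comma-separated entry with "Mixed"
data %>% 
  mutate(Race  = as.character(Race), 
         Race2 = replace(Race, grepl(",", Race), "Mixed"))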
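The answer mentions that base R works too but does not show it. Below is a minimal sketch of what that might look like, not the answerer's own code. The four-row `data` frame is a made-up stand-in built from level names in the question's sample, and `race_chr` and `Race3` are illustrative names. The second variant relies on the fact that assigning the same label to several factor levels merges them, so it only has to scan the levels rather than every row.

```r
# Hypothetical stand-in for the question's data (values taken from its sample levels)
data <- data.frame(
  Race = factor(c("East Asian",
                  "Black or African American,White or Caucasian",
                  "White or Caucasian",
                  "White or Caucasian,Middle Eastern"))
)

# Variant 1: work on a character copy, then convert back to a factor
race_chr <- as.character(data$Race)
race_chr[grepl(",", race_chr, fixed = TRUE)] <- "Mixed"
data$Race2 <- factor(race_chr)

# Variant 2: relabel the levels directly; levels given the same name are merged
data$Race3 <- data$Race
levels(data$Race3)[grepl(",", levels(data$Race3), fixed = TRUE)] <- "Mixed"

# Contingency table on the reduced set of levels
table(data$Race2)
```

Both variants should give the same labels row by row; they may differ only in the order of the factor levels, which matters for how table() arranges its output.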
