I have census data in which respondents could report their race as a single race or as a combination of several. They could select from these choices:
American Indian
East Asian
Pacific Islander
Black or African American
White or Caucasian
Hispanic or Latino/a
South Asian
Middle Eastern
Other
The resulting data is messy to work with when building contingency tables by race, because a single person can be listed under several races at once. A sample of the data (dput output) is below:
structure(list(Race = structure(c(3L, 2L, 3L, 9L, 9L, 11L,
5L, 11L, 3L, 3L, 3L, 3L, 7L, 3L, 11L, 5L, 9L, 10L, 9L, 10L, 2L,
3L, 2L, 6L, 9L, 10L, 3L, 10L, 8L, 3L, 5L, 1L, 2L, 9L, 4L, 3L), .Label = c("Black or African American",
"Black or African American,White or Caucasian", "East Asian",
"East Asian,Pacific Islander", "Hispanic or Latino/a", "Other",
"Pacific Islander", "South Asian", "White or Caucasian", "White or Caucasian,Hispanic or Latino/a",
"White or Caucasian,Middle Eastern"), class = "factor")), class = "data.frame", row.names = c(NA,
-36L))
To reduce the number of factor levels, I'd like to turn any cell that lists multiple races into just "Mixed". For example, a cell that says "White or Caucasian,Middle Eastern" should become "Mixed". Because my actual dataset is massive and contains many different combinations of races, using something like gsub() and typing in every combination to replace with "Mixed" doesn't seem feasible.
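To make the goal concrete, here is a minimal sketch of the behaviour I'm after. It assumes a comma only ever appears as the separator between races, which holds for the choices listed above. The small data frame and the Race_simple column name are made up for illustration, and this may well not be the best way to do it.

```r
# Small hypothetical sample; the real data would be the data frame from the dput output.
df <- data.frame(
  Race = factor(c("East Asian",
                  "Black or African American,White or Caucasian",
                  "White or Caucasian,Middle Eastern",
                  "Hispanic or Latino/a",
                  "Other"))
)

# Work on character values so a new label can be assigned freely.
race_chr <- as.character(df$Race)

# Assumption: a comma marks a multi-race answer, so no combination has to be typed out.
is_mixed <- grepl(",", race_chr, fixed = TRUE)

# Collapse every multi-race answer to "Mixed" and rebuild the factor.
# Race_simple is a hypothetical column name.
race_chr[is_mixed] <- "Mixed"
df$Race_simple <- factor(race_chr)

table(df$Race_simple)
```

The original Race column is left untouched, so the detailed combinations are still available if they are needed later.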