I want to create a new variable "race" with 4 categories (Black/Latino/White/Others) in my dataset "data2" (see table below).
The conditions I want to apply are:

  1. Anyone who indicated 'Hispanic or Latino' is coded Latino, regardless of whether they also selected other race options.
  2. Those who indicated 'Black' (only) are coded Black.
  3. Those who indicated 'White' (only) are coded White.
  4. Those who indicated 'Asian', 'Native American', or 'others', OR who indicated more than one option (without 'Hispanic or Latino'), are coded Others.

I want to use mutate() and case_when() to create the new variable race following the conditions above.

race_1 - Asian
race_2 - Black
race_3 - Hispanic or Latino
race_4 - Native American
race_5 - White
race_6 - others

race_1 race_2 race_3 race_4 race_5 race_6
NA 1 NA NA NA NA
NA NA 1 NA NA NA
NA NA 1 NA NA NA
1 NA 1 NA 1 NA
NA NA NA NA 1 NA
NA NA 1 NA NA NA
1
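
To make the question reproducible, the six complete rows of the table can be rebuilt as a data frame. This is only a sketch: the name data2 comes from the question, while storing the selections as integers (1 = selected, NA = not selected) is an assumption about how the real data is typed.

```r
# Rebuild the six complete sample rows from the table.
# Assumption: selections are stored as integer 1, everything else is NA.
data2 <- data.frame(
  race_1 = c(NA, NA, NA, 1L, NA, NA),  # Asian
  race_2 = c(1L, NA, NA, NA, NA, NA),  # Black
  race_3 = c(NA, 1L, 1L, 1L, NA, 1L),  # Hispanic or Latino
  race_4 = rep(NA_integer_, 6),        # Native American
  race_5 = c(NA, NA, NA, 1L, 1L, NA),  # White
  race_6 = rep(NA_integer_, 6)         # others
)
```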

1 Answer

library(dplyr)

# `data2` is the data frame from the question
data2 %>%
  # convert `NA` to false, all others to true ... verify this is what you want
  mutate_at(vars(starts_with("race_")), ~ !is.na(.)) %>%
  # provide a variable that lists how many races were selected
  mutate(combined = rowSums(across(starts_with("race_")))) %>%
  mutate(race = case_when(
    race_3                 ~ "Latino",
    race_2 & combined == 1 ~ "Black",
    race_5 & combined == 1 ~ "White",
    TRUE                   ~ "Others")
  )
#   race_1 race_2 race_3 race_4 race_5 race_6 combined    race
# 1  FALSE   TRUE  FALSE  FALSE  FALSE  FALSE        1   Black
# 2  FALSE  FALSE   TRUE  FALSE  FALSE  FALSE        1  Latino
# 3  FALSE  FALSE   TRUE  FALSE  FALSE  FALSE        1  Latino
# 4   TRUE  FALSE   TRUE  FALSE   TRUE  FALSE        3  Latino
# 5  FALSE  FALSE  FALSE  FALSE   TRUE  FALSE        1   White
# 6  FALSE  FALSE   TRUE  FALSE  FALSE  FALSE        1  Latino
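
One thing to check in the code above: the fallback TRUE ~ "Others" also catches respondents who selected nothing at all (combined == 0). If those rows should stay missing instead, a variant along these lines might work. The test rows and the names example, race_cols, and n_selected are made up for illustration, and treating "no selection" as NA is an assumption, since the question does not say how such rows should be coded. This version also leaves the original race_ columns untouched.

```r
library(dplyr)

# Hypothetical test rows (not from the question): Black only,
# Latino plus White, Asian plus White, and nothing selected.
example <- data.frame(
  race_1 = c(NA, NA, 1, NA),
  race_2 = c(1, NA, NA, NA),
  race_3 = c(NA, 1, NA, NA),
  race_4 = c(NA, NA, NA, NA),
  race_5 = c(NA, 1, 1, NA),
  race_6 = c(NA, NA, NA, NA)
)

# Locate the race_ columns once, before any new column is added.
race_cols <- grep("^race_", names(example), value = TRUE)

result <- example %>%
  # Count the selected options per row without overwriting the inputs.
  mutate(n_selected = unname(rowSums(!is.na(example[race_cols])))) %>%
  mutate(race = case_when(
    n_selected == 0                  ~ NA_character_,  # assumption: keep as missing
    !is.na(race_3)                   ~ "Latino",
    !is.na(race_2) & n_selected == 1 ~ "Black",
    !is.na(race_5) & n_selected == 1 ~ "White",
    TRUE                             ~ "Others"
  ))
```

Because case_when() stops at the first condition that matches, the order matters: the empty-row check and the Latino rule must come before the single-race rules.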