Recoding race with 4 categories to 3 categories and creating 2 dummies in R

Question

I am working with a race variable that takes the following values: 1 Black, 2 Hispanic, 3 Mixed Race (Non-Hispanic), 4 Non-Black / Non-Hispanic. I want to combine 3 and 4 into a single base category and keep Black and Hispanic as they are. I tried to create 2 dummies (one with Black = 1 and the other with Hispanic = 1). Two extra columns are created, but the values in them are FALSE and TRUE rather than 0 and 1. The code I used:

nlsy2$Hispanic <- nlsy2$Race==2
nlsy2$Black <- nlsy2$Race==1
nlsy2$Race [ nlsy2$Race == 0 ] <- 3
nlsy2$Race [ nlsy2$Race == 0 ] <- 4

Also when I run summary(nlsy2$Hispanic) R gives me this output:

   Mode   FALSE    TRUE    NA's 
logical    5594    1526       0

Are the NA's problematic when running a glm? Also, if you have a better way to recode the race variable, it would be much appreciated! Thank you!

Try grouping the categories with the levels function in R; refer to [link] stackoverflow.com/questions/9604001/… . Also, why do you need to convert to dummies for modelling rather than using as.factor? For NAs you can include na.action = na.exclude in your code, and depending on the data you can consider imputing them with the mice package — Learner_seeker
@Adam Quek: Yes! Thank you, the NA disappears for Hispanic :D — bree

shians · Accepted Answer · 2017-04-24T02:57:33

Does

nlsy$Race[nlsy$Race == 3 | nlsy$Race == 4] <- 0
nlsy$Race <- factor(nlsy$Race)

not do the job? You'll want it as a factor rather than numeric for any modelling, because these values are categorical and you don't want to risk them being interpreted as numbers.
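
As a minimal sketch of the same idea on made-up data: the toy data frame, the Race3 column name and the "Other" label below are assumptions for illustration, not taken from the question. It collapses codes 3 and 4 into one base level and, if explicit dummies are still wanted, uses as.integer() to turn the TRUE/FALSE comparison into 1/0.

```r
# Toy data standing in for nlsy2 (values are invented for illustration)
nlsy2 <- data.frame(Race = c(1, 2, 3, 4, 2, 1, 4, NA))

# Collapse codes 3 and 4 into a single base category (coded 0, labelled "Other").
# Listing 0 first in `levels` makes "Other" the first factor level.
nlsy2$Race3 <- factor(
  ifelse(nlsy2$Race %in% c(3, 4), 0, nlsy2$Race),
  levels = c(0, 1, 2),
  labels = c("Other", "Black", "Hispanic")
)

# Optional explicit dummies: as.integer() converts TRUE/FALSE to 1/0.
# A missing Race stays NA in both columns.
nlsy2$Black    <- as.integer(nlsy2$Race == 1)
nlsy2$Hispanic <- as.integer(nlsy2$Race == 2)

# Quick check of the recode, including any missing values
table(nlsy2$Race3, useNA = "ifany")
```

With R's default treatment contrasts, a model formula that includes Race3 would use the first level ("Other" here) as the reference category, so the separate dummy columns are usually not needed.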
