0
votes

I am working with a race variable that takes the following values: 1 Black, 2 Hispanic, 3 Mixed Race (Non-Hispanic), 4 Non-Black / Non-Hispanic. I want to combine 3 and 4 into the base category and keep Black and Hispanic as separate categories. I tried to create 2 dummies (Black = 1 and Hispanic = 1). Two extra columns are created, but the values in them are FALSE and TRUE rather than 1 and 0. The code I used:

nlsy2$Hispanic <- nlsy2$Race==2
nlsy2$Black <- nlsy2$Race==1
nlsy2$Race [ nlsy2$Race == 0 ] <- 3
nlsy2$Race [ nlsy2$Race == 0 ] <- 4

Also when I run summary(nlsy2$Hispanic) R gives me this output:

   Mode   FALSE    TRUE    NA's 
logical    5594    1526       0 

Are the NA's problematic when running a glm? Also, if you have a better way to recode the race variable, it would be much appreciated! Thank you!

1
Try nlsy2$Hispanic <- (nlsy2$Race == 2) + 0 - Adam Quek
Also, please provide a reproducible example. - Adam Quek
Try grouping the categories with the levels function in R; see [link] stackoverflow.com/questions/9604001/… . Also, why do you need to convert to dummies for modelling rather than using as.factor? For NAs, you can always include na.action = na.exclude in your code and, depending on the data, consider imputing them with the mice package. - Learner_seeker
@Adam Quek: Yes! Thank you, the NA disappears for Hispanic :D - bree
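
To illustrate the first comment: adding 0 to a logical vector (or calling as.integer on it) coerces TRUE/FALSE to 1/0. The sketch below uses a small made-up vector called race, not the real nlsy2 data. Note that glm() also accepts a logical predictor directly, so the conversion is mostly a matter of presentation.

```r
# Toy data (hypothetical values, not the real nlsy2 sample)
race <- c(1, 2, 3, 4, 2, NA)

hispanic_lgl <- race == 2              # logical: TRUE / FALSE / NA
hispanic_num <- (race == 2) + 0        # numeric 0/1; NA stays NA
hispanic_int <- as.integer(race == 2)  # integer 0/1; same idea

print(hispanic_lgl)
print(hispanic_num)
print(hispanic_int)
```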

1 Answer

0
votes

Does

nlsy2$Race[nlsy2$Race == 3 | nlsy2$Race == 4] <- 0
nlsy2$Race <- factor(nlsy2$Race)

not do the job? You're going to want it as a factor rather than numeric when doing any modelling, because the categories are nominal and you don't want to risk the codes being interpreted as a numeric quantity.
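
A minimal sketch of this recoding on a toy data frame (the values are made up, and the label "Other" is my own choice of name for the merged base category). It relies on the fact that the first level of a factor is used as the reference category by lm() and glm() under the default treatment contrasts.

```r
# Toy data frame standing in for nlsy2 (values are made up)
nlsy2 <- data.frame(Race = c(1, 2, 3, 4, 1, 2, 4, 3))

# Collapse codes 3 and 4 into 0, as in the answer above
nlsy2$Race[nlsy2$Race %in% c(3, 4)] <- 0

# Put 0 first so it becomes the reference level; labels are illustrative
nlsy2$Race <- factor(nlsy2$Race,
                     levels = c(0, 1, 2),
                     labels = c("Other", "Black", "Hispanic"))

print(table(nlsy2$Race))

# The dummy columns a model would build from this factor
print(head(model.matrix(~ Race, data = nlsy2)))
```

With the factor in place, a formula such as y ~ Race (where y is whatever outcome you are modelling) creates the Black and Hispanic dummies automatically, so the manual columns are not needed.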