2
votes

I found similar questions, but none that do exactly what I want. For a two-category variable (e.g., gender, coded 1 and 2), I need to create a dummy variable with 0 for male and 1 for female.

Here is what my data look like and what I did.

> data <- as.data.frame(as.matrix(c(1,2,2,1,2,1,1,2),8,1))
> data
  V1
1  1
2  2
3  2
4  1
5  2
6  1
7  1
8  2
> library(dummies)
> data <- cbind(data, dummy(data$V1, sep = "_"))
> data
  V1 data_1 data_2
1  1      1      0
2  2      0      1
3  2      0      1
4  1      1      0
5  2      0      1
6  1      1      0
7  1      1      0
8  2      0      1

This code also creates a 0/1 column for the second category, which I don't need. Also, is there a way to choose which category is the baseline (the one coded 0)?

I want it to look like this:

   > data
  V1     V1_dummy
1  1      0 
2  2      1 
3  2      1 
4  1      0 
5  2      1  
6  1      0  
7  1      0  
8  2      1 

I also want to extend this to three-category variables, which should give two dummy columns after recoding (n-1).

Thanks in advance!

I don't think I understand the end result you're looking for. Is the data already the way you want it, and you're looking for a better way to program it? If not, what should it look like? What will it look like with three categories? - twedl
Sorry, I have now added what the data should look like. For a two-category variable, I only need one dummy column representing one category. For a three-category variable, I need two dummy variables, with the third category as the baseline/comparison category. - amisos55
I'm not completely sure I understand. Do you just want 0/1-coded variables? If so, you could use ifelse() to create them the way you describe. E.g., if male is 2 and female is 1: data$female <- ifelse(data$V1 == 2, 0, data$V1). Also, check out factors in R if you have not already. - Andrew

1 Answer

1
votes

You can use model.matrix as follows. First, some sample data with a three-level factor:

set.seed(1)
(df <- data.frame(x = factor(rbinom(5, 2, 0.4))))
#   x
# 1 0
# 2 1
# 3 1
# 4 2
# 5 0

Then

model.matrix(~ x, df)[, -1]
#   x1 x2
# 1  0  0
# 2  1  0
# 3  1  0
# 4  0  1
# 5  0  0
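
For the two-category variable in the question, the same idea might look like the sketch below. The column name V1_dummy is taken from the desired output in the question, the helper names f and mm are only illustrative, and drop = FALSE is added here so that the single remaining column stays a matrix instead of collapsing to a vector.

```r
# Data from the question: 1 = male, 2 = female
data <- data.frame(V1 = c(1, 2, 2, 1, 2, 1, 1, 2))

# Convert to a factor; the first level (1) becomes the baseline
f <- factor(data$V1)

# Drop the intercept column; drop = FALSE keeps a one-column matrix
mm <- model.matrix(~ f)[, -1, drop = FALSE]

# Store the single dummy column under the name used in the question
data$V1_dummy <- as.integer(mm[, 1])
data
```

With only two categories, as.integer(data$V1 == 2) would give the same column without going through model.matrix.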

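If you want to choose which group is dropped (the baseline), you need to reorder the factor levels, because it is the first level that is dropped. Note that assigning to levels(df$x) would only rename the existing levels, so rebuild the factor with the desired level order instead. For example, to make "1" the baseline:

df$x <- factor(df$x, levels = c("1", "0", "2"))
model.matrix(~ x, df)[, -1]
#   x0 x2
# 1  1  0
# 2  0  0
# 3  0  0
# 4  0  1
# 5  1  0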
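
As an alternative sketch, relevel() moves a chosen level to the front so that it becomes the baseline, leaving the data values themselves unchanged. The names dummies and out are only illustrative, and the sample data are written out explicitly here rather than regenerated with set.seed.

```r
# Same sample values as above, written out explicitly
df <- data.frame(x = factor(c(0, 1, 1, 2, 0)))

# Make "2" the baseline (reference) level
df$x <- relevel(df$x, ref = "2")

# Dummy columns for the remaining levels; drop = FALSE keeps a matrix
dummies <- model.matrix(~ x, df)[, -1, drop = FALSE]

# Bind the dummy columns back onto the original data
out <- cbind(df, dummies)
out
```

Rows where x is 2 get 0 in both dummy columns, which is what makes "2" the comparison category.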