1
votes

Right now I am trying to create a new dummy variable in a dataset out of a variable that takes more than two values. More specifically, my dataset has a "State" variable, and I want to make a dummy where 1 = states in the North, and 0 = all other states. Here's a portion of the dataset (it's an extremely large set, so I'll only include the essential columns):

  Year     StateICP  
1 1940        71     
2 1940        21     
3 1940        22     
4 1940        32     
5 1940        18     
6 1940        22  
7 1940        45     
8 1940        40     
9 1940        33     

So what I want to do is create a new column (called "North") that equals 1 if StateICP is 21, 22, 40, or 45, and 0 otherwise. Like I said, this is a very large dataset (over 1000000 observations), so I can't enter it row by row manually. I tried an ifelse function, but that only gave me errors.

I'm sure this isn't that complicated, but I am fairly new to R. I know how to create a dummy variable normally, but I am getting stuck here. Any help would be greatly appreciated! Thank you!

1
Did your ifelse look something like: ifelse(yourDF$StateICP %in% c(21, 22, 40, 45), 1, 0)? - conrad-mac

1 Answer

2
votes

So, creating a simple dataset to replicate what you have above:

df <- data.frame(Year = rep(1940,500), StateICP = sample(1:100, 500, TRUE))

This creates a data.frame with the columns you describe and 500 rows. The StateICP values are random integers between 1 and 100, sampled with replacement. If we want a logical (TRUE/FALSE) column, we can simply add it:

df$boolean <- df$StateICP %in% c(21, 22, 40, 45)
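To see what %in% returns, here is a minimal sketch using the nine StateICP values from the question. The names state_icp, north_codes, and is_north are only illustrative and do not come from the original dataset.

```r
# The nine StateICP values shown in the question
state_icp <- c(71, 21, 22, 32, 18, 22, 45, 40, 33)

# Codes the question treats as "North" (hypothetical variable name)
north_codes <- c(21, 22, 40, 45)

# %in% checks each element of state_icp for membership in north_codes
is_north <- state_icp %in% north_codes
is_north
# [1] FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE

# TRUE counts as 1 when summed, so this is the number of Northern rows
sum(is_north)
# [1] 5
```

Because %in% is vectorised, the same expression should work unchanged on a column with over a million rows.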

If we want to code them specifically as 0/1, as you describe, we can use ifelse:

df$dummy <- ifelse(df$StateICP %in% c(21, 22, 40, 45), 1, 0)

Make sure you pass ifelse a vector such as df$StateICP rather than a bare column name: ifelse does not accept a data argument, so it cannot look the column up inside the data frame for you.
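As a rough sketch of the point about vectors, the snippet below rebuilds the nine rows from the question and creates the 0/1 column three equivalent ways. The column names North, North2, and North3 and the variable north_codes are assumptions made for illustration; as.integer and with are base R alternatives, not something the original answer used.

```r
# Rebuild the sample rows from the question
df <- data.frame(
  Year = rep(1940, 9),
  StateICP = c(71, 21, 22, 32, 18, 22, 45, 40, 33)
)
north_codes <- c(21, 22, 40, 45)  # hypothetical name for the Northern codes

# ifelse() has no data argument, so reach the column through the data frame
df$North <- ifelse(df$StateICP %in% north_codes, 1, 0)

# Alternative: convert the logical vector directly (TRUE -> 1L, FALSE -> 0L)
df$North2 <- as.integer(df$StateICP %in% north_codes)

# Alternative: with() evaluates the expression inside df,
# so the bare column name StateICP can be used
df$North3 <- with(df, ifelse(StateICP %in% north_codes, 1, 0))

# Quick check of how many rows fall in each group
table(df$North)
```

All three columns should hold the same values; the only difference is that as.integer gives an integer column while ifelse(..., 1, 0) gives a numeric one.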