I'm having trouble generating a dummy variable in R under certain conditions.

var1 <- c("a","b","c","a","a","a","b","c","b","a")
var2 <- c("val1","val1","val1","val1","val2","val1","val3","val3","val2","val1")
db <- data.frame(var1, var2)

I would like to generate the column var3 under the following rules:

  • if var2="val2" OR var2="val3" then var3=1
  • if var2="val1" AND var1="a" then var3=1 for n randomly chosen rows (let's say 3) and var3=0 for the rest
  • if var2="val1" AND var1!="a" then var3=1

What I'm trying to do is randomly exclude some rows (marked with 0) under defined conditions.

Can somebody help me, please?

1 Answer

It seems you can start by setting var3 to 1 for every row, and then set it to 0 for all but 3 of the rows that meet the second condition:

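db$var3 <- 1
# Indices of the rows where the second condition is met
x <- which(db$var2 == "val1" & db$var1 == "a")
# Set all but 3 of those rows to 0. Indexing x via sample.int() avoids
# sample(x, ...) drawing from 1:x when x has length 1; max() guards against fewer than 3 matches.
db$var3[x[sample.int(length(x), max(length(x) - 3, 0))]] <- 0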
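If you want the random draw to be reproducible and want to check the result, something along these lines should work. This is only a sketch: the seed value and the names `n_keep` and `n_zero` are my own choices for illustration and are not from the question.

```r
var1 <- c("a","b","c","a","a","a","b","c","b","a")
var2 <- c("val1","val1","val1","val1","val2","val1","val3","val3","val2","val1")
db <- data.frame(var1, var2)

set.seed(42)   # arbitrary seed (assumption), only to make the draw repeatable
n_keep <- 3    # hypothetical name: how many matching rows keep var3 = 1

db$var3 <- 1
x <- which(db$var2 == "val1" & db$var1 == "a")   # candidate rows: 1, 4, 6, 10
n_zero <- max(length(x) - n_keep, 0)             # how many candidates to set to 0
db$var3[x[sample.int(length(x), n_zero)]] <- 0

# Sanity checks on the example data
sum(db$var3[x])         # 3: exactly n_keep candidates keep a 1
all(db$var3[-x] == 1)   # TRUE: rows outside the second condition are untouched
```

Which of the four candidate rows ends up with a 0 depends on the seed; the counts in the checks do not.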