I need to create some new factor variables in my dataset that contain information from existing factor variables.
In the first case I need to produce a binary NewVariable based on whether certain values occur in a specific variable, which has more than 100 levels. I use revalue() from the plyr package, namely:
NewVar <- if(OldVar1=="helen" | OldVar1=="greg")
{NewVar <-revalue(OldVar1, c("helen"="participant", "greg"="participant"))}
else {NewVar=="nonparticipant"}
I actually want to collapse specific levels into a specific level from the new variable. As you can imagine the above code does not work but I cannot figure out why.
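A minimal sketch of one way to get the binary variable, assuming a vectorized test with %in% and ifelse() is acceptable instead of revalue(). The toy OldVar1 below is hypothetical and only stands in for the real variable with 100+ levels:

```r
# Hypothetical toy data standing in for the real OldVar1
OldVar1 <- factor(c("helen", "greg", "sam", "ann", "greg"))

# %in% is vectorized: it returns one TRUE/FALSE per element of OldVar1,
# and ifelse() picks the matching label element by element
NewVar <- factor(
  ifelse(OldVar1 %in% c("helen", "greg"), "participant", "nonparticipant"),
  levels = c("nonparticipant", "participant")
)

table(NewVar)
```

Note that %in% returns FALSE for missing values, so any NA in OldVar1 would end up as "nonparticipant" in this sketch; that may or may not be what is wanted.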
In the second case I need to combine information from three existing factor variables (OldVar1, OldVar2, OldVar3) in order to fill in the levels of a multi-categorical NewVariable. I run this code:
NewVariable="OptionA" <- if(OldVar1=="a" & OldVar2=="b" & OldVar3=="c")
I get the error "Error: unexpected '=' in "OldVar=". The same occurs when I remove one of the = signs in OldVar1=="a".
Is it possible to create a factor NewVariable with its levels and labels without filling it with the string values in advance? I was not able to find anything on that; the tutorials I have seen create their own data and only need to label the existing values.
Also, I would like to assign values to the rest of my cases, which belong to OptionA, OptionB, OptionC, etc. Is this possible by writing a separate if-statement for each of them, as follows?
NewVariable="OptionA" <- if(OldVar1=="a" & OldVar2=="b" & OldVar3=="c")
NewVariable="OptionB" <- if(OldVar1=="a" & OldVar2=="d" & OldVar3=="e")
=== EDIT ===
For the second "challenge" I followed the code suggested by DWin. I produced an interaction of the three variables that appear in the if(...) above and put inside c() only the values that I needed, for example:
OldVar.ALL.interactions <- with(data, interaction(OldVar1, OldVar2, OldVar3))
levels(OldVar.ALL.interactions) # search for the levels that we need to include
# in the NewVar
# below I follow DWin's code
NewVar <- factor(rep(NA, length(AnotherVarOfTheDataset) ),
levels=c("OptionA", "OptionB", ...))
NewVar[OldVar.ALL.interactions %in% c("...interaction.of.Old.Variables...")] <- "OptionA"
# the same as in OptionA for the rest of the levels
# the ** NewVar[ is.na(NewVar) ] <- "nonparticipant" ** of DWin's code is not needed
Is there any other way to solve this issue without using the interaction between the Old factor variables?
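One possible alternative, sketched here under the assumption that plain logical indexing is acceptable: build one logical vector per option directly from the three variables and use it to fill a pre-declared factor. The toy data frame is hypothetical; the values "a" to "e" and the option names are the ones used above.

```r
# Hypothetical toy data; the real data set has many more rows and levels
data <- data.frame(
  OldVar1 = factor(c("a", "a", "a", "x")),
  OldVar2 = factor(c("b", "d", "b", "b")),
  OldVar3 = factor(c("c", "e", "z", "c"))
)

# Start with an all-NA factor that already knows its levels
NewVar <- factor(rep(NA_character_, nrow(data)),
                 levels = c("OptionA", "OptionB"))

# Each condition is a logical vector with one element per row
isA <- with(data, OldVar1 == "a" & OldVar2 == "b" & OldVar3 == "c")
isB <- with(data, OldVar1 == "a" & OldVar2 == "d" & OldVar3 == "e")

# Fill only the rows where the condition holds; other rows stay NA
NewVar[isA] <- "OptionA"
NewVar[isB] <- "OptionB"

NewVar
```

Rows that match none of the conditions remain NA, so a catch-all level would have to be added to levels= and assigned separately if one is needed.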
levels(NewVar) <- gsub("greg|helen" ...)
and realized that would fail. You also cannot use: else {NewVar=="nonparticipant"} if you wanted to do an assignment. Then there is the whole problem that if and else are not vectorized. – IRTFM
plyr::revalue will let you collapse levels, so it is probably the incorrect use of if and else instead of ifelse that is part of what is tripping you up. There is also no "all.others" = "other_level" argument for revalue. – IRTFM
if and else take arguments of exactly length 1. They are program control functions and do not operate as persons might expect when they are prior users of SAS or SPSS where the data steps all have implicit column actions. – IRTFM
length.out; that's just setting up an example data set (since you didn't provide one) with length 10. – Aaron left Stack Overflow
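To illustrate the point made in the comments that if and else are not vectorized while ifelse is, here is a small sketch. The vector x is hypothetical, and the tryCatch() wrapper is only there so the snippet runs to completion on any R version:

```r
x <- c("helen", "sam", "greg")  # hypothetical values

# ifelse() evaluates the test for every element and returns
# a vector of the same length as the test
vec <- ifelse(x == "helen" | x == "greg", "participant", "nonparticipant")
vec

# if() expects a single TRUE/FALSE. A length-3 condition is an error
# in R >= 4.2.0; earlier versions used only the first element and warned.
res <- tryCatch(
  if (x == "helen" | x == "greg") "participant" else "nonparticipant",
  error   = function(e) "error",
  warning = function(w) "warning"
)
res
```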