I need to produce some new factor variables in my dataset which contain information from existing factor variables.

In the first case I need to produce a binary NewVariable based on whether certain values occur in a specific variable that has more than 100 levels. I use revalue() from the plyr package, namely:

NewVar <- if(OldVar1=="helen" | OldVar1=="greg") 
             {NewVar <-revalue(OldVar1, c("helen"="participant", "greg"="participant"))}
          else {NewVar=="nonparticipant"}

I actually want to collapse specific levels into a specific level from the new variable. As you can imagine the above code does not work but I cannot figure out why.

In the second case I need to combine information from three existing factor variables (OldVar1, OldVar2, OldVar3) in order to fill in the levels of a multi-categorical NewVariable. I run this code:

NewVariable="OptionA" <- if(OldVar1=="a" & OldVar2=="b" & OldVar3=="c")

I get this error: Error: unexpected '=' in "OldVar=". The same error occurs when I remove one of the = signs from OldVar1=="a".

Is it possible to create a factor NewVariable with its levels and labels without first filling it with string values? I could not find anything on this; the tutorials I have seen start from data that already exist and only label the existing values.

Also, I would like to assign values to the rest of my cases, which belong to OptionA, OptionB, OptionC, etc. Would this be possible by writing a separate if-statement for each of them, as in the following?

NewVariable="OptionA" <- if(OldVar1=="a" & OldVar2=="b" & OldVar3=="c")
NewVariable="OptionB" <- if(OldVar1=="a" & OldVar2=="d" & OldVar3=="e")

=== EDIT ===

For the second "challenge" I followed the code suggested by DWin. I produced an interaction of the three variables that appear in the if(...) above and put inside c() only the values that I needed, for example:

OldVar.ALL.interactions <- with(data, interaction(OldVar1, OldVar2, OldVar3))
levels(OldVar.ALL.interactions) # search for the levels that we need to include 
# in the NewVar
# below I follow DWin's code
NewVar <- factor(rep(NA, length(AnotherVarOfTheDataset) ),
                     levels=c("OptionA", "OptionB", ...))
NewVar[OldVar.ALL.interactions %in% c("...interaction.of.Old.Variables...")] <- "OptionA"
# the same as in OptionA for the rest of the levels
# the ** NewVar[ is.na(NewVar) ]  <- "nonparticipant" ** of DWin's code is not needed 
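The steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: the data frame and its values are invented, and the level names "a.b.c" and "a.d.e" rely on interaction() joining levels with its default "." separator.

```r
# Made-up example data: column names mirror the question, values are hypothetical.
data <- data.frame(
  OldVar1 = factor(c("a", "a", "x", "a")),
  OldVar2 = factor(c("b", "d", "b", "b")),
  OldVar3 = factor(c("c", "e", "c", "c"))
)

# One combined factor whose levels look like "a.b.c" (default separator is ".").
OldVar.ALL.interactions <- with(data, interaction(OldVar1, OldVar2, OldVar3))

# Start with an all-NA factor that already knows its levels.
NewVar <- factor(rep(NA, nrow(data)), levels = c("OptionA", "OptionB"))

# Fill in each level from the matching combination(s).
NewVar[OldVar.ALL.interactions %in% "a.b.c"] <- "OptionA"
NewVar[OldVar.ALL.interactions %in% "a.d.e"] <- "OptionB"

print(NewVar)
```

Rows whose combination is not listed (the third row here) simply stay NA.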

Is there any other way to solve this issue without using the interaction between the Old factor variables?

You cannot collapse levels easily anymore by manipulating the levels attribute. I started to do something like levels(NewVar) <- gsub("greg|helen" ...) and realized that would fail. You also cannot use: else {NewVar=="nonparticipant"} if you wanted to do an assignment. Then there is the whole problem that if and else are not vectorized. - IRTFM
It does appear that plyr::revalue will let you collapse levels, so it is probably the incorrect use of if and else instead of ifelse that is part of what is tripping you up. There is also no "all.others" = "other_level" argument for revalue. - IRTFM
By "not vectorized" you mean that they will not run over the whole length of the vector in the dataset? Is this why noah's suggestion includes an argument length.out=10? - Pulse
Right. if and else take arguments of exactly length 1. They are program control functions and do not operate as persons might expect when they are prior users of SAS or SPSS where the data steps all have implicit column actions. - IRTFM
Your explanation of "not vectorized" is correct, but that doesn't have anything to do with length.out; that's just setting up an example data set (since you didn't provide one) with length 10. - Aaron left Stack Overflow

1 Answer

I'd probably start out with an empty factor variable (assuming that you wanted to have a factor as was implied by the subject line):

NewVar <- factor(rep(NA, length(OldVar) ), 
                 levels=c("participant", "nonparticipant") )   
NewVar[ OldVar %in% c("a", "b", "c")] <- "participant"
NewVar[ is.na(NewVar) ]             <- "nonparticipant"
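Since the comments point to ifelse() as the vectorized counterpart of if/else, here is a minimal sketch of the same binary recoding done that way. The sample values are invented for illustration; only the names OldVar1, "helen", "greg", "participant" and "nonparticipant" come from the question.

```r
# Hypothetical data: names come from the question, values are invented.
OldVar1 <- factor(c("helen", "greg", "maria", "helen", "tom"))

# if () tests a single TRUE/FALSE, so it cannot classify a whole column.
# ifelse() evaluates the test element by element and returns a vector.
NewVar <- factor(
  ifelse(OldVar1 %in% c("helen", "greg"), "participant", "nonparticipant"),
  levels = c("participant", "nonparticipant")
)

print(table(NewVar))
```

Note that %in% returns FALSE for NA, so missing values in OldVar1 would be labelled "nonparticipant" here; if they should stay missing, they would need to be handled separately.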

If you don't mind having a character vector, then something along these lines:

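x <- c("a", "b", "c")                 # example input (assumed; not given originally)
y <- vector("character", length(x))   # empty character vector, same length as x
y[ x %in% c("a","c")] <- "p"
y[ !x %in% c("a","c")] <- "np"
y
#[1] "p"  "np" "p"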
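For the second case in the question (three variables feeding one multi-level factor), the same logical-indexing pattern seems to carry over without interaction(). A minimal sketch, assuming a made-up data frame whose column names follow the question; the helper names is_A and is_B are hypothetical:

```r
# Hypothetical data: names follow the question, values are invented.
data <- data.frame(
  OldVar1 = factor(c("a", "a", "x", "a")),
  OldVar2 = factor(c("b", "d", "b", "b")),
  OldVar3 = factor(c("c", "e", "c", "c"))
)

# Empty factor with the target levels declared up front.
NewVariable <- factor(rep(NA, nrow(data)), levels = c("OptionA", "OptionB"))

# == and & are vectorized, so each condition yields one TRUE/FALSE per row.
is_A <- with(data, OldVar1 == "a" & OldVar2 == "b" & OldVar3 == "c")
is_B <- with(data, OldVar1 == "a" & OldVar2 == "d" & OldVar3 == "e")

# which() drops any NA produced by missing values in the old variables.
NewVariable[which(is_A)] <- "OptionA"
NewVariable[which(is_B)] <- "OptionB"

print(NewVariable)
```

Each option gets its own condition and its own assignment line, which plays the role of the separate if-statements proposed in the question; rows matching no condition remain NA.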