I have a dataset with 20 variables and quite a bit of missing data. I am trying to add a new variable whose value in each row depends on the value of another variable in that row. Below is code and a smaller dataset that gives the same errors as my larger dataset. Any suggestions?
A=seq(1,6); B=seq(2,4)
length(A)=7; length(B)=7
m=cbind(A,B)
I do not understand completely what converting from a matrix to a dataframe does.
df=as.data.frame(m)
df
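On the matrix versus data frame point, here is a minimal sketch of the practical difference, assuming the same `A` and `B` as above. The names `m2` and `label` are made up for illustration only. A matrix stores a single type and has no `$` access, while a data frame is a list of columns that can each have their own type:

```r
A <- seq(1, 6); B <- seq(2, 4)
length(A) <- 7; length(B) <- 7      # pads both vectors with NA up to length 7
m  <- cbind(A, B)                   # numeric matrix with columns A and B
df <- as.data.frame(m)              # one data frame column per matrix column

# `$` does not work on a matrix (it is an atomic vector with dimensions)
dollar_fails <- inherits(try(m$A, silent = TRUE), "try-error")
print(dollar_fails)

# Adding text to a matrix coerces every cell to character
m2 <- cbind(m, label = "x")         # hypothetical extra column
print(typeof(m2))

# Adding text to a data frame leaves the numeric columns numeric
df$label <- "x"
print(sapply(df, class))
```

This is why `df$Acat` can hold a factor next to the numeric `A` and `B`, which would not be possible in `m`.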
First, I try to create a categorical variable to use when assigning the value of the new variable:
df$Acat=cut(df$A,
breaks=c(-Inf,2.5,4.5,Inf),
labels=c("low","mod","hi"))
df$Acat
The code below is where I get the error "argument is of length zero":
if (df$Acat.=="low"){
df$C=1
}else if (df$Acat.=="mod"){
df$C=2
}else if(df$Acat.=="hi"){
df$C=3
}else {
df$C=NA
}
df$C
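A likely reading of this error, offered as a sketch rather than a definitive diagnosis: the column is called `Acat`, but the conditions refer to `df$Acat.` with a trailing dot. That column does not exist, so `df$Acat.` is `NULL`, the comparison has length zero, and `if` has nothing to test. Separately, `if` expects a single TRUE/FALSE, so a per-row assignment needs a vectorized function. The example below rebuilds the sample data directly with `data.frame()` for brevity and uses nested `ifelse()` as one possible option:

```r
df <- data.frame(A = c(1:6, NA))    # stand-in for the sample data above
df$Acat <- cut(df$A,
               breaks = c(-Inf, 2.5, 4.5, Inf),
               labels = c("low", "mod", "hi"))

# The misspelled column is NULL, so the comparison is empty
print(is.null(df$Acat.))
print(length(df$Acat. == "low"))

# ifelse() evaluates the test for every row; rows where Acat is NA stay NA
df$C <- ifelse(df$Acat == "low", 1,
        ifelse(df$Acat == "mod", 2,
        ifelse(df$Acat == "hi",  3, NA)))
print(df$C)
```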
I also tried it this way, using the numeric variable to assign the value of the new variable, but I get this message:
the condition has length > 1 and only the first element will be used
if (df$A<2.5){
df$D=1
} else if (df$A>=2.5 && df$A<4.5){
df$D=2
} else if (df$A>=4.5){
df$D=3
} else {
df$D=NA
}
df$D
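The message appears because `df$A < 2.5` is a vector with one value per row, while `if` only accepts a single condition; `&&` likewise works on single values, whereas `&` works element by element. A minimal sketch of a row-wise version using logical indexing, assuming the same thresholds as above (the `which()` wrapper is a choice made here so that rows with missing `A` are skipped and keep `NA`):

```r
df <- data.frame(A = c(1:6, NA))    # stand-in for the sample data above

df$D <- NA_real_                                   # default for every row
df$D[which(df$A < 2.5)]               <- 1
df$D[which(df$A >= 2.5 & df$A < 4.5)] <- 2         # `&`, not `&&`
df$D[which(df$A >= 4.5)]              <- 3
print(df$D)
```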
df$C <- match(df$Acat, c("low","mod","hi"))
– GKi
df$D <- findInterval(df$A, c(-Inf,2.5,4.5,Inf))
– GKi
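A quick check of the two one-liners from the comments on the sample data, written as a sketch with the data rebuilt via `data.frame()`. Both keep `NA` for the row where `A` is missing. One detail worth knowing: `findInterval()` treats intervals as closed on the left, which matches the `>=` conditions in the numeric attempt, while `cut()` by default closes them on the right, so the two differ only for values exactly equal to 2.5 or 4.5:

```r
df <- data.frame(A = c(1:6, NA))    # stand-in for the sample data above
df$Acat <- cut(df$A,
               breaks = c(-Inf, 2.5, 4.5, Inf),
               labels = c("low", "mod", "hi"))

df$C <- match(df$Acat, c("low", "mod", "hi"))          # position of each label
df$D <- findInterval(df$A, c(-Inf, 2.5, 4.5, Inf))     # interval index of each value
print(df)

# Boundary behaviour at exactly 2.5
at_boundary_cut <- as.character(cut(2.5, breaks = c(-Inf, 2.5, 4.5, Inf),
                                    labels = c("low", "mod", "hi")))
at_boundary_fi  <- findInterval(2.5, c(-Inf, 2.5, 4.5, Inf))
print(at_boundary_cut)
print(at_boundary_fi)
```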