0
votes

I am pretty new to R and was trying out the Titanic dataset (available online). I was running some code to impute the missing values in the Age column, but I got this error: Error in if (class[i] == 1) { : missing value where TRUE/FALSE needed. I need some help getting rid of the error. Below is the code I used:

impute_Age <- function(Age, class){
  vector <- Age
  for(i in 1:length(Age)){
    if (is.na(Age[i])){
      if(class[i] == 1){
        vector[i] <- round(mean(filter(titanic, titanic$ï..pclass==1)$age, na.rm=TRUE),0)
       }else if (class[i] == 2){
        vector[i] <- round(mean(filter(titanic, titanic$ï..pclass==2)$age, na.rm=TRUE),0)
      }else{
        vector[i] <- round(mean(filter(titanic, titanic$ï..pclass==3)$age, na.rm=TRUE),0)
      }
    }else{
      vector[i]<-Age[i]
    }
  }
  return(vector)
}

imputed_Age <- impute_Age(titanic$age, titanic$ï..pclass)
titanic$age <- imputed_Age
2
Which library are you using to get the Titanic dataset? Are you using library("titanic") data(Titanic)? If you are, there is no missing Age in that dataset! The levels of Age there are "Child" and "Adult", and Class is not numeric, so comparing it to 1 or 2 in class[i] == 1 or class[i] == 2 will not work as intended. - Shirin Yavari
No, I am importing the Titanic CSV file. There are missing values for Age across all classes. I ran the code below to find the missing values: colSums(is.na(titanic)|titanic=='') which shows 264 missing values of Age - Sarang123
Can you please share your dataset using dput? - Shirin Yavari
When you pass class to your function, it is passed as a factor (or character), and comparing it to a number (1, 2, or 3) leads to the error. You can add this line as the first line in your function: class <- as.numeric(as.character(class)). Then run your function again and test it! - Shirin Yavari
Thanks for your help. I tried the above, but I am still getting the same error. This is the dataset that I am using: biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.xls - Sarang123

2 Answers

0
votes

You can try this (it assumes the class column is named pclass; in the question it was imported as ï..pclass):

for (i in 1:3){
   titanic[which(is.na(titanic$age) & titanic$pclass==i),"age"] <-
   round(mean(titanic[which(titanic$pclass==i),"age"],na.rm=TRUE),digits=0)
}
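
The loop above avoids the original error because which() silently drops NA entries, whereas if stops as soon as its condition is NA. A minimal sketch of that difference on a made-up vector follows. The name pclass_demo is hypothetical, and it is only an assumption (not confirmed in the thread) that the imported file contains a row with a missing class value:

```r
# Hypothetical class column with one missing entry (assumed, not taken from the real file)
pclass_demo <- c(1, 2, NA, 3)

# Comparing NA with a number yields NA, not TRUE or FALSE
cmp <- pclass_demo == 1

# `if` cannot branch on NA, so this raises the error from the question;
# tryCatch() captures the message instead of stopping the script
res <- tryCatch(
  if (pclass_demo[3] == 1) "first class" else "other",
  error = function(e) conditionMessage(e)
)

# which() keeps only positions where the condition is TRUE and ignores NA
idx <- which(pclass_demo == 1)

# isTRUE() is one way to make the original if-condition NA-safe
safe <- isTRUE(pclass_demo[3] == 1)
```

Running sum(is.na(titanic$pclass)) on the real data (using whatever name the class column was imported under) would show whether this is actually what happened.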
0
votes

If you'd like to get away from for-loops, you can do this with nested ifelse() calls.

titanic$age <- {
 age1 = round(mean(titanic$age[titanic$pclass == 1], na.rm = TRUE))
 age2 = round(mean(titanic$age[titanic$pclass == 2], na.rm = TRUE))
 age3 = round(mean(titanic$age[titanic$pclass == 3], na.rm = TRUE))
 ifelse(is.na(titanic$age) & titanic$pclass == 1, age1,
    ifelse(is.na(titanic$age) & titanic$pclass == 2, age2,
           ifelse(is.na(titanic$age) & titanic$pclass == 3, age3, titanic$age)))
 }
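
If the number of classes is not fixed at three, the same idea can be written without one ifelse() per class. This is only a sketch on a made-up data frame (toy and its values are hypothetical), using base R's ave() to compute the per-class mean for every row:

```r
# Toy data frame standing in for the real one (values are made up)
toy <- data.frame(
  pclass = c(1, 1, 1, 2, 2, 3, 3, 3),
  age    = c(40, NA, 50, 30, NA, 20, 26, NA)
)

# Per-class rounded mean age, repeated for every row of its class
class_mean <- ave(toy$age, toy$pclass,
                  FUN = function(x) round(mean(x, na.rm = TRUE)))

# Fill only the missing ages; observed ages stay unchanged
toy$age <- ifelse(is.na(toy$age), class_mean, toy$age)
```

Here the missing ages become 45, 30 and 23 for classes 1, 2 and 3. On the real data, toy would be replaced by titanic, assuming the columns are named pclass and age.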