Recoding a dataset with variables of different classes

Question

I'm trying to recode the variables in my dataset. The data is messy and consists of mixed classes, and I want to tidy all of them into binary numeric variables coded 1/0. I have produced a simplified example as follows:

My original data consists of variables of class character (Yes/No), logical (TRUE/FALSE) and numeric (1/0). I want to code everything as 1/0, with missing values coded as 0 as well.

tmp <- data.frame(x1 = c("Yes","Yes","No","No",NA),
                  x2 = c(TRUE, TRUE, FALSE, FALSE, NA),
                  x3 = c(1,1,0,0,NA))
tmp$x1 <- as.character(tmp$x1)

recode.var <- function(x){
  if (is.character(x)) {
    x <- ifelse(x=="Yes",1,ifelse(x=="No",0,ifelse(is.na(x),0,NA)))
  }
  if (is.logical(x)) {
    x <- ifelse(x==TRUE,1,ifelse(x==FALSE,0,ifelse(is.na(x),0,NA)))
  }
  if (is.numeric(x)) {
    x <- ifelse(x==1,1,ifelse(x==0,0,ifelse(is.na(x),0,NA)))
  }
  x <- as.numeric(x)
  return(x)
}
tmp1 <- data.frame(apply(tmp, 2, recode.var))

However, the result is not what I expected.

> tmp1
  x1 x2 x3
1  1 NA NA
2  1 NA NA
3  0 NA NA
4  0 NA NA
5 NA NA NA

I would appreciate it if someone could spot the error in the code. Thanks.

Welcome to Stack Overflow! We don't quite have a reproducible example here yet. What was the code you used to generate tmp1? For me, as.data.frame(lapply(tmp, recode.var)) and dplyr::mutate_all(tmp, recode.var) give exactly what I think it is you're looking for. — duckmayr

A. Stam · Accepted Answer · 2019-04-03T11:09:17

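I've spotted at least one small problem with your custom function: if you're using ifelse, you need to start off with the is.na condition. A comparison with NA yields NA, and ifelse returns NA for those elements, so an is.na check nested further down is never reached. See this example: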

x <- c(1, 2, NA)
ifelse(x == 1, "foo", "bar")
# > [1] "foo" "bar" NA
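To make the ordering point concrete, here is a minimal sketch in base R that tests is.na first. The variable name recoded is only for illustration.

```r
x <- c(1, 0, NA)

# Check is.na() first, so the NA element is handled before
# the comparison x == 1 can turn it into NA.
recoded <- ifelse(is.na(x), 0, ifelse(x == 1, 1, 0))
recoded
# > [1] 1 0 0
```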

Here's an alternative I've made. The coalesce function comes from the dplyr package.

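library(dplyr)  # provides coalesce() and mutate_all()

recode.var <- function(x) {
  # Character: "Yes" becomes 1, any other string 0, NA is replaced by 0
  if (is.character(x)) {
    return(coalesce(as.numeric(x == "Yes"), 0))
  }

  # Numeric: keep values as they are, replace NA by 0
  if (is.numeric(x)) {
    return(coalesce(x, 0))
  }

  # Logical: TRUE becomes 1, FALSE becomes 0, NA is replaced by 0
  if (is.logical(x)) {
    return(coalesce(as.numeric(x), 0))
  }

  # Any other class is returned unchanged
  x
}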

My version does not deal with values outside the options you've mentioned. I'm assuming they don't exist in your dataset, so they don't need to be accounted for, but do tell me if that's a problem.
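If you want to confirm that assumption before recoding, a check along these lines might help. This is only a sketch in base R: unexpected_values is a hypothetical helper, and the allowed sets ("Yes"/"No" for character, 0/1 otherwise) are taken from the question.

```r
# Hypothetical helper: returns the non-missing values of x that fall
# outside the expected set for its class.
unexpected_values <- function(x) {
  allowed <- if (is.character(x)) c("Yes", "No") else c(0, 1)
  setdiff(unique(x[!is.na(x)]), allowed)
}

unexpected_values(c("Yes", "No", "Maybe", NA))
# > [1] "Maybe"
unexpected_values(c(1, 0, 2, NA))
# > [1] 2
```

An empty result for every column, for example from lapply(tmp, unexpected_values), would suggest the recoding covers all values present.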

The final step is how to apply the function to the dataframe. Using dplyr you can use the following:

tmp2 <- mutate_all(tmp, recode.var)
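As for why the original call returned mostly NA: apply() converts the data frame to a matrix first, and with a character column present every column becomes character, so values such as "TRUE" and "1" match neither "Yes" nor "No". The sketch below shows this, together with a base R alternative that goes column by column with lapply(), as the comment on the question suggests. The name recode_base is made up for this example, and it assumes the same Yes/No, TRUE/FALSE and 1/0 values as in the question.

```r
tmp <- data.frame(x1 = c("Yes", "Yes", "No", "No", NA),
                  x2 = c(TRUE, TRUE, FALSE, FALSE, NA),
                  x3 = c(1, 1, 0, 0, NA),
                  stringsAsFactors = FALSE)

# apply() works on a matrix, so all columns are coerced to character
classes_apply <- apply(tmp, 2, class)

# sapply()/lapply() see each column with its own class
classes_lapply <- sapply(tmp, class)

# Hypothetical base R recoder: no dplyr needed
recode_base <- function(x) {
  if (is.character(x)) x <- x == "Yes"  # "Yes" -> TRUE, other strings -> FALSE
  x <- as.numeric(x)                    # TRUE/FALSE -> 1/0, numeric unchanged
  x[is.na(x)] <- 0                      # missing values -> 0
  x
}

tmp_base <- as.data.frame(lapply(tmp, recode_base))
tmp_base
# >   x1 x2 x3
# > 1  1  1  1
# > 2  1  1  1
# > 3  0  0  0
# > 4  0  0  0
# > 5  0  0  0
```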
