
I have a logical column, Self_Employed, with values TRUE and FALSE. It also has missing values, which mean the person is an employee rather than self-employed. I would like to replace the missing values with a "Missing" category.

class(df$Self_Employed)
[1] "logical"

levels(df$Self_Employed)
NULL

sum(is.na(df$Self_Employed))
[1] 210

table(df$Self_Employed)
 FALSE   TRUE 
  1561    271

The class is "logical", the levels are NULL, there are 210 missing values, and table() shows only the TRUE and FALSE counts.

To impute the missing values, I first convert the column to a factor and then try to rename the missing level. Nothing gets filled in: the missing values are still NA, and the levels are still only TRUE and FALSE.

df$Self_Employed <- as.factor(df$Self_Employed)
levels(df$Self_Employed)[levels(df$Self_Employed)=="" ] <- "SE_Missing"

levels(df$Self_Employed)
[1] "FALSE" "TRUE" 

The levels are still only TRUE and FALSE, and is.na still reports the same 210 missing values. I also tried:

df$Self_Employed <- factor(df$Self_Employed,levels=c('FALSE','TRUE',''),labels=c('Yes','No','SE_Missing'))

How can I fill in the missing values of the factor?

I need to convert TRUE to "Yes", FALSE to "No", and NA to "SE_Missing".

Don't forget that true/false in R can act as 1/0. - DJV
You might try formatting your column first. Consider: levels(factor(format(c(TRUE, FALSE, NA)))) - MichaelChirico

1 Answer


I don't think you need to convert the column to a factor. Here is an example using a dummy dataset (defined under "data" at the end):

library(dplyr)
df %>%
  mutate(b = case_when(b ~ "Yes", 
                       !b ~ "No", 
                       TRUE ~ "SE_Missing"))

#  a          b
#1 1        Yes
#2 2        Yes
#3 3         No
#4 4 SE_Missing
#5 5         No
#6 6 SE_Missing

Or use a nested ifelse, which can also be used inside mutate:

with(df, ifelse(is.na(b), "SE_Missing", ifelse(b, "Yes", "No")))
#[1] "Yes"    "Yes"    "No"    "SE_Missing" "No"    "SE_Missing"
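
If a factor is needed downstream after all, the character result can be wrapped in factor() with the three labels given as explicit levels. A minimal sketch, using a stand-in logical vector b rather than the real Self_Employed column (the names b, lab and f are only for illustration):

```r
# Stand-in for the real column: a logical vector containing NAs
b <- c(TRUE, TRUE, FALSE, NA, FALSE, NA)

# Recode to character first, so NA becomes an ordinary string
lab <- ifelse(is.na(b), "SE_Missing", ifelse(b, "Yes", "No"))

# Then build the factor with the level order spelled out
f <- factor(lab, levels = c("Yes", "No", "SE_Missing"))

levels(f)      # the three labels, in the order given above
sum(is.na(f))  # no missing values remain
```

Spelling out the levels keeps their order fixed instead of relying on the alphabetical default.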

data

df <- data.frame(a = 1:6, b = c(TRUE, TRUE, FALSE, NA, FALSE, NA))

#  a     b
#1 1  TRUE
#2 2  TRUE
#3 3 FALSE
#4 4    NA
#5 5 FALSE
#6 6    NA
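
As for why the attempt in the question had no visible effect: as far as I can tell, as.factor() does not turn NA into a level at all, so there is no "" level to rename and the comparison levels(...) == "" matches nothing. One possible base R route, sketched here on a stand-in vector (b, f and f2 are illustrative names, not from the question), is to make NA an explicit level with addNA() and then rename the levels by position:

```r
# Stand-in for the real column: a logical vector containing NAs
b <- c(TRUE, TRUE, FALSE, NA, FALSE, NA)

f <- as.factor(b)
levels(f)           # only "FALSE" and "TRUE"; NA is not a level, nor is ""

# addNA() appends NA as an explicit (last) level
f2 <- addNA(f)

# Rename positionally: "FALSE" -> "No", "TRUE" -> "Yes", NA -> "SE_Missing"
levels(f2) <- c("No", "Yes", "SE_Missing")

table(f2)           # counts for all three categories, including SE_Missing
```

Note that the levels of a logical converted with as.factor() come out as "FALSE" then "TRUE", so the new labels have to be listed in that order.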