0
votes

I am trying to bulk-convert numeric and character variables to factors in R. I feel that this should be simple, but I can't get the columns to actually become factors.

What I have done is here:

>sapply(df, class)
    a           b          c
"numeric"   "numeric"  "numeric" 

>col.names <- c("a", "b", "c")
>df[,col.names] <- sapply(df[,col.names], as.factor)

and what I get back is this:

>sapply(df, class)
     a            b            c
"character"  "character"  "character"

I am trying to figure out why the columns are not converted from numeric to factor and end up as character instead.

Typically the problem is going from factor to numeric, so I haven't been able to find anything about this type of issue.

4
Try exactly the same but with lapply. - Rui Barradas
This worked! Would you be able to tell me why lapply worked and sapply didn't? Is it because sapply simplifies the result to a vector? - Tim Feeney
Ivan's answer explains the difference. - Gregor Thomas
Yes, it is. And the coercion is to the least common denominator, character. - Rui Barradas

4 Answers

1
votes

In this case, sapply() returns a matrix, and a matrix cannot hold factors, so the values are coerced to character. Use

df[,col.names] <- lapply(df[,col.names], as.factor)

because lapply() returns a list of factors, which can be assigned back into the columns of your df.
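
To make the difference concrete, here is a minimal sketch that compares what the two functions return. The data frame is made-up example data (the question does not show its values); only the column names follow the question.

```r
# Hypothetical example data; only the column names come from the question.
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
col.names <- c("a", "b", "c")

# sapply() simplifies its result to a matrix. A matrix cannot hold
# factors, so the values end up as character strings.
m <- sapply(df[, col.names], as.factor)
is.matrix(m)  # TRUE
typeof(m)     # "character"

# lapply() always returns a list, so every element stays a factor.
l <- lapply(df[, col.names], as.factor)
class(l)      # "list"

# Assigning the list back replaces each column with its factor.
df[, col.names] <- l
sapply(df, class)
```

After the last assignment, all three columns should report "factor".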

1
votes

You have to use lapply. In the following example, I create two identical data frames with four columns.

df <- df2 <- data.frame(a = 1:5, b = 6:10, c = 11:15, d = 16:20)
col.names <- c("a", "b", "c")

df[,col.names] <- lapply(df[,col.names], as.factor)
sapply(df, class)
#a         b         c         d 
#"factor"  "factor"  "factor" "integer"

Note that if you want to convert the entire data.frame, you need the empty square brackets, df2[]. Without them, df2 would be overwritten with a plain list instead of staying a data.frame.

df2[] <- lapply(df2, as.factor)
sapply(df2, class)
#a        b        c        d 
#"factor" "factor" "factor" "factor"

0
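
As a small illustration of why the brackets matter, the sketch below stores the bracket-free result under a separate, hypothetical name (as_list) so that both outcomes can be inspected side by side. The data is made up for the example.

```r
# Hypothetical example data.
df2 <- data.frame(a = 1:5, b = 6:10)

# Without the brackets, lapply() just hands back a plain list.
# Writing df2 <- lapply(df2, as.factor) would replace df2 with this list.
as_list <- lapply(df2, as.factor)
class(as_list)  # "list"

# With the brackets, the columns are replaced in place and the object
# remains a data.frame.
df2[] <- lapply(df2, as.factor)
class(df2)      # "data.frame"
sapply(df2, class)
```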
votes

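Try something like:

df <- data.frame(sapply(df, as.factor), stringsAsFactors = TRUE)

The difference is wrapping the result in data.frame() at the end, which turns the character matrix returned by sapply() into factor columns. Since R 4.0.0 the default is stringsAsFactors = FALSE, so the argument has to be set explicitly. Note that the levels are then built from character strings, so numeric values are typically ordered alphabetically ("10" before "2").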

0
votes

Here's a tidyverse solution.

library(tidyverse)

data <- tibble(x = c("blue", "green"), y = 1:2)

data <- data %>%
  mutate(x = factor(x),
         y = factor(y))
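
For the bulk case in the question, naming every column inside mutate() gets tedious. One possible variant, assuming dplyr 1.0.0 or later is installed, uses across() with all_of() to convert a whole vector of column names at once. The data below is made up for the example; col.names follows the question.

```r
library(dplyr)

# Hypothetical example data; col.names follows the question.
df <- tibble(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9), d = 10:12)
col.names <- c("a", "b", "c")

# across() applies as.factor to every column selected by all_of();
# column d is not listed, so it is left unchanged.
df <- df %>%
  mutate(across(all_of(col.names), as.factor))

sapply(df, class)
```

Columns a, b and c should come back as "factor", while d stays "integer".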