174
votes

I would like to change the format (class) of some columns of my data.frame object (mydf) from character to factor.

I don't want to do this while reading the text file with the read.table() function.

Any help would be appreciated.

9
mydf$myfavoritecolumn <- as.factor(mydf$myfavoritecolumn) - tim riffe
Thanks! But I have another problem: I have the name of each column in a character array, col_names[]. How can I use the above command? mydf$col_names[i] doesn't work. - Rasoul
Is there any way to do this automatically for all character variables, as data.frame does with stringsAsFactors? - Etienne Low-Décarie
@EtienneLow-Décarie: just unclass and use data.frame on the result. - IRTFM

9 Answers

219
votes

Hi, welcome to the world of R.

mtcars  #look at this built in data set
str(mtcars) #allows you to see the classes of the variables (all numeric)

#one approach is to index with the $ sign and the as.factor function
mtcars$am <- as.factor(mtcars$am)
#another approach
mtcars[, 'cyl'] <- as.factor(mtcars[, 'cyl'])
str(mtcars)  # now look at the classes

The same pattern works for character, date, integer and other classes (as.character, as.Date, as.integer and so on).
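As a rough illustration of that last point, here is a minimal sketch using the other base R as.* converters. The data frame example_df and its columns are made up for this example, and the date strings are assumed to be in ISO format.

```r
# Hypothetical data frame, for illustration only
example_df <- data.frame(
  id    = c("1", "2", "3"),
  group = c("a", "b", "a"),
  day   = c("2020-01-01", "2020-01-02", "2020-01-03"),
  stringsAsFactors = FALSE
)

example_df$id    <- as.integer(example_df$id)    # character -> integer
example_df$group <- as.factor(example_df$group)  # character -> factor
example_df$day   <- as.Date(example_df$day)      # character -> Date (ISO format assumed)

# Going from a factor back to its labels goes through as.character
example_df$group_chr <- as.character(example_df$group)

str(example_df)  # check the resulting classes
```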

Since you're new to R I'd suggest you have a look at these two websites:

R reference manuals: http://cran.r-project.org/manuals.html

R Reference card: http://cran.r-project.org/doc/contrib/Short-refcard.pdf

86
votes
# To do it for all names
df[] <- lapply(df, factor) # the "[]" keeps the dataframe structure
# to do it for some names in a vector named 'col_names'
col_names <- names(df)
df[col_names] <- lapply(df[col_names], factor)

Explanation: all dataframes are lists, and the results of [ used with multiple-valued arguments are likewise lists, so looping over them is a job for lapply. The assignment above creates a list of factors that the [<-.data.frame method sticks back into the dataframe df.
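For the case raised in the comments, where the column names are stored in a character vector, a small sketch may help. The data frame mydf and its columns below are invented for illustration. The point is that $ does not evaluate its argument, so a name held in a variable needs [[ ]] (one column) or [ ] (several columns).

```r
# Hypothetical data frame and column-name vector, for illustration only
mydf <- data.frame(
  colour = c("red", "blue", "red"),
  size   = c("S", "M", "L"),
  price  = c(1.5, 2.0, 2.5),
  stringsAsFactors = FALSE
)
col_names <- c("colour", "size")

# Single column whose name is stored in a variable: use [[ ]], not $
i <- 1
mydf[[col_names[i]]] <- as.factor(mydf[[col_names[i]]])

# Several columns at once: mydf[col_names] is a list of columns,
# so lapply() returns a list that is assigned back column by column
mydf[col_names] <- lapply(mydf[col_names], factor)

sapply(mydf, class)  # inspect the class of each column
```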

Another strategy would be to convert only those columns where the number of unique items is less than some criterion, let's say fewer than the log of the number of rows as an example:

cols.to.factor <- sapply(df, function(col) length(unique(col)) < log10(length(col)))
df[cols.to.factor] <- lapply(df[cols.to.factor], factor)
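
To see how the criterion behaves, here is a sketch on simulated data. The data frame sim_df and its columns are assumptions made up for this example. With 1000 rows, log10 of the row count is 3, so only columns with fewer than 3 distinct values are converted.

```r
set.seed(1)  # only so the illustration is reproducible
n <- 1000
sim_df <- data.frame(
  id     = seq_len(n),                                 # 1000 distinct values
  status = sample(c("yes", "no"), n, replace = TRUE),  # 2 distinct values
  score  = rnorm(n),                                   # continuous values
  stringsAsFactors = FALSE
)

# TRUE for columns with fewer distinct values than log10(number of rows)
cols.to.factor <- sapply(sim_df, function(col) length(unique(col)) < log10(length(col)))
sim_df[cols.to.factor] <- lapply(sim_df[cols.to.factor], factor)

cols.to.factor  # logical vector showing which columns were selected
```

Note that the rule looks only at the number of distinct values, not at the column type, so a numeric column with very few distinct values would be converted as well.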
32
votes

You could use dplyr::mutate_if() to convert all character columns or dplyr::mutate_at() for select named character columns to factors:

library(dplyr)

# all character columns to factor:
df <- mutate_if(df, is.character, as.factor)

# select character columns 'char1', 'char2', etc. to factor:
df <- mutate_at(df, vars(char1, char2), as.factor)
18
votes

If you want to change all character variables in your data.frame to factors after you've already loaded your data, you can do it like this, to a data.frame called dat:

character_vars <- sapply(dat, is.character)
dat[character_vars] <- lapply(dat[character_vars], as.factor)

This creates a vector identifying which columns are of class character, then applies as.factor to those columns.

Sample data:

dat <- data.frame(var1 = c("a", "b"),
                  var2 = c("hi", "low"),
                  var3 = c(0, 0.1),
                  stringsAsFactors = FALSE
                  )
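
Run against the sample data, the approach can be sketched as below. This sketch uses sapply(dat, is.character) to build a plain logical vector and list-style indexing, dat[character_vars], which keeps a data.frame even when only one column matches. With dat[, character_vars], a single match would be dropped to a vector.

```r
dat <- data.frame(var1 = c("a", "b"),
                  var2 = c("hi", "low"),
                  var3 = c(0, 0.1),
                  stringsAsFactors = FALSE)

# One logical entry per column: TRUE where the column is character
character_vars <- sapply(dat, is.character)

# Convert only the flagged columns; the numeric column is left untouched
dat[character_vars] <- lapply(dat[character_vars], as.factor)

sapply(dat, class)  # inspect the class of each column
```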
14
votes

Another short way is the assignment pipe (%<>%) from the magrittr package. It pipes the character column mycolumn into factor and assigns the result back to the same column.

library(magrittr)

mydf$mycolumn %<>% factor
5
votes

I've been doing it with a loop. In this case I only transform character variables to factors:

for (i in seq_len(ncol(data))) {
    # [[i]] extracts the column as a vector, for data.frames and tibbles alike
    if (is.character(data[[i]])) {
        data[[i]] <- factor(data[[i]])
    }
}
2
votes

With dplyr 1.0.0 or later, you can use across():

library(dplyr)

df <- mtcars 
#To turn 1 column to factor
df <- df %>% mutate(cyl = factor(cyl))

#Turn columns to factor based on their type. 
df <- df %>% mutate(across(where(is.character), factor))

#Based on the position
df <- df %>% mutate(across(c(2, 4), factor))

#Change specific columns by their name
df <- df %>% mutate(across(c(cyl, am), factor))
0
votes

Unless you need to identify the columns automatically, I found this to be the simplest solution:

df$name <- as.factor(df$name)

This makes column name in dataframe df a factor.

0
votes

We can also use the modify_if() function from purrr. It applies a predicate function .p to every element (here, every column) of the data set and applies the function .f to those elements for which the predicate returns a single TRUE.

  • I used modify_if as it preserves the input type and returns an output of the same type
  • Another variation is map_if

library(dplyr)  # provides the starwars data set and %>%
library(purrr)  # provides modify_if()

starwars %>% modify_if(~ is.character(.x), ~ factor(.x))

# A tibble: 87 x 14
   name   height  mass hair_color skin_color eye_color birth_year sex   gender homeworld species
   <fct>   <int> <dbl> <fct>      <fct>      <fct>          <dbl> <fct> <fct>  <fct>     <fct>  
 1 Luke ~    172    77 blond      fair       blue            19   male  mascu~ Tatooine  Human  
 2 C-3PO     167    75 NA         gold       yellow         112   none  mascu~ Tatooine  Droid  
 3 R2-D2      96    32 NA         white, bl~ red             33   none  mascu~ Naboo     Droid  
 4 Darth~    202   136 none       white      yellow          41.9 male  mascu~ Tatooine  Human  
 5 Leia ~    150    49 brown      light      brown           19   fema~ femin~ Alderaan  Human  
 6 Owen ~    178   120 brown, gr~ light      blue            52   male  mascu~ Tatooine  Human  
 7 Beru ~    165    75 brown      light      blue            47   fema~ femin~ Tatooine  Human  
 8 R5-D4      97    32 NA         white, red red             NA   none  mascu~ Tatooine  Droid  
 9 Biggs~    183    84 black      light      brown           24   male  mascu~ Tatooine  Human  
10 Obi-W~    182    77 auburn, w~ fair       blue-gray       57   male  mascu~ Stewjon   Human  
# ... with 77 more rows, and 3 more variables: films <list>, vehicles <list>, starships <list>