I would like to change the format (class) of some columns of my data.frame object (mydf) from character to factor.
I don't want to do this while reading the text file with the read.table() function.
Any help would be appreciated.
Hi, welcome to the world of R.
mtcars #look at this built in data set
str(mtcars) #allows you to see the classes of the variables (all numeric)
# one approach is to index with the $ sign and use the as.factor function
mtcars$am <- as.factor(mtcars$am)
#another approach
mtcars[, 'cyl'] <- as.factor(mtcars[, 'cyl'])
str(mtcars) # now look at the classes
The same pattern works for other classes: use as.character, as.Date, as.integer and so on in place of as.factor.
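Since mtcars has only numeric columns, here is a minimal sketch that is closer to the question, with actual character columns. The data frame and its column names (id, grp, sex) are made up for illustration; substitute your own.

```r
# Hypothetical stand-in for the asker's mydf, read in with character columns
mydf <- data.frame(id  = 1:4,
                   grp = c("a", "b", "a", "c"),
                   sex = c("m", "f", "f", "m"),
                   stringsAsFactors = FALSE)

mydf$grp      <- as.factor(mydf$grp)        # $ indexing
mydf[, "sex"] <- as.factor(mydf[, "sex"])   # [ , ] indexing

str(mydf)  # grp and sex should now show up as factors

# Same pattern, different target class
mydf$id <- as.character(mydf$id)
```

Only the columns you assign to are changed; everything else in mydf is left alone.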
Since you're new to R I'd suggest you have a look at these two websites:
R reference manuals: http://cran.r-project.org/manuals.html
R Reference card: http://cran.r-project.org/doc/contrib/Short-refcard.pdf
# To do it for all names
df[] <- lapply( df, factor) # the "[]" keeps the dataframe structure
col_names <- names(df)
# to do it for some names in a vector named 'col_names'
df[col_names] <- lapply(df[col_names] , factor)
Explanation: all data frames are lists, and the result of [ used with multiple-valued arguments is likewise a list, so looping over the columns is a job for lapply. The assignment above creates a list of factors that the function [<-.data.frame should successfully stick back into the data frame, df.
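To see why the empty brackets matter, here is a small sketch comparing the two forms. The two-column data frame is invented for the example.

```r
# Toy data frame (assumed for illustration)
df <- data.frame(x = c("a", "b"), y = c("u", "v"), stringsAsFactors = FALSE)

# Without [] you just get the list that lapply returns
as_list <- lapply(df, factor)
class(as_list)   # "list"

# With [] the list elements are assigned into the existing columns
df[] <- lapply(df, factor)
class(df)        # "data.frame"
```

Writing df <- lapply(df, factor) would replace the data frame with a plain list, which is usually not what you want.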
Another strategy would be to convert only those columns where the number of unique items is less than some criterion, let's say fewer than the base-10 log of the number of rows as an example:
cols.to.factor <- sapply( df, function(col) length(unique(col)) < log10(length(col)) )
df[ cols.to.factor] <- lapply(df[ cols.to.factor] , factor)
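A worked example of that criterion, as a rough sketch on made-up data: with 1000 rows the threshold is log10(1000) = 3, so only columns with fewer than 3 distinct values are converted. The column names (flag, id, value) are assumptions.

```r
n <- 1000
df <- data.frame(flag  = rep(c("yes", "no"), length.out = n),  # 2 distinct values
                 id    = as.character(seq_len(n)),             # 1000 distinct values
                 value = seq_len(n) / 10,                      # 1000 distinct values
                 stringsAsFactors = FALSE)

# TRUE for columns with fewer distinct values than log10(number of rows)
cols.to.factor <- sapply(df, function(col) length(unique(col)) < log10(length(col)))
cols.to.factor

df[cols.to.factor] <- lapply(df[cols.to.factor], factor)
sapply(df, class)
```

Note that the test looks only at the number of distinct values, not at the class, so a numeric 0/1 column would be converted as well. Add an is.character check to the function if that is not wanted.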
You could use dplyr::mutate_if() to convert all character columns, or dplyr::mutate_at() for selected named character columns, to factors:
library(dplyr)
# all character columns to factor:
df <- mutate_if(df, is.character, as.factor)
# select character columns 'char1', 'char2', etc. to factor:
df <- mutate_at(df, vars(char1, char2), as.factor)
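For a quick check of both calls, here is a sketch on an invented data frame. The column names char1, char2, char3 and num are placeholders, and the dplyr package is assumed to be installed.

```r
library(dplyr)

# Toy data (assumed for illustration)
df <- data.frame(char1 = c("a", "b"),
                 char2 = c("x", "y"),
                 char3 = c("p", "q"),
                 num   = c(1, 2),
                 stringsAsFactors = FALSE)

# Every character column becomes a factor; num is untouched
all_chars <- mutate_if(df, is.character, as.factor)

# Only char1 and char2 become factors; char3 stays character
some_chars <- mutate_at(df, vars(char1, char2), as.factor)

sapply(all_chars, class)
sapply(some_chars, class)
```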
If you want to change all character variables in your data.frame to factors after you've already loaded your data, you can do it like this, for a data.frame called dat:
character_vars <- sapply(dat, is.character)
# list-style indexing keeps a data.frame even when only one column matches
dat[character_vars] <- lapply(dat[character_vars], as.factor)
This creates a vector identifying which columns are of class character, then applies as.factor to those columns.
Sample data:
dat <- data.frame(var1 = c("a", "b"),
var2 = c("hi", "low"),
var3 = c(0, 0.1),
stringsAsFactors = FALSE
)
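Putting the sample data and the conversion together, one way to run it end to end is sketched below. It uses sapply with is.character and list-style indexing (dat[cols] rather than dat[, cols]), which is a small variation chosen so that the selection stays a data frame even if only a single column is character.

```r
dat <- data.frame(var1 = c("a", "b"),
                  var2 = c("hi", "low"),
                  var3 = c(0, 0.1),
                  stringsAsFactors = FALSE)

# Named logical vector: TRUE for character columns
character_vars <- sapply(dat, is.character)

# Convert only those columns, leaving var3 numeric
dat[character_vars] <- lapply(dat[character_vars], as.factor)

sapply(dat, class)
```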
You can use across() with dplyr 1.0.0 or later:
library(dplyr)
df <- mtcars
#To turn 1 column to factor
df <- df %>% mutate(cyl = factor(cyl))
#Turn columns to factor based on their type.
df <- df %>% mutate(across(where(is.character), factor))
#Based on the position
df <- df %>% mutate(across(c(2, 4), factor))
#Change specific columns by their name
df <- df %>% mutate(across(c(cyl, am), factor))
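Because mtcars has no character columns, the where(is.character) line has nothing to convert there. Here is a sketch on a made-up data frame that does have some, plus one variation that sets the level order explicitly. The names (name, size, n) are assumptions for the example.

```r
library(dplyr)

# Toy data (assumed for illustration)
df <- data.frame(name = c("x", "y", "z"),
                 size = c("s", "m", "s"),
                 n    = 1:3,
                 stringsAsFactors = FALSE)

# By type: every character column becomes a factor
by_type <- df %>% mutate(across(where(is.character), factor))

# By name: only size is converted
by_name <- df %>% mutate(across(c(size), factor))

# When level order matters, spell out the levels for that column
ordered_levels <- df %>% mutate(size = factor(size, levels = c("s", "m")))
levels(ordered_levels$size)
```

By default factor() sorts the levels, so the last form is worth using whenever the order carries meaning (for plots or model contrasts, for instance).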
We can also use the modify_if function from purrr. It takes a predicate function .p, applies it to every element of the data set, and applies the function .f wherever the predicate returns a single TRUE.
modify_if is used here rather than map_if because it preserves the input type and returns an output of the same type:
library(dplyr)  # provides the starwars data set and %>%
library(purrr)
starwars %>% modify_if(~ is.character(.x), ~ factor(.x))
# A tibble: 87 x 14
name height mass hair_color skin_color eye_color birth_year sex gender homeworld species
<fct> <int> <dbl> <fct> <fct> <fct> <dbl> <fct> <fct> <fct> <fct>
1 Luke ~ 172 77 blond fair blue 19 male mascu~ Tatooine Human
2 C-3PO 167 75 NA gold yellow 112 none mascu~ Tatooine Droid
3 R2-D2 96 32 NA white, bl~ red 33 none mascu~ Naboo Droid
4 Darth~ 202 136 none white yellow 41.9 male mascu~ Tatooine Human
5 Leia ~ 150 49 brown light brown 19 fema~ femin~ Alderaan Human
6 Owen ~ 178 120 brown, gr~ light blue 52 male mascu~ Tatooine Human
7 Beru ~ 165 75 brown light blue 47 fema~ femin~ Tatooine Human
8 R5-D4 97 32 NA white, red red NA none mascu~ Tatooine Droid
9 Biggs~ 183 84 black light brown 24 male mascu~ Tatooine Human
10 Obi-W~ 182 77 auburn, w~ fair blue-gray 57 male mascu~ Stewjon Human
# ... with 77 more rows, and 3 more variables: films <list>, vehicles <list>, starships <list>
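To make the difference between the two purrr functions concrete, here is a small sketch on an invented two-column data frame (the purrr package is assumed to be installed). Both calls convert the character column, but they hand back different containers.

```r
library(purrr)

# Toy data (assumed for illustration)
df <- data.frame(a = c("x", "y"), b = 1:2, stringsAsFactors = FALSE)

modified <- modify_if(df, is.character, factor)  # same type as the input
mapped   <- map_if(df, is.character, factor)     # always a list

class(modified)   # "data.frame"
class(mapped)     # "list"
```

So modify_if can be dropped into a pipeline that expects a data frame, while the result of map_if would need to be turned back into one first.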
unclass and use data.frame on the result. – IRTFM