15
votes

I have a data frame named "insurance" with both numerical and factor variables. How can I select all factor variables so that I can check the levels of the categorical variables?

I tried sapply(insurance, class) to get the classes of all variables. But then I can't build a logical condition such as class(var) == "factor", because the variable names are also included in the result of sapply().

Thanks,

5 Answers

19
votes

Some data:

insurance <- data.frame(
  int   = 1:5,
  fact1 = letters[1:5],
  fact2 = factor(1:5),
  fact3 = LETTERS[3:7],
  stringsAsFactors = TRUE  # needed since R 4.0.0, where the default became FALSE
)

I would use sapply like you did, but combined with is.factor to return a logical vector:

is.fact <- sapply(insurance, is.factor)
#   int fact1 fact2 fact3 
# FALSE  TRUE  TRUE  TRUE
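As a side note on why is.factor is a safer test than comparing class() against a string: class() can return more than one element, for example for ordered factors, so an equality test does not always give a single TRUE or FALSE. A minimal sketch, where the data frame df and its columns are made up for illustration and are not part of the question:

```r
# Hypothetical example data, not taken from the question
df <- data.frame(
  num   = c(1.5, 2.5, 3.5),
  plain = factor(c("a", "b", "a")),
  ord   = factor(c("low", "high", "low"), levels = c("low", "high"), ordered = TRUE)
)

# class() of an ordered factor has two elements
class(df$ord)
# [1] "ordered" "factor"

# Comparing against a string therefore gives one result per class element
class(df$ord) == "factor"
# [1] FALSE  TRUE

# is.factor() and inherits() return a single logical value per column
is.fact <- sapply(df, is.factor)
is.fact.inherits <- sapply(df, inherits, what = "factor")
```

Both is.fact and is.fact.inherits flag plain and ord as factors, while num is not.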

Then use [ to extract these columns:

factors.df <- insurance[, is.fact]
#   fact1 fact2 fact3
# 1     a     1     C
# 2     b     2     D
# 3     c     3     E
# 4     d     4     F
# 5     e     5     G

Finally, to get the levels, use lapply:

lapply(factors.df, levels)
# $fact1
# [1] "a" "b" "c" "d" "e"
# 
# $fact2
# [1] "1" "2" "3" "4" "5"
# 
# $fact3
# [1] "C" "D" "E" "F" "G"

You might also find str(insurance) interesting as a short summary.
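The steps above could be wrapped into one small function. This is only a sketch: the name factor_levels and the example data are hypothetical, and drop = FALSE is added on the assumption that you want a data frame back even when only one factor column is selected.

```r
# Hypothetical helper: returns a named list with the levels of every factor column
factor_levels <- function(df) {
  # vapply guarantees a logical vector with one entry per column
  is.fact <- vapply(df, is.factor, logical(1))
  # drop = FALSE keeps a data frame even when only one column is selected
  lapply(df[, is.fact, drop = FALSE], levels)
}

# Made-up example data with a single factor column
example <- data.frame(
  id    = 1:3,
  group = factor(c("x", "y", "x"))
)

res <- factor_levels(example)
# res is a list with one element, group, holding the levels "x" and "y"
```

Without drop = FALSE, selecting a single column with [ would return a bare factor, and lapply would then loop over its elements instead of over columns.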

2
votes

This appears to be (almost) the perfect time to use the seldom-used function rapply:

rapply(insurance, classes = "factor", f = levels, how = "list")

Or, to remove the NULL elements (the columns that weren't factors):

Filter(Negate(is.null), rapply(insurance, classes = "factor", f = levels, how = "list"))

Or simply

lapply(Filter(is.factor, insurance), levels)

2
votes

library(dplyr)

insurance %>% select_if(is.factor)

1
votes

I would suggest using dplyr and purrr here. First select the factor columns, then use purrr::map to show the factor levels for each column.

library(tidyverse)

insurance %>%
  select(where(is.factor)) %>%
  map(levels)

-3
votes

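Using the data frame "insurance" from flodel's answer, you can apply factor to every column in one go with apply. Note that apply coerces the data frame to a matrix first, so the result is a character matrix rather than a set of factors: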

apply(insurance,2,factor)

     int fact1 fact2 fact3
[1,] "1" "a"   "1"   "C"  
[2,] "2" "b"   "2"   "D"  
[3,] "3" "c"   "3"   "E"  
[4,] "4" "d"   "4"   "F"  
[5,] "5" "e"   "5"   "G"  

If you are interested only in the levels of one factor, you can do the following:

factor(insurance$fact1)

[1] a b c d e
Levels: a b c d e
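If only the level labels are needed rather than the printed factor, base R's levels() returns them directly as a character vector. A minimal sketch, assuming the column is already a factor; the vectors x and chr are made up for illustration:

```r
# Made-up factor, not taken from the question
x <- factor(c("b", "a", "c", "a"))

lv <- levels(x)   # character vector of the distinct levels
n  <- nlevels(x)  # number of levels

# A plain character vector has no levels, so levels() returns NULL
chr <- c("a", "b")
chr.lv <- levels(chr)
```

Here lv holds "a", "b" and "c", and n is 3. Because chr.lv is NULL, a character column has to be converted with factor() before its levels can be listed.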