0
votes

Can someone please explain exactly how as.numeric(levels(x))[x] works? Here x is a factor variable (for example, x <- as.factor(sample(1:5, 20, replace = TRUE))). As far as I understand, we first get the levels of x (which are character) and then convert them to numeric. I don't follow what happens after that. I know this expression gives the same result as as.numeric(as.character(x)).

2
Have you read the first answer here? - De Novo
...then it's just using the values of x as positions to get the corresponding levels, in numeric form. You can use as.numeric(levels(x))[c(1,1,2)] as an example, which means "give me the 1st, the 1st (again) and the 2nd level". If you ask for a position that doesn't exist, it returns NA, as in as.numeric(levels(x))[c(1,1,2,6)]. - AntoniosK
@DeNovo Yes, I saw that post, but I think it was about how to perform the conversion, not about how exactly it happens. - nand
@AntoniosK Got it. Thank you. - nand

2 Answers

2
votes

R factors are vectors of integers that serve as indices into the character vector of levels. So in as.numeric(levels(x))[x], the inner levels(x) returns that character vector, as.numeric() converts its entries ("1", "2", "3", ...) into numeric values, and [x] then uses the integer codes of x to pick the matching number for each element.

> x<-as.factor(sample(1:5,20,replace=TRUE)) 

The storage class of factor objects is integer:

> dput (x)
structure(c(4L, 2L, 3L, 4L, 5L, 2L, 2L, 2L, 1L, 2L, 4L, 2L, 1L, 
5L, 5L, 4L, 1L, 5L, 1L, 5L), .Label = c("1", "2", "3", "4", "5"
), class = "factor")
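To see the integer storage without reading the dput output, here is a minimal sketch. The small factor below is a made-up example, not the x from the question:

```r
# Hypothetical small factor, used only for illustration
x <- factor(c("b", "a", "c", "a"))

storage <- typeof(x)      # how the data are stored internally
codes   <- as.integer(x)  # the integer codes, one per element
labels  <- levels(x)      # the character labels the codes point to

print(storage)  # "integer"
print(codes)    # 2 1 3 1
print(labels)   # "a" "b" "c"
```

Each code is simply the position of that element's label in levels(x).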

The levels() function returns the levels attribute of a factor (shown as .Label in the dput output above), and when a factor is used as an index, it gets handled as its integer codes:

> levels(x)[x]
 [1] "4" "2" "3" "4" "5" "2" "2" "2" "1" "2" "4" "2" "1" "5" "5" "4" "1" "5" "1" "5"

This method of conversion is slightly faster than going through as.character(x), but as you have experienced, it may seem a bit cryptic if you haven't worked through what is happening "under the hood" (or "bonnet", if that's what it's called in your part of the English-speaking world).
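Putting the pieces together, here is a small sketch of the whole expression. The factor f and its levels "10", "20", "30" are hypothetical, chosen so that the integer codes differ from the labels:

```r
# Hypothetical example data: the levels are "10", "20", "30",
# so the integer codes (1, 2, 3) are not the same as the labels
f <- factor(c("20", "10", "30", "20"))

codes   <- as.integer(f)          # integer codes: 2 1 3 2
lev_num <- as.numeric(levels(f))  # levels converted once: 10 20 30

by_index <- lev_num[f]                   # same as as.numeric(levels(f))[f]
by_char  <- as.numeric(as.character(f))  # converts every element instead

print(codes)     # 2 1 3 2
print(by_index)  # 20 10 30 20
stopifnot(identical(by_index, by_char))
```

Note that as.numeric(f) on its own would return the codes 2 1 3 2 rather than the labels, which is why the detour through levels(f) is needed.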

2
votes

I am always confused by R's factors. Usually, I use a handy idea from the Rfast package, the function Rfast::ufactor. It represents a factor using its original type.

Here is an example:

x <- rnorm(10)
fx <- Rfast::ufactor(x)
fx$levels # you can get the levels like this
fx$values # you can get the values like this

Fast and simple. Rfast::ufactor is much faster than R's factor, but I will not post a benchmark because it isn't relevant to the question.