
I have a programming question in R that I have not been able to solve, even after spending hours searching the internet and Stack Overflow for answers.

I have a factor variable in a column of a data.frame that looks like this:

Columnname
agsgssg
agsgssg
agsgssg
adgatata
ahagha
ahagha
ahagha
ahagha
aghaatah
ghssghs
ghssghs
ghssghs

The factor cannot be converted to numeric with as.numeric(as.character()) because each level is a string, not a number.
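To illustrate, here is a minimal sketch of that attempt. The data frame name `df` is an assumption made for this example; it simply rebuilds the column shown above.

```r
# Hypothetical data frame reproducing the column above
df <- data.frame(
  Columnname = factor(c("agsgssg", "agsgssg", "agsgssg", "adgatata",
                        "ahagha", "ahagha", "ahagha", "ahagha",
                        "aghaatah", "ghssghs", "ghssghs", "ghssghs"))
)

# The labels are not numbers, so the conversion yields NA for every row.
# Without suppressWarnings(), R also warns about NAs introduced by coercion.
res <- suppressWarnings(as.numeric(as.character(df$Columnname)))
all(is.na(res))
```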

What I need is:

Columnname            Numericcolumnname
agsgssg                        1
agsgssg                        1
agsgssg                        1
adgatata                       2
ahagha                         3   
ahagha                         3  
ahagha                         3   
ahagha                         3  
aghaatah                       4  
ghssghs                        5
ghssghs                        5   
ghssghs                        5  

I have tried several approaches without success: using levels() on the factor, and using freq() to count the rows for each level and then building a repeated number per level with several "for" loops.

I feel there should be a very simple solution; I am just not seeing it.

Thank you for your consideration

From the example: df$Numericcolumnname <- as.numeric(df$Columnname) - Pierre L
match(df$Columnname, unique(df$Columnname))? - talat
@PierreLafortune Your solution will not work if the levels are in a different order - akrun
The user may not be looking for a particular order. It is not mentioned or hinted at. The intent appears to be the underlying numeric codes of the factor, i.e. as.numeric(x). - Pierre L

1 Answer


If the factor levels are not already in the order in which the values first appear, we can convert the column to a factor whose levels are the unique elements of that column (in order of appearance), and then coerce it to numeric/integer.

df1$Numericcolumnname <- as.numeric(factor(df1$Columnname,
                                           levels = unique(df1$Columnname)))
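
As a rough, self-contained check, the sketch below rebuilds the example column (the construction of `df1` here is an assumption, since the original data was not posted in reproducible form) and compares three variants: the raw factor codes, the `levels = unique(...)` approach, and the `match()` one-liner suggested in the comments. The variable names `by_level`, `by_appearance`, and `by_match` are only illustrative.

```r
# Hypothetical reconstruction of the example data
df1 <- data.frame(
  Columnname = factor(c("agsgssg", "agsgssg", "agsgssg", "adgatata",
                        "ahagha", "ahagha", "ahagha", "ahagha",
                        "aghaatah", "ghssghs", "ghssghs", "ghssghs"))
)

# Default factor levels are sorted alphabetically, so the raw codes
# do not follow the order in which the values first appear
by_level <- as.integer(df1$Columnname)

# Levels set to order of first appearance, then coerced to integer
by_appearance <- as.integer(factor(df1$Columnname,
                                   levels = unique(df1$Columnname)))

# Alternative from the comments: position of each value among the unique values
by_match <- match(df1$Columnname, unique(df1$Columnname))

print(by_level)       # 3 3 3 1 4 4 4 4 2 5 5 5
print(by_appearance)  # 1 1 1 2 3 3 3 3 4 5 5 5
print(by_match)       # 1 1 1 2 3 3 3 3 4 5 5 5
```

With this data, only the last two reproduce the numbering requested in the question, because the default levels are alphabetical ("adgatata" comes first) rather than in order of appearance.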