0
votes

I am trying to convert factor variables to numeric. I have tried both of these solutions:

as.numeric(levels(f))[f] 

as.numeric(as.character(f))

But the issue persists. Both give the warning message: NAs introduced by coercion

Reproducible example -

df = data.frame(x = c("10: Already Delinquent 90+",
                      "11: Credit History <6 Months",
                      "12: Current Balance = 0",
                      "13: Balance (2-6)=0",
                      "20: 1+ x 90+",
                      "30: 3+ x 60-89",
                      "31: 2 x 60-89",
                      "32: 1 x 60-89",
                      "40: 3+ x 30-59",
                      "41: 2 x 30-59",
                      "42: 1 x 30-59",
                      "50: Insufficient Performance",
                      "60: 3+ x 1-29",
                      "61: 2 x 1-29",
                      "62: 1 x 1-29",
                      "70: Never delinquent"),
                y = c("00:Bad",
                      "01:Ind",
                      "02:Good",
                      "NA",
                      "00:Bad",
                      "01:Ind",
                      "02:Good",
                      "NA",
                      "00:Bad",
                      "01:Ind",
                      "02:Good",
                      "NA",
                      "00:Bad",
                      "01:Ind",
                      "02:Good",
                      "NA"),
                z = ceiling(rnorm(16)),
                stringsAsFactors = TRUE) #needed in R >= 4.0.0 so that x and y become factors

#Select all the factor variables
factorvars = colnames(df)[which(sapply(df,is.factor))]

#Concatenate with "_Num"
xxx <- paste(factorvars, "_Num", sep="")

#Converting Factor to Numeric
#(braces added so that both assignments run inside the loop)
for (i in seq_along(factorvars)) {
  df[,xxx[i]] = NA
  df[,xxx[i]] = as.numeric(levels(df[,factorvars[i]]) [df[,factorvars[i]]])
}

I want to keep the factor variables and create new variables that hold each factor's levels converted to numeric. The desired output looks like this:

x                             y        x_num  y_num
10: Already Delinquent 90+    00:Bad   1      1
11: Credit History <6 Months  01:Ind   2      2
12: Current Balance = 0       02:Good  3      3
13: Balance (2-6)=0           NA       4      NA
20: 1+ x 90+                  00:Bad   5      1
30: 3+ x 60-89                01:Ind   6      2
31: 2 x 60-89                 02:Good  7      3
32: 1 x 60-89                 NA       8      NA
40: 3+ x 30-59                00:Bad   9      1
41: 2 x 30-59                 01:Ind   10     2
42: 1 x 30-59                 02:Good  11     3
50: Insufficient Performance  NA       12     NA
60: 3+ x 1-29                 00:Bad   13     1
61: 2 x 1-29                  01:Ind   14     2
62: 1 x 1-29                  02:Good  15     3
70: Never delinquent          NA       16     NA
None of the strings in your data set can be coerced to numeric. Every number is attached to other characters, so the result is NA; i.e. you can't expect "01:Ind" to convert to 1. - Rich Scriven
I am sorry to say I didn't understand your comment. z is a numeric variable in df. Is there no way to convert "01:Ind" to 1 in a new variable? - Ujjawal Bhandari
What's your expected output? - Avinash Raj
Yes, but in a different way. Please provide the desired result. - Rich Scriven
I have pasted my desired output. I apologize for the bad formatting of the table. - Ujjawal Bhandari

1 Answer

2
votes

Judging by your desired output, it doesn't look like you want to convert the factors to the numbers contained in their strings. Instead you want the internal representation of the factors.

Try this:

df[,xxx] <- lapply(df[,factorvars], as.numeric)
#                               x       y  z x_Num y_Num
# 1    10: Already Delinquent 90+  00:Bad  2     1     1
# 2  11: Credit History <6 Months  01:Ind  2     2     2
# 3       12: Current Balance = 0 02:Good  1     3     3
# 4           13: Balance (2-6)=0    <NA>  1     4    NA
# 5                  20: 1+ x 90+  00:Bad  0     5     1
# 6                30: 3+ x 60-89  01:Ind  0     6     2
# 7                 31: 2 x 60-89 02:Good  0     7     3
# 8                 32: 1 x 60-89    <NA>  0     8    NA
# 9                40: 3+ x 30-59  00:Bad  2     9     1
# 10                41: 2 x 30-59  01:Ind  0    10     2
# 11                42: 1 x 30-59 02:Good  0    11     3
# 12 50: Insufficient Performance    <NA>  1    12    NA
# 13                60: 3+ x 1-29  00:Bad  1    13     1
# 14                 61: 2 x 1-29  01:Ind -1    14     2
# 15                 62: 1 x 1-29 02:Good -1    15     3
# 16         70: Never delinquent    <NA> -1    16    NA
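
To see why the two attempts in the question return NA while as.numeric() on the factor itself works, here is a minimal sketch on a small stand-in factor. The vector f and the variable names are made up for illustration, and the last line (stripping everything after the colon) is only a hypothetical alternative in case the number inside the label is what is wanted, not something the desired output asks for.

```r
# Small stand-in factor; the labels mimic the question's y column.
f <- factor(c("00:Bad", "01:Ind", "02:Good", "00:Bad"))

# Internal integer codes: the position of each value in levels(f).
codes <- as.numeric(f)

# Parsing the labels as numbers fails, because "00:Bad" is not a number.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
parsed <- suppressWarnings(as.numeric(as.character(f)))

# Hypothetical alternative: keep only the digits before the colon.
prefix <- as.numeric(sub(":.*$", "", as.character(f)))

print(codes)   # 1 2 3 1
print(parsed)  # NA NA NA NA
print(prefix)  # 0 1 2 0
```

Note that the codes start at 1 and follow the sorted order of the levels, so they match the desired output here but would not match the numeric prefix of the labels.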

Data

I cleaned your example data by changing the character string "NA" to actual NA values:

is.na(df$y) <- df$y == "NA"
df$y <- droplevels(df$y)
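
Putting the pieces in order (clean first, then convert), the whole thing might look like the sketch below. It uses a shortened four-row version of the example data, the name newvars in place of xxx, and stringsAsFactors = TRUE, which is an assumption needed on R >= 4.0.0, where data.frame() no longer creates factors by default.

```r
# Shortened copy of the question's data (first four rows only).
# stringsAsFactors = TRUE is an assumption for R >= 4.0.0.
df <- data.frame(
  x = c("10: Already Delinquent 90+", "11: Credit History <6 Months",
        "12: Current Balance = 0", "13: Balance (2-6)=0"),
  y = c("00:Bad", "01:Ind", "02:Good", "NA"),
  stringsAsFactors = TRUE
)

# Step 1: turn the literal string "NA" into a real missing value,
# then drop the now-unused "NA" level.
is.na(df$y) <- df$y == "NA"
df$y <- droplevels(df$y)

# Step 2: find the factor columns and build the new column names.
factorvars <- names(df)[sapply(df, is.factor)]
newvars <- paste0(factorvars, "_Num")

# Step 3: store each factor's integer codes in a new column.
df[newvars] <- lapply(df[factorvars], as.numeric)

print(df)
```

If the cleaning step is skipped, "NA" stays an ordinary level and gets its own code (4 in this shortened data) instead of a missing value.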