I have the following dataset:
> str(train)
'data.frame': 4619 obs. of 110 variables:
$ UserID : int 1 2 5 6 7 8 9 11 12 13 ...
$ YOB : int 1938 1985 1963 1997 1996 1991 1995 1983 1984 1997 ...
$ Gender : Factor w/ 3 levels "","Female","Male": 3 2 3 3 3 2 3 3 2 2 ...
$ Income : Factor w/ 7 levels "","$100,001 - $150,000",..: 1 3 6 5 4 7 5 2 4 6 ...
$ HouseholdStatus: Factor w/ 7 levels "","Domestic Partners (no kids)",..: 5 6 5 6 6 6 6 5 5 6 ...
$ EducationLevel : Factor w/ 8 levels "","Associate's Degree",..: 1 8 1 7 4 5 4 3 7 4 ...
$ Party : Factor w/ 6 levels "","Democrat",..: 3 2 1 6 1 1 6 3 6 2 ...
$ Happy : int 1 1 0 1 1 1 1 1 0 0 ...
$ Q124742 : Factor w/ 3 levels "","No","Yes": 2 1 2 1 2 3 1 2 2 1 ...
$ Q124122 : Factor w/ 3 levels "","No","Yes": 1 3 3 3 2 3 1 3 3 1 ...
$ Q123464 : Factor w/ 3 levels "","No","Yes": 2 2 2 3 2 2 1 2 2 1 ...
$ Q123621 : Factor w/ 3 levels "","No","Yes": 2 3 3 2 2 1 1 3 2 1 ...
$ Q122769 : Factor w/ 3 levels "","No","Yes": 2 2 2 1 3 1 1 2 2 2 ...
$ Q122770 : Factor w/ 3 levels "","No","Yes": 3 2 2 3 3 1 1 2 3 3 ...
$ Q122771 : Factor w/ 3 levels "","Private","Public": 3 3 2 2 3 3 1 3 3 3 ...
$ Q122120 : Factor w/ 3 levels "","No","Yes": 2 2 2 2 2 3 1 2 2 2 ...
$ Q121699 : Factor w/ 3 levels "","No","Yes": 3 3 3 2 2 3 2 3 3 2 ...
$ Q121700 : Factor w/ 3 levels "","No","Yes": 2 3 2 2 3 3 2 2 2 2 ...
$ Q120978 : Factor w/ 3 levels "","No","Yes": 1 3 2 3 3 2 2 3 3 3 ...
$ Q121011 : Factor w/ 3 levels "","No","Yes": 2 2 2 2 2 3 3 2 3 2 ...
$ Q120379 : Factor w/ 3 levels "","No","Yes": 2 3 3 2 3 3 2 2 2 3 ...
$ Q120650 : Factor w/ 3 levels "","No","Yes": 3 3 3 3 3 2 3 3 3 3 ...
$ Q120472 : Factor w/ 3 levels "","Art","Science": 1 3 3 3 3 2 3 3 2 3 ...
$ Q120194 : Factor w/ 3 levels "","Study first",..: 3 2 3 2 2 3 3 3 3 3 ...
$ Q120012 : Factor w/ 3 levels "","No","Yes": 2 3 3 1 2 3 2 2 3 3 ...
$ Q120014 : Factor w/ 3 levels "","No","Yes": 2 3 2 3 3 1 3 3 2 3 ...
$ Q119334 : Factor w/ 3 levels "","No","Yes": 1 3 2 2 2 3 2 3 2 2 ...
$ Q119851 : Factor w/ 3 levels "","No","Yes": 3 2 2 3 2 2 3 2 2 3 ...
$ Q119650 : Factor w/ 3 levels "","Giving","Receiving": 1 2 2 3 2 1 2 2 2 3 ...
$ Q118892 : Factor w/ 3 levels "","No","Yes": 3 3 3 2 3 2 1 3 2 2 ...
$ Q118117 : Factor w/ 3 levels "","No","Yes": 3 2 2 3 3 3 1 2 2 2 ...
$ Q118232 : Factor w/ 3 levels "","Idealist",..: 2 2 3 3 3 1 1 2 2 3 ...
$ Q118233 : Factor w/ 3 levels "","No","Yes": 2 2 2 2 2 2 1 2 3 2 ...
$ Q118237 : Factor w/ 3 levels "","No","Yes": 2 3 3 3 2 2 1 2 3 2 ...
$ Q117186 : Factor w/ 3 levels "","Cool headed",..: 1 2 2 2 1 3 1 2 3 1 ...
$ Q117193 : Factor w/ 3 levels "","Odd hours",..: 1 2 3 2 3 3 1 3 3 3 ...
$ Q116797 : Factor w/ 3 levels "","No","Yes": 3 3 2 2 2 1 1 2 2 1 ...
$ Q116881 : Factor w/ 3 levels "","Happy","Right": 2 2 3 3 2 2 1 2 2 1 ...
$ Q116953 : Factor w/ 3 levels "","No","Yes": 3 3 3 3 1 3 3 3 3 1 ...
$ Q116601 : Factor w/ 3 levels "","No","Yes": 3 3 3 2 3 3 1 3 3 1 ...
$ Q116441 : Factor w/ 3 levels "","No","Yes": 2 2 2 2 2 2 1 2 2 1 ...
$ Q116448 : Factor w/ 3 levels "","No","Yes": 2 3 3 3 2 2 1 2 3 1 ...
$ Q116197 : Factor w/ 3 levels "","A.M.","P.M.": 3 2 2 2 2 3 1 2 3 1 ...
$ Q115602 : Factor w/ 3 levels "","No","Yes": 3 3 3 3 3 2 1 3 2 1 ...
$ Q115777 : Factor w/ 3 levels "","End","Start": 3 2 3 3 3 3 1 3 2 1 ...
$ Q115610 : Factor w/ 3 levels "","No","Yes": 3 3 3 3 3 1 1 3 2 1 ...
$ Q115611 : Factor w/ 3 levels "","No","Yes": 2 2 3 3 2 2 1 2 2 1 ...
$ Q115899 : Factor w/ 3 levels "","Circumstances",..: 2 3 3 2 2 3 1 2 3 1 ...
$ Q115390 : Factor w/ 3 levels "","No","Yes": 3 2 2 2 1 2 3 3 2 1 ...
$ Q114961 : Factor w/ 3 levels "","No","Yes": 3 3 2 3 2 3 2 2 3 1 ...
$ Q114748 : Factor w/ 3 levels "","No","Yes": 3 2 2 2 3 3 3 2 3 1 ...
$ Q115195 : Factor w/ 3 levels "","No","Yes": 3 3 3 3 3 2 3 3 3 1 ...
$ Q114517 : Factor w/ 3 levels "","No","Yes": 2 3 2 3 2 2 2 2 3 1 ...
$ Q114386 : Factor w/ 3 levels "","Mysterious",..: 1 3 3 2 2 3 3 3 3 1 ...
$ Q113992 : Factor w/ 3 levels "","No","Yes": 3 1 3 2 2 2 2 2 3 1 ...
$ Q114152 : Factor w/ 3 levels "","No","Yes": 3 2 2 2 3 2 2 2 2 1 ...
$ Q113583 : Factor w/ 3 levels "","Talk","Tunes": 2 3 2 3 3 3 3 2 3 1 ...
$ Q113584 : Factor w/ 3 levels "","People","Technology": 3 2 2 3 2 1 3 2 2 1 ...
$ Q113181 : Factor w/ 3 levels "","No","Yes": 2 3 3 3 2 3 3 2 2 1 ...
[list output truncated]
As you can see, I have 110 variables. I am trying to build a model that predicts happiness (the Happy column) from these variables. If I leave them in factor form, CART models, randomForest etc. struggle, so I'm trying to convert them to numeric type (to make the algorithm's life a bit easier)...
Currently I am doing it one by one, e.g.:
> table(train_new$Q117193)
     Odd hours Standard hours
1410      1299           1910
> train_new$Q117193 = as.integer(train_new$Q117193)
> table(train_new$Q117193)
1 2 3
1410 1299 1910
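Notice that almost all the factor variables have missing values, denoted by the empty level "". As the table above shows, as.integer() turns "" into code 1 rather than NA, so the blanks look like an ordinary answer after conversion. I have converted this dataset to numeric using: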
train_numeric$Gender = as.integer(train_numeric$Gender)
q_cols = grep(pattern="^Q1", colnames(train_numeric))
train_numeric[, q_cols] = lapply(train_numeric[, q_cols], as.integer)
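A minimal sketch of one way to stop the blanks from being treated as an ordinary level: recode "" to NA before calling as.integer(). The `toy` data frame and its values are made up for illustration and are not the real survey data; only base R is used.

```r
# Toy data frame (made-up values, stands in for the real survey data)
toy <- data.frame(
  Gender  = factor(c("Male", "", "Female", "Male")),
  Q117193 = factor(c("", "Odd hours", "Standard hours", "Odd hours"))
)

# Setting the "" level to NA removes that level and marks those rows as NA
is_fac <- vapply(toy, is.factor, logical(1))
toy[is_fac] <- lapply(toy[is_fac], function(f) {
  levels(f)[levels(f) == ""] <- NA
  f
})

# Integer codes now cover only the real answers; blanks stay NA
toy_int <- toy
toy_int[is_fac] <- lapply(toy_int[is_fac], as.integer)
```

With this ordering, is.na() on the converted columns should identify exactly the rows that were blank in the original factors.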
I am using the mice package to impute this dataset, but to be honest I am lost. Any ideas on how I could fill in these missing values, please?
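For reference, here is a rough sketch of the kind of workflow I have in mind, not a tested solution. It assumes the blanks are first recoded to NA while the columns are still factors, so that mice can choose a categorical imputation method per column. The `toy` data frame is made up, and `m = 1` and `seed = 123` are arbitrary choices; the mice step only runs if the package is installed.

```r
# Toy stand-in for the survey data; "" marks a missing answer (made-up values)
set.seed(1)
n <- 60
toy <- data.frame(
  YOB     = sample(1950:1995, n, replace = TRUE),
  Happy   = sample(0:1, n, replace = TRUE),
  Q124742 = factor(sample(c("", "No", "Yes"), n, replace = TRUE)),
  Q116881 = factor(sample(c("", "Happy", "Right"), n, replace = TRUE))
)

# Recode "" to NA so that the blanks are seen as missing, not as a level
is_fac <- vapply(toy, is.factor, logical(1))
toy[is_fac] <- lapply(toy[is_fac], function(f) {
  levels(f)[levels(f) == ""] <- NA
  f
})
n_missing <- colSums(is.na(toy))  # missing count per column

# Impute only if mice is available; a single imputed data set for brevity
if (requireNamespace("mice", quietly = TRUE)) {
  imp <- mice::mice(toy, m = 1, seed = 123, printFlag = FALSE)
  toy_filled <- mice::complete(imp, 1)
}
```

Imputing while the columns are still factors, and converting to integer only afterwards if a model really needs it, avoids imputing values that fall between category codes.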