I built a predictive model that classifies people into two income categories: <=50K and >50K.

But when I open my file in Excel or R to see the final predictions, I see only the levels (1 and 2) that I assigned at the start to simplify the process, instead of my original values.

Please tell me how to keep the original values that the levels represent, rather than the levels themselves.

Here is the outline I followed.

This is my target variable, Income.Group, in its initial state:

str(train_gbW7HTd$Income.Group)

chr [1:32561] "<=50K" "<=50K" "<=50K" "<=50K" "<=50K" "<=50K" "<=50K" ...

Now, to apply decision trees, I encoded my target variable into levels 1 and 2 using the following code:

train$Income.Group <- match(train$Income.Group, unique(train$Income.Group))

I got:

table(train$Income.Group)

    1     2 
24720  7841

I built the decision tree like this:

set.seed(333)

fit <- rpart(Income.Group ~ ., data = train, method = "class", control = rpart.control(minsplit = 20, minbucket = 100, maxdepth = 10, xval = 5))

Then I made predictions:

pred <- predict(fit, test, type = "class")

pred_train <- predict(fit, train, type = "class")

confusionMatrix(pred_train, train$Income.Group)

And saved my file:

solution.frame <- data.frame(ID = test$ID, Income.Group = pred)

write.csv(solution.frame,file = "final_solution.csv")

But my final CSV file has the levels 1 and 2 as the final predictions, not <=50K and >50K, which is what I actually want. Please tell me how to proceed. Thanks in advance.

I already tried:

solution.frame$Income.Group <- ifelse(solution.frame$Income.Group == "1", "<=50k", ">50k")

But it gave the single value >50k to the entire Income.Group column.

Please tell me what to do, as I am stuck at this step and unable to complete my model submission.

1 Answer

You could use ifelse:

train$Income.Group<-ifelse(train$Income.Group=="1","<=50K",">50K")
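Another option, sketched here under a few assumptions, is to keep the lookup vector produced by unique() before encoding and use it to index the predictions back to their labels. The names income_raw, income_labels and pred_labels are hypothetical, and the small vectors are stand-ins for train$Income.Group and for the factor that predict(fit, test, type = "class") would return.

```r
# Hypothetical stand-in data; the real values would come from train$Income.Group.
income_raw <- c("<=50K", "<=50K", ">50K", "<=50K", ">50K")

# Store the lookup table before encoding: position i holds the label for code i.
income_labels <- unique(income_raw)

# Same encoding as in the question: each value becomes its position in income_labels.
income_code <- match(income_raw, income_labels)

# Stand-in for the factor of predicted classes (levels "1" and "2").
pred <- factor(c(1, 2, 2, 1), levels = c(1, 2))

# Convert factor -> character -> integer, then index the lookup table.
pred_labels <- income_labels[as.integer(as.character(pred))]

print(pred_labels)
```

Going through as.character() first means the level names ("1", "2") are used rather than the factor's internal integer codes, which could differ if a level were missing. The resulting pred_labels could then be placed in solution.frame$Income.Group before calling write.csv.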