Overview
I am classifying documents using the random forest implementation in the ranger package for R.
The problem is that the model expects every feature from the training set to be present in the real-time data set. That cannot be guaranteed, so I am unable to predict classes for real-time text.
Procedure followed
Aim: to predict which class (i.e., OutputClass) a description belongs to.
The descriptions and features are converted into a document-term matrix.
Document-term matrix of the training set

      rpm  Velocity  Speed  OutputClass
doc1  1    0         1      fan
doc2  1    1         1      fan
doc3  1    0         1      refrigerator
doc4  1    1         1      washing machine
doc5  1    1         1      washing machine
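
For anyone who wants to reproduce this, the training matrix above can be written as a data frame in base R. This is a minimal sketch: the name trainset matches the call used below, and storing OutputClass as a factor is an assumption about how the real data is prepared.

```r
# Training document-term matrix as a data frame (values copied from the table above).
trainset <- data.frame(
  rpm      = c(1, 1, 1, 1, 1),
  Velocity = c(0, 1, 0, 1, 1),
  Speed    = c(1, 1, 1, 1, 1),
  # Assumption: the class label is stored as a factor so the model treats it as categorical.
  OutputClass = factor(c("fan", "fan", "refrigerator",
                         "washing machine", "washing machine")),
  row.names = c("doc1", "doc2", "doc3", "doc4", "doc5")
)

# Inspect column types and the class levels.
str(trainset)
```
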
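I then train the model on the above matrix and save it:
library(ranger)
# OutputClass is the response; all other columns are used as predictors
fit <- ranger(OutputClass ~ ., data = trainset)
save(fit, file = "C:/TrainedObject.rda")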
Next, I load the stored object to predict the class of each real-time description:
load("C:/TrainedObject.rda")
I then construct the document-term matrix for RealTimeData in the same way.

      Velocity  Speed  OutputClass
doc5  0         1      fan
doc6  1         1      fan
doc7  0         1      refrigerator
doc8  1         1      washing machine
doc9  1         1      washing machine

The real-time data contains no term (feature) named "rpm". So the moment I call the predict function
predict(fit, data = RealTimeData)
it fails with an error saying that rpm is missing.
In practice it is impossible to guarantee that every term or feature of the training set appears in the real-time data every time.
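
The mismatch can be made concrete in base R without fitting a model. The sketch below lists the training terms that the real-time matrix lacks and then pads them with zeros. The zero-padding step is an assumption (a term that never occurs in the new documents is given a count of 0), offered as one possible workaround rather than a confirmed fix; train_terms and missing_terms are illustrative names.

```r
# Term columns the model was trained on (from the training matrix).
train_terms <- c("rpm", "Velocity", "Speed")

# Real-time document-term matrix: "rpm" never occurs, so it has no column.
RealTimeData <- data.frame(
  Velocity = c(0, 1, 0, 1, 1),
  Speed    = c(1, 1, 1, 1, 1),
  row.names = c("doc5", "doc6", "doc7", "doc8", "doc9")
)

# Terms the model expects but the new data lacks.
missing_terms <- setdiff(train_terms, names(RealTimeData))
print(missing_terms)

# Assumption: a term absent from every new document has count 0,
# so it is added as an all-zero column.
for (term in missing_terms) {
  RealTimeData[[term]] <- 0
}

# Keep only the training terms, in the training column order.
RealTimeData <- RealTimeData[, train_terms, drop = FALSE]
print(RealTimeData)
```

After this step the real-time matrix has the same term columns as the training matrix, which is what the predict call is complaining about. Terms that appear only in the real-time data are dropped by the last line, since the model has never seen them.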
I tried both random forest implementations in R (ranger and randomForest), passing parameters to the predict function such as newdata, predict.all and treetype.
None of these parameters made it possible to predict when features are missing from the real-time data.
Could someone please help me solve this issue?
Thanks in advance.