R Machine Learning Model - Blind Test

Question

I am working on a model for a competition. We were provided with 2 datasets:

Dataset A: contains the label and is to be used to train/test the model.

Dataset B: does not contain the label; it is to be used in a blind test, and a score is assigned based on the predictions.

My model is ready. However, when calling predict() on Dataset B (for the blind test), one question came up: do I have to apply the same pre-processing steps (removing duplicates, removing NAs, scaling numeric features) that I applied to Dataset A? And what about the NAs? Dataset B contains several of them.

Thanks in advance for your help.

Yes, I think you should apply the same pre-processing steps. As for the NA values, if a given column has only a handful of NAs, one quick fix would be to replace them with the column's mean or median. - Tim Biegeleisen
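The median replacement suggested in this comment could be sketched as follows in base R. This is only a minimal illustration: the data frames `train` and `blind` and the helper `impute_with()` are hypothetical stand-ins for Dataset A and Dataset B, and taking the medians from the training data (rather than from Dataset B itself) is an assumption made here so that both datasets are filled with the same values.

```r
# Hypothetical toy data standing in for Dataset A (train) and Dataset B (blind test)
train <- data.frame(x1 = c(1, 2, 3, 4, 100), x2 = c(10, NA, 30, 40, 50))
blind <- data.frame(x1 = c(2, NA, 5), x2 = c(NA, 20, 35))

# Column medians computed on the training data only (assumption of this sketch)
train_medians <- vapply(train, median, numeric(1), na.rm = TRUE)

# Hypothetical helper: replace NAs in each named column with the supplied value
impute_with <- function(df, fill) {
  for (col in names(fill)) {
    missing <- is.na(df[[col]])
    df[[col]][missing] <- fill[[col]]
  }
  df
}

train_imputed <- impute_with(train, train_medians)
blind_imputed <- impute_with(blind, train_medians)
```

Filling the gaps instead of dropping rows keeps every row of Dataset B, so predict() still returns one prediction per row, which is presumably what the blind-test scoring expects.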

Rafael Díaz · Accepted Answer · 2017-10-04T05:52:14

I think you would have to apply the same pre-processing that was applied to Dataset A (removing duplicates, removing NAs, scaling numeric features); otherwise the predictions could be affected. Give me points, friend.
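For the scaling step, one way to apply "the same" transformation is to reuse the centering and scaling values obtained from Dataset A when scaling Dataset B. The sketch below uses base R's scale(); the data frames `train_num` and `blind_num` are hypothetical, and reusing the training parameters is an assumption about what "the same pre-processing" means here, not something stated in the answer.

```r
# Hypothetical numeric features from Dataset A (train) and Dataset B (blind test)
train_num <- data.frame(age = c(20, 30, 40), income = c(1000, 2000, 3000))
blind_num <- data.frame(age = c(30, 50), income = c(2000, 4000))

# Scale the training features; scale() stores the parameters it used as attributes
train_scaled <- scale(train_num)
center <- attr(train_scaled, "scaled:center")
spread <- attr(train_scaled, "scaled:scale")

# Scale the blind-test features with the training means and standard deviations
blind_scaled <- scale(blind_num, center = center, scale = spread)
```

The result is a matrix, so it may need as.data.frame() before being passed as newdata to predict(), depending on the model.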
