
I am working on a classification problem. During data processing, I estimate the best transformation to normality with bestNormalize(), which also standardizes all predictors.

Within the training step I use PCA as a preprocessing step to decorrelate the data. However, I am not able to pass scale. = FALSE to prevent the data from being standardized a second time; the caret documentation states: "If PCA is requested but centering and scaling are not, the values will still be centered and scaled." Can standardizing twice cause issues, and how can I use PCA within the preprocessing step without standardizing the predictors again?

The following attempt did not work:

my_ctrl <- trainControl(method='repeatedcv', 
                        number=5, 
                        repeats=5, 
                        search='grid',
                        preProcOptions = list(thresh=0.95, scale.=F),  # attempt to pass the scale. argument via preProcOptions
                        classProbs = T,
                        summaryFunction = twoClassSummary,
                        savePredictions=T, 
                        index=createResample(y_train, 5))

lg <- train(y=y_train, x=X_train, method='glm',
            trControl=my_ctrl, preProcess='pca',
            metric="ROC", family="binomial")

2 Answers


There are two common base R functions for PCA: prcomp(x, scale. = FALSE) and princomp(x, cor = FALSE, scores = TRUE).

You could run PCA with one of these outside of caret, so the data is not standardized twice. That said, standardizing twice should not be a problem: the second pass changes nothing, because the predictors are already centered and scaled (mean 0, standard deviation 1).

Let me know if this helps :)
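As a minimal sketch (with made-up standardized data standing in for your bestNormalize() output), prcomp() with centering and scaling switched off leaves already-standardized predictors untouched and still returns decorrelated scores:

```r
set.seed(1)
X <- matrix(rnorm(100 * 4), ncol = 4)
X_std <- scale(X)  # standardize once, as bestNormalize() would

# PCA without a second centering/scaling pass
pc <- prcomp(X_std, center = FALSE, scale. = FALSE)
scores <- pc$x               # principal component scores
round(cor(scores), 3)        # off-diagonal correlations are ~0
```

Because the standardized columns already have mean 0, the resulting scores are mean-zero and pairwise uncorrelated, which is the decorrelation you wanted from the PCA step.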


Another option is to tell bestNormalize() not to standardize, by setting standardize = FALSE. You would then not be double-standardizing, since standardization would happen only once, in the later PCA step.
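For instance (a minimal sketch assuming the CRAN bestNormalize package and a made-up skewed predictor):

```r
library(bestNormalize)

set.seed(1)
x <- rexp(200)  # made-up right-skewed predictor

# Choose the best normalizing transformation, but skip the final
# centering/scaling; caret's PCA preprocessing will center and
# scale the transformed values anyway.
bn <- bestNormalize(x, standardize = FALSE)
x_trans <- predict(bn)
```

You would apply this per predictor before handing X_train to train(), leaving the single standardization pass to caret's PCA step.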