I am conducting knn
regression on my data, and would like to:
a) cross-validate through repeatedcv
to find an optimal k
;
b) when building knn model, using PCA
at 90%
level threshold to reduce dimensionality.
library(caret)
library(dplyr)
set.seed(0)
data = cbind(rnorm(20, 100, 10), matrix(rnorm(400, 10, 5), ncol = 20)) %>%
data.frame()
colnames(data) = c('True', paste0('Day',1:20))
tr = data[1:15, ] #training set
tt = data[16:20,] #test set
train.control = trainControl(method = "repeatedcv", number = 5, repeats=3)
k = train(True ~ .,
method = "knn",
tuneGrid = expand.grid(k = 1:10),
#trying to find the optimal k from 1:10
trControl = train.control,
preProcess = c('scale','pca'),
metric = "RMSE",
data = tr)
My questions:
(1) I notice that someone suggested to change the pca parameter in trainControl:
ctrl <- trainControl(preProcOptions = list(thresh = 0.8))
mod <- train(Class ~ ., data = Sonar, method = "pls",
trControl = ctrl)
If I change the parameter in the trainControl, does it mean the PCA is still conducted during the KNN? Similar concern as this question
(2) I found another example which fits my situation - I am hoping to change the threshold to 90% but I don't know where can I change it in Caret
's train
function, especially I still need the scale
option.
I apologize for my tedious long description and random references. Thank you in advance!
(Thank you Camille for the suggestions to make the code work!)
caret
, but it looks likepreProcess
should be an argument totrain
, not a function. ChangepreProcess(c('scale','pca'))
topreProcess = c('scale','pca')
– camille