I'm trying to use caret to fit a PLS model while optimizing the number of components, ncomp:

library("caret")
set.seed(342)
train <- as.data.frame(matrix(rnorm(1e4), 100, 100))

ctrl <- rfeControl(functions = caretFuncs,
                   method = "repeatedcv",
                   number = 2,
                   repeats = 1,
                   verbose = TRUE)

pls.fit.rfe <- rfe(V1 ~ .,
                   data = train,
                   method = "pls",
                   sizes = 6,
                   tuneGrid = data.frame(ncomp = 7),
                   rfeControl = ctrl)

Error in { : task 1 failed - "final tuning parameters could not be determined"
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Invalid number of components, ncomp

Setting sizes to 7 fixes the problem. It makes sense that I get an error when min(sizes) < max(ncomp), but is there a way to vary ncomp depending on the number of features used in each RFE iteration, i.e. the sizes argument? I would simply like to optimize over a wide range of sizes and numbers of components at the same time.

1 Answer

Try using tuneLength = 7 instead of tuneGrid. The former is more flexible and will use an appropriate ncomp given the size of the data set:

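> pls.fit.rfe <- rfe(V1 ~ .,
+                    data = train,
+                    method = "pls",
+                    sizes = 6,
+                    tuneLength = 7,
+                    rfeControl = ctrl)
> pls.fit.rfe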

Recursive feature selection

Outer resampling method: Cross-Validated (2 fold, repeated 1 times) 

Resampling performance over subset size:

 Variables   RMSE Rsquared  RMSESD RsquaredSD Selected
         6 1.0229  0.01684 0.04192  0.0155092         
        99 0.9764  0.00746 0.01096  0.0008339        *

The top 5 variables (out of 99):

If you'd rather not do that, you can always write your own fit function too.
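
One way that could look, as a rough sketch rather than anything official: copy caretFuncs and replace only its fit element, so that the ncomp grid is capped at the number of predictors in the current subset. The names plsFuncs, dat and pls.fit.custom, the upper limit of 7 components, and the sizes below are illustrative assumptions, not part of caret.

```r
library(caret)

set.seed(342)
dat <- as.data.frame(matrix(rnorm(1e4), 100, 100))

# Start from the stock helper functions and override only the fit step.
plsFuncs <- caretFuncs

plsFuncs$fit <- function(x, y, first, last, ...) {
  # x holds only the predictors kept in this RFE iteration, so
  # ncol(x) bounds the number of PLS components that can be fit.
  # The cap of 7 is an arbitrary choice for this example.
  maxComp <- min(7, ncol(x))
  caret::train(x, y,
               method = "pls",
               tuneGrid = data.frame(ncomp = seq_len(maxComp)),
               ...)
}

ctrl <- rfeControl(functions = plsFuncs,
                   method = "repeatedcv",
                   number = 2,
                   repeats = 1)

# method and tuneGrid are set inside the fit function,
# so they are not passed to rfe() here.
pls.fit.custom <- rfe(V1 ~ .,
                      data = dat,
                      sizes = c(3, 6, 20),
                      rfeControl = ctrl)
```

With this setup, the number of components chosen for the final subset should be available as pls.fit.custom$fit$bestTune, since the final model is the train object returned by the fit function.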

Max