
I have a k-nearest-neighbors implementation that lets me compute, in a single pass, predictions for multiple values of k and for multiple subsets of training and test data (e.g. all the folds in K-fold cross-validation, AKA resampling metrics). My implementation can also leverage multiple cores.
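To make the single-pass idea concrete, here is a minimal base R sketch. It is not the implementation described above: `knn_multi_k` is a hypothetical helper that assumes numeric features, squared Euclidean distance, and majority voting with ties broken by alphabetical class order. It covers multiple values of k only, not multiple folds or multiple cores.

```r
# Hypothetical helper: one distance computation, predictions for every k in ks.
knn_multi_k <- function(train_x, train_y, test_x, ks) {
  train_x <- as.matrix(train_x)
  test_x <- as.matrix(test_x)
  train_y <- as.character(train_y)
  k_max <- max(ks)
  stopifnot(k_max <= nrow(train_x))

  # Squared Euclidean distances: one row per test point, one column per training point.
  d2 <- outer(rowSums(test_x^2), rowSums(train_x^2), "+") -
    2 * test_x %*% t(train_x)

  # Labels of the k_max nearest neighbours of each test point, nearest first.
  nn_labels <- lapply(seq_len(nrow(d2)), function(i) {
    train_y[order(d2[i, ])[seq_len(k_max)]]
  })

  # Reuse the same neighbour ordering for every requested k.
  preds <- lapply(ks, function(k) {
    vapply(nn_labels, function(lab) {
      votes <- table(lab[seq_len(k)])
      names(votes)[which.max(votes)]  # ties go to the first class alphabetically
    }, character(1))
  })
  names(preds) <- paste0("k", ks)
  preds
}

# Usage on a built-in data set: accuracy for each k from a single call.
set.seed(1)
idx <- sample(nrow(iris), 100)
preds <- knn_multi_k(iris[idx, 1:4], iris$Species[idx], iris[-idx, 1:4],
                     ks = c(3, 5, 7))
sapply(preds, function(p) mean(p == iris$Species[-idx]))
```

The expensive part (distances and neighbour ordering) is done once; each additional k costs only a vote count.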

I would like to interface my method so it can be used with the caret package. I can easily build a custom method for the train function, but this results in multiple calls to the model fit (one for each combination of parameter value and fold).

As far as I know, I can't specify a tuning strategy through trainControl. The source code of train mentions "seq" model fitting:

## There are two types of methods to build the models: "basic" means that each tuning parameter
## combination requires it's own model fit and "seq" where a single model fit can be used to
## get predictions for multiple tuning parameters.

But I can't see any way to actually use that with custom models.

Any clue on how to approach this?

More generally, suppose you have a model class where prediction errors can be estimated for multiple parameter values from a single model fit (e.g. à la the linear regression LOOCV trick, but across multiple parameter values too). How would you interface it with caret?
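For readers unfamiliar with the linear regression trick: for ordinary least squares, the leave-one-out residual of observation i equals e_i / (1 - h_ii), where e_i is the ordinary residual and h_ii the i-th diagonal entry of the hat matrix, so no refitting is needed. A small sketch using only base R (the `mtcars` formula is just an illustrative choice, and `loocv_lm` is a hypothetical helper name):

```r
# Leave-one-out mean squared error of an lm fit, computed from the single fit.
loocv_lm <- function(fit) {
  loo_res <- residuals(fit) / (1 - hatvalues(fit))
  mean(loo_res^2)
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
loocv_shortcut <- loocv_lm(fit)

# Brute-force check: refit once per held-out row and predict that row.
loocv_brute <- mean(sapply(seq_len(nrow(mtcars)), function(i) {
  f <- lm(mpg ~ wt + hp, data = mtcars[-i, ])
  (mtcars$mpg[i] - predict(f, newdata = mtcars[i, , drop = FALSE]))^2
}))

all.equal(loocv_shortcut, loocv_brute)
```

The question is about the same situation with an extra dimension: one fit that yields resampled errors for a whole grid of parameter values.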

Here's some example code to set up an (empty) custom model in caret:

# Custom caret
library(caret)
learning_data = data.frame(y=sample(c("one","two","three"),200,replace=TRUE))
learning_data = cbind(learning_data,matrix(runif(3*200),ncol=3))
trainRatio=0.75 # proportion of rows that goes into the training set
inTrain <- createDataPartition(learning_data$y, p = trainRatio, list = FALSE)
trainExpr <- learning_data[inTrain,]
testExpr <- learning_data[-inTrain,]

trainClass <- trainExpr$y
testClass <- testExpr$y

trainExpr$y<-NULL
testExpr$y<-NULL
cv_opts = trainControl(method="cv", number=4, verboseIter=TRUE)

my_knn <- function(data,weight,parameter,levels,last,...){
        print("training")
        # print(dim(data))
        # str(parameter)
        # list(fit=rdist(data$,data))
        list(fit=NA)
}
my_knn_pred <- function(object,newdata){
    print("testing")
    # str(object)
    # print(dim(newdata))
    return("one")
}

sortFunc <- function(x) x[order(x$k), , drop=FALSE] # drop=FALSE keeps a one-column data frame a data frame
# Values of K to test
knn_opts = data.frame(.k=seq(7, 11, 2)) # odd k; this rules out ties only in the two-class case
custom_tr = trainControl(method="cv", number=4, verboseIter=TRUE, custom=list(parameters=knn_opts, model=my_knn, prediction=my_knn_pred, probability=NULL, sort=sortFunc))

# This results in one call to my_knn and one to my_knn_pred per combination of fold and parameter value (4 folds x 3 values of k = 12 combinations)
custom_knn_performances <- train(x = trainExpr, y = trainClass,method = "custom",trControl=custom_tr,tuneGrid=knn_opts)

I would like to control the training procedure so as to generate predictions for all folds and parameter values in a single call.


1 Answer


The custom model fit parts of train currently don't allow for sequential parameters.

The next release will. All of the specific model code will no longer be hard-coded and will be modularized (including the sequential parameters).
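As a rough sketch of what this can look like with caret's later list-based custom model interface: the model list carries a `loop` function that splits the tuning grid into the rows that need a real fit and their "submodels", and `predict` takes a `submodels` argument and returns a list with the looped fit's predictions first. The names `seq_loop` and `seq_knn` and the toy kNN inside are illustrative assumptions; the code below runs in base R, and passing the list to `train(method = seq_knn)` is shown only as a comment. This covers multiple values of k per fold, not multiple folds per call.

```r
# Fit only the largest k; every smaller k is a submodel of that fit.
seq_loop <- function(grid) {
  biggest <- which.max(grid$k)
  list(loop = grid[biggest, , drop = FALSE],
       submodels = list(grid[-biggest, , drop = FALSE]))
}

seq_knn <- list(
  label = "kNN with sequential k (sketch)",
  library = NULL,
  type = "Classification",
  parameters = data.frame(parameter = "k", class = "numeric",
                          label = "#Neighbors"),
  grid = function(x, y, len = NULL, search = "grid") {
    data.frame(k = seq(7, by = 2, length.out = len))
  },
  loop = seq_loop,
  fit = function(x, y, wts, param, lev, last, classProbs, ...) {
    # kNN is lazy: "fitting" only stores the training data and k.
    list(x = as.matrix(x), y = as.character(y), k = param$k)
  },
  predict = function(modelFit, newdata, submodels = NULL) {
    ks <- c(modelFit$k, submodels$k)  # looped k first, then the submodel ks
    newdata <- as.matrix(newdata)
    # Order the neighbours of each new point once, then vote for every k.
    nn <- lapply(seq_len(nrow(newdata)), function(i) {
      d <- colSums((t(modelFit$x) - newdata[i, ])^2)
      modelFit$y[order(d)[seq_len(max(ks))]]
    })
    out <- lapply(ks, function(k) {
      vapply(nn, function(lab) {
        votes <- table(lab[seq_len(k)])
        names(votes)[which.max(votes)]
      }, character(1))
    })
    # Without submodels return a plain vector, otherwise one element per k.
    if (is.null(submodels)) out[[1]] else out
  },
  prob = NULL,
  sort = function(x) x[order(x$k), , drop = FALSE]
)

# With caret installed, the intended call would be along the lines of:
# train(x = trainExpr, y = trainClass, method = seq_knn,
#       trControl = trainControl(method = "cv", number = 4),
#       tuneGrid = data.frame(k = seq(7, 11, 2)))
```

With this layout there is one fit and one predict call per fold, rather than one per combination of fold and k.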

The work is about 80% done and I hope to have it out before the end of the year. I want to do a lot of testing on this version.

Drop me an email if you would like to kick it around before it is released (no warranty though).

Max