I have a training set that looks like this:
Name     Day       Area     X        Y      Month  Night
ATTACK   Monday    LA       -122.41  37.78  8      0
VEHICLE  Saturday  CHICAGO  -1.67    3.15   2      0
MOUSE    Monday    TAIPEI   -12.5    3.1    9      1
Name is the outcome/dependent variable.
Here is what my code looks like so far, in case it helps:
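library(caret)
# Dummy-coded design matrices for the outcome and the predictors
ynn <- model.matrix(~ Name, data = trainDF)
mnn <- model.matrix(~ Day + Area + X + Y + Month + Night, data = trainDF)
# Class labels as syntactically valid R names (caret needs this when classProbs = TRUE)
yCat <- make.names(trainDF$Name, unique = FALSE, allow_ = TRUE)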
I then set up the parameter tuning:
nnTrControl <- trainControl(method = "repeatedcv", number = 3, repeats = 5,
                            verboseIter = TRUE, returnData = FALSE,
                            returnResamp = "all", classProbs = TRUE,
                            summaryFunction = multiClassSummary,
                            allowParallel = TRUE)
nnGrid <- expand.grid(.size = c(1, 4, 7), .decay = c(0, 0.001, 0.1))
model <- train(y = yCat, x = mnn, method = "nnet", linout = TRUE, trace = FALSE,
               trControl = nnTrControl, metric = "logLoss", tuneGrid = nnGrid)
When I ran this, it was still running more than 20 hours later, so I had to stop it.
I read in the question linked below that it's possible to parallelize caret's resampling using registerDoMC: R caret nnet package in Multicore
However, that only seems to work for cores. My machine has 2 cores with 2 threads on each core. Is there a way to get a speedup from the threads as well, in addition to using the 2 cores with registerDoMC(2)?
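To make the cores-versus-threads distinction concrete, here is a minimal sketch of how the worker count could be chosen before registering the backend. It uses detectCores() from the base parallel package; pick_workers is a hypothetical helper introduced only for illustration, and I am assuming (not confirming) that registering one worker per logical CPU is what "using the threads" would amount to. Whether that actually gives a speedup is exactly what I am unsure about.

```r
library(parallel)

# Logical CPUs count hardware threads; physical cores do not.
# The `logical` argument may be ignored or return NA on some platforms.
n_logical  <- detectCores(logical = TRUE)
n_physical <- detectCores(logical = FALSE)

# Hypothetical helper: fall back to a single worker if detection fails.
pick_workers <- function(n) if (is.na(n) || n < 1) 1L else as.integer(n)

n_workers <- pick_workers(n_logical)

# Register the backend only if doMC is installed (it is not available on Windows).
if (requireNamespace("doMC", quietly = TRUE)) {
  doMC::registerDoMC(cores = n_workers)
}
```

On my machine I would expect n_logical to be 4 and n_physical to be 2, but I have not verified how detectCores() reports them.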
I also see in the question linked below that the user had to set up seeds for each resample: Fully reproducible parallel models using caret. Do I also have to do that for my code? Why was this not done in the former link? What if I used xgboost instead of nnet?
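For context, this is my understanding of what setting seeds for each resample would look like for my setup, as a sketch only. The seeds argument of trainControl expects a list with one element per resample plus one for the final model: each of the first elements is an integer vector with one seed per tuning combination, and the last is a single integer. With 3 folds repeated 5 times there are 15 resamples, and my grid has 9 combinations. The starting seed 123 and the upper bound 10000 are arbitrary choices of mine.

```r
set.seed(123)  # arbitrary starting seed, chosen for illustration

n_resamples <- 3 * 5  # number = 3 folds, repeats = 5
n_models <- nrow(expand.grid(size = c(1, 4, 7), decay = c(0, 0.001, 0.1)))

# One integer vector of length n_models per resample,
# plus a single integer for the final model fit.
seeds <- vector("list", n_resamples + 1)
for (i in seq_len(n_resamples)) {
  seeds[[i]] <- sample.int(10000, n_models)
}
seeds[[n_resamples + 1]] <- sample.int(10000, 1)

# The list would then be passed as trainControl(..., seeds = seeds).
```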