I am having trouble reproducing caret's finalModel with nnet. If I use what are, as far as I can tell, exactly the same parameters for caret::train and nnet::nnet, I (sometimes) get big differences. Have I forgotten a parameter, or is this due to the training algorithm of the neural network? I am aware that I can use predict with caret_net (in the example below), but I would still like to reproduce the results with nnet alone.
Example:
library(nnet)
library(caret)
len <- 100
set.seed(4321)
X <- data.frame(x1 = rnorm(len, 40, 25), x2 = rnorm(len, 70, 4), x3 = rnorm(len, 1.6, 0.3))
y <- 20000 + X$x1 * 3 - X$x1*X$x2 * 4 - (X$x3**4) * 7 + rnorm(len, 0, 4)
XY <- cbind(X, y)
# pre-processing
preProcPrms <- preProcess(XY, method = c("center", "scale"))
XY_pre <- predict(preProcPrms, XY)
# caret-nnet
controlList <- trainControl(method = "cv", number = 5)
tuneMatrix <- expand.grid(size = c(1, 2), decay = c(0, 0.1))
caret_net <- train(x = XY_pre[ , colnames(XY_pre) != "y"],
                   y = XY_pre[ , colnames(XY_pre) == "y"],
                   method = "nnet",
                   linout = TRUE,
                   trace = FALSE,  # nnet's argument is lower-case "trace"; "TRACE" is silently ignored
                   maxit = 100,
                   tuneGrid = tuneMatrix,
                   trControl = controlList)
# nnet-nnet
nnet_net <- nnet(x = XY_pre[ , colnames(XY_pre) != "y"],
                 y = XY_pre[ , colnames(XY_pre) == "y"],
                 linout = caret_net$finalModel$param$linout,
                 trace = caret_net$finalModel$param$trace,
                 size = caret_net$bestTune$size,
                 decay = caret_net$bestTune$decay,
                 entropy = caret_net$finalModel$entropy,
                 maxit = 100)
# print
print(caret_net$finalModel)
print(nnet_net)
y_caret <- predict(caret_net$finalModel, XY_pre[ , colnames(XY_pre) != "y"])
y_nnet <- predict(nnet_net, XY_pre[ , colnames(XY_pre) != "y"])
plot(y_caret, y_nnet, main = "Hard to spot, but y_caret <> y_nnet - which prm have I forgotten?")
hist(y_caret - y_nnet)
Thanks & kind regards
This kind of difference between caret and the baseline packages comes up quite often. Many times, the discrepancy can be traced to random number generation. Neural net training typically starts from a random state. Since I don't see set.seed anywhere in your code, it's reasonable to expect that caret::train and nnet::nnet start from two different states. Consequently, they likely converge to two different local optima. - Artem Sokolov

Running the nnet model with different seeds results in variation that is comparable to caret's model. Just setting the same seed prior to both models will not accomplish much, since there are different operations involved in caret and nnet. - missuse

Try setting the same seed right before caret::train and right before nnet::nnet, with a one-row tuneGrid. Since caret will only consider a single set of meta-parameters, it should produce a single model that matches your own nnet call. - Artem Sokolov
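To illustrate the last suggestion, here is a minimal sketch. It assumes the data preparation from the question above (XY_pre) and that neither call consumes additional random draws between set.seed and the weight initialisation. trainControl(method = "none") makes caret fit the model exactly once, so no resampling runs before the final fit; the one-row grid oneRow is a placeholder for the values in caret_net$bestTune.

# Reproduction sketch: same seed, no resampling, a single candidate parameter set
oneRow <- data.frame(size = 2, decay = 0.1)  # e.g. substitute caret_net$bestTune

set.seed(4321)
caret_single <- train(x = XY_pre[ , colnames(XY_pre) != "y"],
                      y = XY_pre[ , colnames(XY_pre) == "y"],
                      method = "nnet",
                      linout = TRUE,
                      trace = FALSE,
                      maxit = 100,
                      tuneGrid = oneRow,
                      trControl = trainControl(method = "none"))

set.seed(4321)
nnet_single <- nnet(x = XY_pre[ , colnames(XY_pre) != "y"],
                    y = XY_pre[ , colnames(XY_pre) == "y"],
                    linout = TRUE,
                    trace = FALSE,
                    size = oneRow$size,
                    decay = oneRow$decay,
                    maxit = 100)

# If both fits really started from the same random initial weights,
# the final weight vectors should agree up to floating-point noise.
all.equal(caret_single$finalModel$wts, nnet_single$wts)

If all.equal still reports differences, the remaining gap points to extra RNG consumption inside train before the fit, rather than a forgotten parameter.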