
I am having trouble extracting the parameters of caret's finalModel for use with nnet. When I pass what I believe are exactly the same parameters to caret::train and nnet::nnet, the results (sometimes) differ substantially. Have I forgotten a parameter, or is this due to how the neural network is fitted? I am aware that I can call predict on caret_net (in the example below), but I would still like to reproduce the results using only nnet.

Example:

library(nnet)
library(caret)

len <- 100
set.seed(4321)
X <- data.frame(x1 = rnorm(len, 40, 25), x2 = rnorm(len, 70, 4), x3 = rnorm(len, 1.6, 0.3))
y <- 20000 + X$x1 * 3 - X$x1 * X$x2 * 4 - (X$x3^4) * 7 + rnorm(len, 0, 4)
XY <- cbind(X, y)

# pre-processing
preProcPrms <- preProcess(XY, method = c("center", "scale"))
XY_pre <- predict(preProcPrms, XY)

# caret-nnet
controlList <- trainControl(method = "cv", number = 5)
tuneMatrix <- expand.grid(size = c(1, 2), decay = c(0, 0.1))

caret_net <- train(x = XY_pre[ , colnames(XY_pre) != "y"],
                   y = XY_pre[ , colnames(XY_pre) == "y"],
                   method = "nnet",
                   linout = TRUE,
                   trace = FALSE,  # nnet's argument is lowercase 'trace'
                   maxit = 100,
                   tuneGrid = tuneMatrix,
                   trControl = controlList)

# nnet-nnet
nnet_net <- nnet(x = XY_pre[ , colnames(XY_pre) != "y"],
                 y = XY_pre[ , colnames(XY_pre) == "y"],
                 linout = caret_net$finalModel$param$linout,
                 trace = FALSE,  # nnet's argument is lowercase 'trace'
                 size = caret_net$bestTune$size,
                 decay = caret_net$bestTune$decay,
                 entropy = caret_net$finalModel$entropy,
                 maxit = 100)

# print
print(caret_net$finalModel)
print(nnet_net)

y_caret <- predict(caret_net$finalModel, XY_pre[ , colnames(XY_pre) != "y"])
y_nnet <- predict(nnet_net, XY_pre[ , colnames(XY_pre) != "y"])

plot(y_caret, y_nnet, main = "Hard to spot, but y_caret <> y_nnet - which prm have I forgotten?")
hist(y_caret - y_nnet)

Thx & kind regards

The question about discrepancies between caret and baseline packages comes up quite often. Many times, the discrepancy can be traced to random number generation. Neural net training typically starts from a random state. Since I don't see set.seed anywhere in your code, it's reasonable to expect that caret::train and nnet::nnet start from two different states. Consequently, they likely converge to two different local optima. - Artem Sokolov
set.seed is in line 4. Sometimes the differences are rather large (I couldn't construct a better example, though, because I haven't found the cause). - r.user.05apr
I agree with @Artem Sokolov: just running the nnet model with different seeds results in variation comparable to the difference from caret's model. Setting the same seed before both models will not accomplish much on its own, since caret and nnet perform different operations. - missuse
Try the following experiment: (i) set the meta-parameter tuning matrix to contain a single set of parameters only; (ii) set the same random seed right before caret::train and right before nnet::nnet. Since caret will only consider a single set of meta-parameters, it should produce a single model that matches your own nnet call. - Artem Sokolov
@Artem Sokolov This indeed provides the same RMSE. - missuse

1 Answer


As stated in the comments, the discrepancy is caused by different seeds. To quote @Artem Sokolov: "Neural net training typically starts from a random state. It's reasonable to expect that caret::train and nnet::nnet start from two different states. Consequently, they likely converge to two different local optima."
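
The effect of the seed can be checked with nnet alone. The following is a minimal sketch on made-up toy data (not the data from the question); the helper name fit_with_seed is hypothetical. It fits the same network three times and compares the fitted weights.

```r
library(nnet)

# Toy regression data (hypothetical, not the data from the question)
set.seed(4321)
X <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
y <- X$x1 - X$x1 * X$x2 + rnorm(100, 0, 0.1)

# Hypothetical helper: set a seed, then fit the same network
fit_with_seed <- function(seed) {
  set.seed(seed)
  nnet(x = X, y = y, size = 2, decay = 0, linout = TRUE,
       trace = FALSE, maxit = 100)
}

fit_a <- fit_with_seed(1)
fit_b <- fit_with_seed(1)  # same seed as fit_a
fit_c <- fit_with_seed(2)  # different seed

# Same seed: identical random starting weights, hence identical fitted weights
same_seed_equal <- isTRUE(all.equal(fit_a$wts, fit_b$wts))

# Different seed: different starting weights, so the fits will usually differ
diff_seed_equal <- isTRUE(all.equal(fit_a$wts, fit_c$wts))

print(c(same_seed = same_seed_equal, different_seed = diff_seed_equal))
```

If the two same-seed fits agree while the different-seed fit does not, the random starting weights are enough to explain differences of the kind seen in the question.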

To get a reproducible model, start both fits from the same seed:

controlList <- trainControl(method = "none", seeds = 1)
tuneMatrix <- expand.grid(size = 2, decay = 0)

set.seed(1)
caret_net <- train(x = XY_pre[ , colnames(XY_pre) != "y"],
                   y = XY_pre[ , colnames(XY_pre) == "y"],
                   method = "nnet",
                   linout = TRUE,
                   trace = FALSE,  # nnet's argument is lowercase 'trace'
                   maxit = 100,
                   tuneGrid = tuneMatrix,
                   trControl = controlList)

set.seed(1)
nnet_net <- nnet(x = XY_pre[ , colnames(XY_pre) != "y"],
                 y = XY_pre[ , colnames(XY_pre) == "y"],
                 linout = caret_net$finalModel$param$linout,
                 trace = FALSE,  # nnet's argument is lowercase 'trace'
                 size = caret_net$bestTune$size,
                 decay = caret_net$bestTune$decay,
                 entropy = caret_net$finalModel$entropy,
                 maxit = 100)

y_caret <- predict(caret_net, XY_pre[ , colnames(XY_pre) != "y"])
y_nnet <- predict(nnet_net, XY_pre[ , colnames(XY_pre) != "y"])


all.equal(as.vector(y_caret[,1]), y_nnet[,1])
#TRUE

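Apart from setting the same seed, the key is to avoid resampling in caret (method = "none"): resampling also depends on the seed and runs before the final model is trained, so the final fit would otherwise start from a different random state.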
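
Another way to take the seed out of the picture is to supply the starting weights explicitly through nnet's Wts argument. The sketch below uses made-up toy data and nnet only; the names n_wts and init_wts are illustrative. Since train forwards extra arguments to the underlying fit, the same Wts could presumably be passed to caret::train as well, but that is an assumption not tested here, and it could only work when the tuning grid contains a single size, because the length of Wts depends on it.

```r
library(nnet)

# Toy regression data (hypothetical, not the data from the question)
set.seed(4321)
X <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
y <- X$x1 - X$x1 * X$x2 + rnorm(100, 0, 0.1)

size <- 2
# Weight count for one hidden layer, one output, no skip-layer connections:
# (inputs + bias) * size + (size + bias) * outputs
n_wts <- (ncol(X) + 1) * size + (size + 1) * 1

# Fixed starting weights, drawn once and reused for every fit
set.seed(99)
init_wts <- runif(n_wts, -0.5, 0.5)

# Two fits under different seeds, but with the same starting weights
set.seed(1)
fit_1 <- nnet(x = X, y = y, size = size, decay = 0, linout = TRUE,
              trace = FALSE, maxit = 100, Wts = init_wts)
set.seed(2)
fit_2 <- nnet(x = X, y = y, size = size, decay = 0, linout = TRUE,
              trace = FALSE, maxit = 100, Wts = init_wts)

# With the starting weights fixed, the seed no longer influences the fit
identical_fits <- isTRUE(all.equal(predict(fit_1, X), predict(fit_2, X)))
print(identical_fits)
```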