
I have a training dataset and a testing dataset with approximately 1300 and 400 samples, respectively. I run a grid search that creates some number of deep networks (softmax for the output, ReLU for the hidden layers, trained with gradient descent) with varying numbers of hidden nodes in a prespecified number of hidden layers. For example, if I ask it to check all single-layer models, the grid search creates 100 networks with 1, 2, 3, ..., 100 hidden nodes in the single hidden layer. For every model and every epoch, the grid search trains the model and tests it by feeding it random batches of the training/testing data using a prespecified batch size. The program then outputs an AUC value after each epoch of training for all 100 models, so we get 100 output files containing the AUC values after every epoch. I can then go through these files with a parser to see which model is optimal and what the optimal number of epochs is.

However, when I run my grid search, I notice that the best models in the first run are not the same as those in subsequent runs. I attribute this to the random batches fed into the model for training and testing, but then how can I actually find the "optimal model"?

There is also the effect of random initialization of the weights. There might not be one optimal model but multiple ones. - Dr. Snoopy
I guarantee you that none of these runs returns the optimal model, just different locally optimal / sub-optimal solutions. Because of randomness and heuristic-driven learning (in this context, a non-convex optimization problem), there will always be this variance. You could increase the number of cross-validation folds to lower the variance. - sascha
Great! Thank you for the comments. Right now I initialize the weights to a constant value of 1/sqrt(number of nodes in the previous layer), the same from run to run. @sascha thank you for the comment about cv folds, I'll definitely look into that. Do you recommend any other optimization techniques that might help my model become as robust as possible? - g00glechr0me
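
The suggestion in the comments about cross-validation folds can be sketched roughly as below. This is a minimal, standard-library-only illustration and not the asker's actual pipeline: `kfold_indices` and `rank_configs` are hypothetical helper names, and the per-fold AUC values are assumed to come from your own training loop.

```python
import random
import statistics


def kfold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k disjoint validation folds (hypothetical helper)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed -> same folds on every run
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def rank_configs(fold_scores):
    """Rank configurations by mean score across folds, best first.

    fold_scores maps a configuration (e.g. number of hidden nodes) to the
    list of AUC values it obtained, one per fold or per repeated run.
    """
    summary = []
    for config, scores in fold_scores.items():
        mean = statistics.mean(scores)
        # Spread across folds; undefined for a single score, so report 0.0.
        spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
        summary.append((config, mean, spread))
    return sorted(summary, key=lambda row: row[1], reverse=True)
```

Ranking by the mean over several folds (and looking at the spread next to it) makes it visible when two network sizes differ by less than the run-to-run noise, in which case neither can really be called "the" optimal model.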

1 Answer


I think 'optimal' is quite subjective: you might find that one model performs better when predicting a particular dataset, whilst another model performs slightly better on another. A good measure (in my opinion) would be the root mean squared error (RMSE).
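
For reference, the error measure mentioned above could be computed along these lines. This is only a sketch: `rmse` is a hypothetical helper name, and it assumes the targets and predictions are plain numeric sequences of equal length (for a classifier, for example, 0/1 labels and predicted probabilities).

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))
```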

Certain AI packages, e.g. H2O, can be set to eliminate randomness, so you can replicate the same results if you put in the same 'seed' next time.
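
The effect of a fixed seed can be illustrated without any particular package. The sketch below uses only Python's standard library; `sample_batches` is a hypothetical helper, not part of H2O or of the asker's code, and it only covers the random batch selection. A deep learning framework will typically have its own seed settings for things like weight initialization, which are not shown here.

```python
import random


def sample_batches(n_samples, batch_size, n_batches, seed):
    """Draw random mini-batches of sample indices from a dedicated, seeded generator."""
    rng = random.Random(seed)  # local generator; leaves the global random state alone
    # Each batch is sampled without replacement from all available indices.
    return [rng.sample(range(n_samples), batch_size) for _ in range(n_batches)]
```

Calling `sample_batches(1300, 32, 10, seed=42)` twice yields the same batches, so differences between two grid-search runs can no longer be blamed on which samples happened to be drawn.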

Hope this helps you out