In Andrew Ng's lecture notes, they use L-BFGS and obtain some hidden features. Can I use gradient descent instead and produce the same hidden features? All other parameters are the same; I only change the optimization algorithm.
The reason I ask is that when I use L-BFGS, my autoencoder can produce the same hidden features as in the lecture notes, but when I use gradient descent, the features in the hidden layer are gone and look completely random.
To be specific, in order to optimize the cost function I implement 1) the cost function and 2) the gradient with respect to each weight and bias, and pass them to the scipy optimize toolbox. This setup gives me reasonable hidden features.
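To make the setup concrete, here is a minimal sketch of what I mean (a plain squared-error autoencoder without the sparsity penalty, with made-up sizes and toy data just so it runs; the names autoencoder_cost, n_visible and n_hidden are mine, not from the notes):

    import numpy as np
    from scipy.optimize import minimize

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def autoencoder_cost(theta, n_visible, n_hidden, data, lam=1e-4):
        """Squared-error autoencoder cost and gradient (no sparsity term,
        just to illustrate the optimizer interface)."""
        m = data.shape[1]

        # unpack the flat parameter vector into weights and biases
        W1 = theta[:n_hidden * n_visible].reshape(n_hidden, n_visible)
        offset = n_hidden * n_visible
        W2 = theta[offset:offset + n_visible * n_hidden].reshape(n_visible, n_hidden)
        offset += n_visible * n_hidden
        b1 = theta[offset:offset + n_hidden]
        b2 = theta[offset + n_hidden:]

        # forward pass
        a2 = sigmoid(W1 @ data + b1[:, None])   # hidden activations
        a3 = sigmoid(W2 @ a2 + b2[:, None])     # reconstruction

        # reconstruction error plus weight decay
        cost = 0.5 / m * np.sum((a3 - data) ** 2) \
             + 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))

        # backpropagation
        d3 = (a3 - data) * a3 * (1 - a3)
        d2 = (W2.T @ d3) * a2 * (1 - a2)
        W1_grad = d2 @ data.T / m + lam * W1
        W2_grad = d3 @ a2.T / m + lam * W2
        b1_grad = d2.sum(axis=1) / m
        b2_grad = d3.sum(axis=1) / m

        grad = np.concatenate([W1_grad.ravel(), W2_grad.ravel(), b1_grad, b2_grad])
        return cost, grad

    # toy data and random initialization, only so the sketch is runnable
    rng = np.random.default_rng(0)
    n_visible, n_hidden = 64, 25
    data = rng.random((n_visible, 100))
    theta0 = rng.normal(scale=0.01,
                        size=2 * n_visible * n_hidden + n_hidden + n_visible)

    # jac=True tells scipy that the callable returns (cost, gradient) together
    result = minimize(autoencoder_cost, theta0,
                      args=(n_visible, n_hidden, data),
                      method='L-BFGS-B', jac=True, options={'maxiter': 400})
    opt_theta = result.x
    W1_opt = opt_theta[:n_hidden * n_visible].reshape(n_hidden, n_visible)  # hidden features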
But when I change to gradient descent, I simply update with "Weight - gradient of the Weight" and "Bias - gradient of the Bias", and the resulting hidden features look totally random.
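Concretely, my gradient-descent replacement is essentially this loop, reusing autoencoder_cost, theta0 and data from the sketch above (alpha and num_iters are placeholders; the raw update "Weight - gradient" corresponds to alpha = 1):

    # plain batch gradient descent in place of L-BFGS
    alpha = 1.0        # "Weight - gradient of the Weight" means a step size of 1
    num_iters = 400
    theta = theta0.copy()
    for _ in range(num_iters):
        cost, grad = autoencoder_cost(theta, n_visible, n_hidden, data)
        theta -= alpha * grad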
Can somebody help me understand the reason? Thanks.
"they use L-BFGS and obtain some hidden features. Can I use gradient descent instead and produce the same hidden features?" - in principle, yes, at least if both converge. Gradient descent, however, can be painfully slow for some functions, so you may not end up in a local optimum in a reasonable amount of time. Also, the choice of the step size will be critical if you want to implement the optimization yourself. - cel