Question: What is a sound approach to choosing the architecture and hyperparameters of a neural network for a simple grid game, and how can it be scaled up to work on a version of the game with a larger grid?
Context: Most tutorials and papers about using neural networks in Q-learning rely on convolutional neural networks to handle raw screen input from different games. I am experimenting with a far simpler game that uses raw data instead:
A simple matrix game in which the possible moves for the agent are: up, down, right, left.
The notebook with the complete code can be found here: http://151.80.61.13/ql.html
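For concreteness, the state is just the raw grid as a grid_size x grid_size matrix and the four moves are indexed 0-3; the snippet below only illustrates that shape, the exact cell encoding is in the notebook:

import numpy as np

grid_size = 4                                   # illustrative size, not necessarily the one I use
state = np.zeros((grid_size, grid_size))        # raw grid, fed straight into the network
state[1, 2] = 1.0                               # e.g. mark the agent's cell (real encoding is in the notebook)

ACTIONS = {0: 'up', 1: 'down', 2: 'right', 3: 'left'}   # one network output per action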
None of the tested neural networks performed better than random moves. The average reward rose to about 8.5 (out of 30 points) after ~1000 episodes and then started decreasing, with the agent mostly ending up spamming the same action on every move.
I know that for a game as small as this a Q-table would do better, but the point of this exercise is to learn how to implement deep Q-learning, and once it works on a small example I want to scale it to a larger grid.
Current neural network (Keras) and the variations I have tried:
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam

def build_model(grid_size):   # wrapped in a helper function here so the snippet is self-contained
    model = Sequential()
    # With a 2D input, Dense is applied to each row separately, so this outputs (grid_size, grid_size**2)
    model.add(Dense(grid_size**2, input_shape=(grid_size, grid_size)))
    model.add(Activation('relu'))
    model.add(Dense(48))
    model.add(Flatten())
    model.add(Activation('linear'))
    model.add(Dense(4))       # one Q-value output per action: up, down, right, left
    adam = Adam(lr=0.1)
    model.compile(optimizer=adam, loss='mse')
    return model
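The update rule I am trying to implement is the standard one-step Q-learning target. A minimal sketch of that update with this model would look like the following (gamma and the single-sample fit are just illustrative; the actual training loop is in the notebook):

import numpy as np

gamma = 0.95                                     # example discount factor, not necessarily what I used

def train_step(model, state, action, reward, next_state, done):
    # Predict the current Q-values and overwrite the entry for the action taken
    # with the one-step target r + gamma * max_a' Q(s', a'), then fit on that single sample.
    q_values = model.predict(state[np.newaxis])
    target = reward if done else reward + gamma * np.max(model.predict(next_state[np.newaxis]))
    q_values[0, action] = target
    model.fit(state[np.newaxis], q_values, epochs=1, verbose=0)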
- Different hidden layer sizes: [512,256,100,48,32,24]
- Varying number of hidden layers: [1,2,3]
- Different learning rates: [3, 1, 0.8, 0.5, 0.3, 0.1, 0.01]
- Testing variety of activation functions: [linear, sigmoid, softmax, relu]
- Varying the number of episodes and the degree of epsilon decay
- Trying with and without a target network (see the sketch after this list)
- Trying different networks from tutorials that were written for OpenAI Gym CartPole, FrozenLake and Flappy Bird.
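To be explicit about the last two bullets, the epsilon decay and target-network setup I mean look roughly like the sketch below. The decay rate, sync interval and episode count are only example values, build_model is the helper from the code above, and with a target network the next-state max in the update would come from target_model.predict instead of model.predict:

import numpy as np

# Example exploration schedule and target-network sync interval (illustrative values only).
epsilon, epsilon_min, epsilon_decay = 1.0, 0.05, 0.995
target_update_every = 50
num_episodes = 1000

model = build_model(grid_size=4)             # online network
target_model = build_model(grid_size=4)      # frozen copy used for the bootstrap targets
target_model.set_weights(model.get_weights())

def choose_action(model, state, epsilon):
    # Epsilon-greedy: random move with probability epsilon, otherwise greedy w.r.t. Q-values.
    if np.random.rand() < epsilon:
        return np.random.randint(4)
    return int(np.argmax(model.predict(state[np.newaxis])))

for episode in range(num_episodes):
    # ... play one episode here, picking moves with choose_action and training per step ...
    epsilon = max(epsilon_min, epsilon * epsilon_decay)       # shrink exploration over time
    if episode % target_update_every == 0:
        target_model.set_weights(model.get_weights())         # periodically re-sync the target network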