Question: What is a sound approach to choosing the architecture and hyperparameters of a neural network for a simple grid game, and how can it be scaled up to work on a version of the game with a larger grid?
Context: Most tutorials and papers about using neural networks in Q-learning rely on convolutional neural networks to handle raw screen input from different games. I am experimenting with a far simpler game that uses raw data instead:
A simple matrix game in which the possible moves for the agent are: up, down, right, left.
The notebook with the complete code can be found here: http://151.80.61.13/ql.html
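For concreteness, the state is just the raw grid as a grid_size x grid_size matrix and the four moves are indexed 0-3; the snippet below only illustrates that shape, the exact cell encoding is in the notebook:

import numpy as np

grid_size = 4                                   # illustrative size, not necessarily the one I use
state = np.zeros((grid_size, grid_size))        # raw grid, fed straight into the network
state[1, 2] = 1.0                               # e.g. mark the agent's cell (real encoding is in the notebook)

ACTIONS = {0: 'up', 1: 'down', 2: 'right', 3: 'left'}   # one network output per action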
None of the tested neural networks performed better than random moves. The average reward rose to about 8.5 (out of 30 points) after ~1000 episodes and then started decreasing, with the agent mostly ending up spamming the same action on every move.
I know that for a game as small as this a Q-table would do better, but the point of this exercise is to learn how to implement deep Q-learning, and once it works on a small example I want to scale it to a larger grid.
Current neural network (Keras) and the variations I have tried:
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam

def build_model(grid_size):   # wrapped in a helper function here so the snippet is self-contained
    model = Sequential()
    # With a 2D input, Dense is applied to each row separately, so this outputs (grid_size, grid_size**2)
    model.add(Dense(grid_size**2, input_shape=(grid_size, grid_size)))
    model.add(Activation('relu'))
    model.add(Dense(48))
    model.add(Flatten())
    model.add(Activation('linear'))
    model.add(Dense(4))       # one Q-value output per action: up, down, right, left
    adam = Adam(lr=0.1)
    model.compile(optimizer=adam, loss='mse')
    return model
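The update rule I am trying to implement is the standard one-step Q-learning target. A minimal sketch of that update with this model would look like the following (gamma and the single-sample fit are just illustrative; the actual training loop is in the notebook):

import numpy as np

gamma = 0.95                                     # example discount factor, not necessarily what I used

def train_step(model, state, action, reward, next_state, done):
    # Predict the current Q-values and overwrite the entry for the action taken
    # with the one-step target r + gamma * max_a' Q(s', a'), then fit on that single sample.
    q_values = model.predict(state[np.newaxis])
    target = reward if done else reward + gamma * np.max(model.predict(next_state[np.newaxis]))
    q_values[0, action] = target
    model.fit(state[np.newaxis], q_values, epochs=1, verbose=0)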
- Different hidden layer sizes: [512,256,100,48,32,24]
- Varying number of hidden layers: [1,2,3]
- Different learning rates: [3, 1, 0.8, 0.5, 0.3, 0.1, 0.01]
- Testing variety of activation functions: [linear, sigmoid, softmax, relu]
- Varying the number of episodes and the degree of epsilon decay
- Trying with and without a target network (see the sketch after this list)
- Trying different networks from tutorials that were written for OpenAI Gym CartPole, FrozenLake and Flappy Bird.
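To be explicit about the last two bullets, the epsilon decay and target-network setup I mean look roughly like the sketch below. The decay rate, sync interval and episode count are only example values, build_model is the helper from the code above, and with a target network the next-state max in the update would come from target_model.predict instead of model.predict:

import numpy as np

# Example exploration schedule and target-network sync interval (illustrative values only).
epsilon, epsilon_min, epsilon_decay = 1.0, 0.05, 0.995
target_update_every = 50
num_episodes = 1000

model = build_model(grid_size=4)             # online network
target_model = build_model(grid_size=4)      # frozen copy used for the bootstrap targets
target_model.set_weights(model.get_weights())

def choose_action(model, state, epsilon):
    # Epsilon-greedy: random move with probability epsilon, otherwise greedy w.r.t. Q-values.
    if np.random.rand() < epsilon:
        return np.random.randint(4)
    return int(np.argmax(model.predict(state[np.newaxis])))

for episode in range(num_episodes):
    # ... play one episode here, picking moves with choose_action and training per step ...
    epsilon = max(epsilon_min, epsilon * epsilon_decay)       # shrink exploration over time
    if episode % target_update_every == 0:
        target_model.set_weights(model.get_weights())         # periodically re-sync the target network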