I am writing a neural network that can play tic-tac-toe. The network has 9 input neurons that describe the state of the board (1 for the network's moves, 1.5 for the opponent's moves, 0 for empty cells) and 9 output neurons (the output neuron with the highest value indicates the best action in the given state). The network has no hidden layer. The activation function is sigmoid. The learning method is Q-learning with backpropagation.
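A minimal sketch of the setup described above, in plain Python, may make the encoding and action selection concrete. The names `encode`, `forward`, and `best_action`, the board string format (`N` for the network, `O` for the opponent, `.` for empty), the bias terms, and the random initial weights are all illustrative assumptions; the Q-learning update itself is omitted.

```python
import math
import random

# Input values per cell, as described above.
EMPTY, OWN, OPPONENT = 0.0, 1.0, 1.5

def encode(board):
    """Turn a 9-character board string into the 9 network inputs.

    Assumed format: 'N' = network's move, 'O' = opponent's move, '.' = empty.
    """
    mapping = {"N": OWN, "O": OPPONENT, ".": EMPTY}
    return [mapping[cell] for cell in board]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, weights, biases):
    """Single layer, no hidden units: one sigmoid output per board cell."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def best_action(outputs):
    """Index of the output neuron with the highest value."""
    return max(range(len(outputs)), key=outputs.__getitem__)

# Placeholder parameters (assumption): small random weights, zero biases.
rng = random.Random(0)
weights = [[rng.uniform(-0.5, 0.5) for _ in range(9)] for _ in range(9)]
biases = [0.0] * 9

board = "N.O......"  # network in cell 0, opponent in cell 2
inputs = encode(board)
outputs = forward(inputs, weights, biases)
action = best_action(outputs)
```

Note that nothing in this forward pass prevents `action` from pointing at an occupied cell; the network has to learn that entirely from the reward signal.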
The network trains, but poorly: it keeps choosing cells that are already occupied. So I decided to add a hidden layer, and I would like to ask:
How many neurons should the hidden layer have, and which activation functions are best for the hidden and output layers?