
I am planning to write a chess engine that uses a deep convolutional neural network to evaluate chess positions. I will be using bitboards to represent the board state, so the input layer will have 12*64 neurons for the piece placement, 1 neuron for the side to move (0 for black, 1 for white), and 4 neurons for castling rights (wks, bks, wqs, bqs). There will be two hidden layers with 515 neurons each, and a single output neuron whose value ranges from -1 (black is winning) to 1 (white is winning), with 0 meaning an equal position. All neurons will use the tanh() activation function.
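For concreteness, the layer layout I have in mind would look roughly like this (sketched in PyTorch, which I have not committed to yet; it is just the layer sizes described above as a plain fully connected stack, and all names are placeholders):

```python
import torch
import torch.nn as nn

class PositionEvaluator(nn.Module):
    """Evaluator with 12*64 piece inputs, 1 side-to-move input and 4 castling inputs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(12 * 64 + 1 + 4, 515),  # input layer -> first hidden layer
            nn.Tanh(),
            nn.Linear(515, 515),              # second hidden layer
            nn.Tanh(),
            nn.Linear(515, 1),                # single evaluation output
            nn.Tanh(),                        # squashes the evaluation into [-1, 1]
        )

    def forward(self, x):
        # x: float tensor of shape (batch, 773) built from the bitboards
        return self.net(x)
```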

I thought about using supervised learning by feeding the CNN a lot of positions evaluated by Stockfish, but decided against it, since that would in some sense just copy another engine's evaluation function.
Therefore I decided to use reinforcement learning, adjusting the weights and biases from self-play games. But how do I train the neural network when there is no way to tell what the correct evaluation of a given position is? How do I "tell" it that a given move was a blunder and that another move was excellent?

I have read through some papers and articles on exactly this topic, but none of them seem to explain how the network's weights are actually adjusted during the training process...

All answers are very much appreciated :))


1 Answer


To put it simply: the arrangement of the pieces on the board defines the state the agent is in. Each state has a value, and this value can be learned by various methods (a neural network, in the case of deep RL). The value of a state is an estimate of the expected return, i.e. the cumulative reward obtainable from that state onward. You can therefore train the network by simulating self-play games and collecting rewards from the environment, for example +1 for a win, -1 for a loss and 0 for a draw. The observed rewards, combined with the network's own value estimates for subsequent states, form the targets used to update the weights; this is the idea behind temporal-difference learning.
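As a rough sketch (not a complete or tuned implementation), a TD(0)-style update over one finished self-play game could look like the code below. I am assuming PyTorch and a `model` shaped like the one in your question; `positions` would be the encoded board states of the game in order, and `final_reward` the game result from White's point of view (+1 win, -1 loss, 0 draw). All names are placeholders.

```python
import torch

def td0_update(model, optimizer, positions, final_reward, gamma=1.0):
    """One TD(0) pass over a finished self-play game.

    positions    : list of encoded positions, each a float tensor of shape (1, 773)
    final_reward : game result from White's perspective (+1, -1 or 0)
    """
    for t in range(len(positions)):
        value = model(positions[t])  # V(s_t), gradients flow through this
        with torch.no_grad():
            if t + 1 < len(positions):
                # bootstrap: the target is the (discounted) value of the next position
                target = gamma * model(positions[t + 1])
            else:
                # terminal position: the target is the actual game outcome
                target = torch.tensor([[float(final_reward)]])
        loss = (target - value).pow(2).mean()  # move V(s_t) toward the target
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The key point is that no single move is ever labeled as a blunder or as excellent. Each position is nudged toward the value of the position that follows it, and the last position toward the actual game result, so over many games the positions (and hence the moves) that tend to lead to losses automatically end up with lower evaluations.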