I want to learn how a deep reinforcement learning algorithm works and how long it takes to train for a given environment. I came up with a very simple example environment:
There is a counter which holds an integer between 0 and 100. Counting up to 100 is its goal.
There is one parameter, direction, whose value can be +1 or -1.
It simply shows the direction to move.
Our neural network takes this direction as input and has 2 possible actions as output:
- Change the direction
- Do not change the direction
The 1st action simply flips the direction (+1 => -1 or -1 => +1). The 2nd action keeps the direction as it is.
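To make the setup concrete, here is a minimal sketch of the environment as described above. It is not the tutorial's code: the class name `CounterEnv`, the method names, the starting values (counter 0, direction +1), and the clamping of the counter to the 0..100 range are my own assumptions for illustration. The reward is deliberately left out, since that is what I am asking about.

```python
class CounterEnv:
    """Toy environment: a counter that moves one step in `direction` per tick.

    Hypothetical sketch; names and starting values are assumptions.
    """

    FLIP, KEEP = 0, 1  # the two possible actions

    def __init__(self, start=0, goal=100):
        self.start = start  # assumed starting counter value
        self.goal = goal
        self.reset()

    def reset(self):
        self.counter = self.start
        self.direction = 1  # assumed initial direction
        return self.direction

    def step(self, action):
        # Action FLIP changes the sign of the direction; KEEP leaves it alone.
        if action == self.FLIP:
            self.direction = -self.direction
        # Move the counter by the current direction, clamped to [0, goal] (assumption).
        self.counter = max(0, min(self.goal, self.counter + self.direction))
        done = self.counter == self.goal
        # The state seen by the network is only the direction.
        return self.direction, done


# Usage: a policy that never flips reaches the goal from 0.
env = CounterEnv()
state = env.reset()
steps = 0
done = False
while not done:
    state, done = env.step(CounterEnv.KEEP)
    steps += 1
print(steps)  # 100
```

With these assumptions, the best possible episode is 100 steps long, which gives a baseline to compare the trained agent against.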
I am using Python for the backend and JavaScript for the frontend. Training seems to take too much time, and the behavior is still pretty random. I have used a 4-layer perceptron, a learning rate of 0.001, and memory learning with a batch size of 100. The code is from a Udemy tutorial on Artificial Intelligence and is working properly.
My question is: what should the reward be for completion and for each state? And how much time should it take to train an example as simple as this?