I'm working on my bachelor thesis; the topic is reinforcement learning.
My setup:
- Unity3D (C#)
- Own neural network framework
I confirmed that the network works by training it on a sine function, which it can approximate. A few values don't reach their target, but it's good enough. When I train it on single values, it always converges.
Here is my problem:
I'm trying to teach my network the Q-value function of a simple ball-catching game: the agent just has to catch a ball that drops from a random position at a random angle. The reward is +1 for a catch and -1 for a miss.
My network model has one hidden layer with between 45 and 180 neurons (I tested several sizes in this range without success).
It uses experience replay with 32 samples drawn from a memory of 100k entries, with a learning rate of 0.0001. It learns for 50,000 frames, then tests for 10,000 frames; this cycle is repeated 10 times. The inputs are PlatformPosX, BallPosX and BallPosY from the last 4 frames.
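To make the input layout concrete, here is a minimal sketch of how a 4-frame input vector (3 values per frame, 12 inputs in total) could be assembled. It is written in Python for brevity rather than in the C# used in the project; the `FrameStack` class, its method names, and the zero-padding at the start of an episode are illustrative assumptions, not the actual implementation.

```python
from collections import deque

FRAMES = 4    # number of past frames fed to the network
FEATURES = 3  # PlatformPosX, BallPosX, BallPosY


class FrameStack:
    """Keeps the last FRAMES observations and flattens them into one input vector."""

    def __init__(self):
        self.frames = deque(maxlen=FRAMES)
        self.reset()

    def reset(self):
        # Assumption: pad with zeros at the start of an episode.
        self.frames.clear()
        for _ in range(FRAMES):
            self.frames.append((0.0,) * FEATURES)

    def push(self, platform_x, ball_x, ball_y):
        # The deque drops the oldest frame automatically once it is full.
        self.frames.append((platform_x, ball_x, ball_y))

    def vector(self):
        # Oldest frame first; FRAMES * FEATURES = 12 values in total.
        return [value for frame in self.frames for value in frame]
```

With this layout the network's input layer would have 12 neurons, and the ball's direction of travel is only available implicitly through the differences between consecutive frames.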
Pseudocode:
Choose action (ε-greedy)
Do action
Store State, Action, CurrentReward, Done in memory
If in learning phase: Replay
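The loop above could look roughly like the following sketch, again in Python rather than C#. The names `choose_action`, `store`, `sample_batch` and `step`, as well as the `env`, `q_function` and `train_on_batch` parameters, are hypothetical placeholders. Storing the next state alongside each transition is also an assumption; the pseudocode does not list it, but a bootstrapped target needs it.

```python
import random
from collections import deque

MEMORY_SIZE = 100_000  # replay memory capacity from the post
BATCH_SIZE = 32        # replay batch size from the post

memory = deque(maxlen=MEMORY_SIZE)


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def store(state, action, reward, next_state, done):
    """Append one transition; the deque discards the oldest entry when full."""
    memory.append((state, action, reward, next_state, done))


def sample_batch(rng=random):
    """Draw up to BATCH_SIZE distinct transitions uniformly at random."""
    return rng.sample(memory, min(BATCH_SIZE, len(memory)))


def step(env, q_function, state, epsilon, learning, train_on_batch):
    """One frame: act, store the transition, and replay if in the learning phase.

    env.step(action) is assumed to return (next_state, reward, done).
    """
    action = choose_action(q_function(state), epsilon)
    next_state, reward, done = env.step(action)
    store(state, action, reward, next_state, done)
    if learning:
        train_on_batch(sample_batch())
    return next_state, done
```

In the test phase, `learning` would be `False` and `epsilon` would typically be set to 0 (or a very small value) so that the measured score reflects the greedy policy.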
My problem is:
Its actions start clipping to either 0 or 1, sometimes with some variance. It never arrives at an ideal policy, such as the platform simply following the ball.
EDIT: Sorry for the sparse info. My Q-function is trained towards the target Reward + Gamma * nextEstimated_Reward, so it is discounted.
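For reference, the target described in the edit corresponds to the one-step Q-learning update. The sketch below (Python, not the project's C#) rests on several assumptions that the post does not confirm: the "next estimated reward" is taken to be the maximum Q-value of the next state, terminal transitions (Done set) do not bootstrap, the value of `GAMMA` is a placeholder, and only the output for the action actually taken is moved towards the target. The function names are hypothetical.

```python
GAMMA = 0.99  # assumption: the discount factor is not stated in the post


def q_target(reward, next_q_values, done, gamma=GAMMA):
    """One-step Q-learning target for a single transition."""
    if done:
        # Terminal transition: there is no next state to bootstrap from.
        return reward
    return reward + gamma * max(next_q_values)


def build_target_vector(current_q_values, action, target):
    """Copy the network's current outputs and replace only the taken action's value.

    Training against this vector gives zero error on the other outputs,
    so only the Q-value of the chosen action is updated.
    """
    targets = list(current_q_values)
    targets[action] = target
    return targets
```

If the terminal case is not handled separately, the +1/-1 rewards at the end of an episode get mixed with a bootstrapped estimate of a state that belongs to the next ball drop, which is one thing worth ruling out when the outputs saturate.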