I understand the need for Recurrent Neural Networks (RNNs) to have a memory, and how this is obtained by feeding the output of the hidden neurons back. But why can't they just preserve the inputs; that is, delay and feed back the series of inputs only, not the hidden neurons that are functions of those inputs, and use that as the context?
That would seem to solve a lot of the problems with feeding the entire hidden state back (e.g., the difficulties of backpropagation through time, BPTT) and yet still preserve all the context. By definition, the inputs contain whatever data you need to compute the context.
Even if the context is a function of the inputs and not the inputs themselves, we can still use this, as some neurons in the single hidden layer can be functions of the delayed inputs x(t-1), x(t-2), .... So we can still compute anything we can compute with a standard RNN, but with a lot less complexity: some neurons will specialize on x(t) and some on x(t-n).
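To make concrete what I mean, here is a minimal NumPy sketch (all names, sizes, and weights are made up for illustration) contrasting the usual hidden-state feedback with the delayed-input idea I'm describing:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: input dim, hidden dim, number of delayed inputs kept
d_in, d_hid, window = 3, 8, 4

# --- Standard RNN: the hidden state itself is fed back ---
W_x = rng.normal(size=(d_hid, d_in)) * 0.1
W_h = rng.normal(size=(d_hid, d_hid)) * 0.1
b = np.zeros(d_hid)

def rnn_step(x_t, h_prev):
    # h(t) = tanh(W_x x(t) + W_h h(t-1) + b): the context lives in h
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# --- Proposed alternative: delay and feed back the raw inputs instead ---
W_win = rng.normal(size=(d_hid, d_in * window)) * 0.1

def window_step(x_buffer):
    # x_buffer holds [x(t), x(t-1), ..., x(t-window+1)];
    # the hidden layer is an ordinary feedforward function of the delayed inputs
    return np.tanh(W_win @ np.concatenate(x_buffer) + b)

# Run both over the same toy sequence
xs = [rng.normal(size=d_in) for _ in range(10)]
h = np.zeros(d_hid)
buffer = [np.zeros(d_in)] * window

for x_t in xs:
    h = rnn_step(x_t, h)              # recurrent: context of unbounded length, needs BPTT
    buffer = [x_t] + buffer[:-1]      # delayed inputs: only the last `window` inputs are kept
    h_win = window_step(buffer)       # no recurrence, plain backprop suffices

print(h.shape, h_win.shape)
```

The second network has no feedback loop at all; it is just a feedforward net over a fixed window of past inputs, so (as I understand it) training it avoids BPTT entirely.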
Now, since no one seems to be doing this, I have to imagine researchers considered this approach and rejected it. Why?