
I understand the need for Recurrent Neural Networks (RNNs) to have a memory, and how this is obtained by feeding the output of the hidden neurons back. But why can't they just preserve the inputs; that is, delay and feed back only the series of inputs, not the hidden neurons that are functions of those inputs, and use that as the context?

That would seem to solve a lot of the problems with feeding the entire hidden state back (e.g., the difficulties of backpropagation through time) and yet still preserve all the context. By definition, the inputs contain whatever data you need to compute the context.

Even if the context is a function of the inputs rather than the inputs themselves, we can still use this scheme, since some neurons in the single hidden layer can be functions of the x(t-1), x(t-2), ... inputs. So we can still compute anything we can compute with a standard RNN, but with a lot less complexity: some neurons will specialize on x(t) and some on x(t-n).
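
To make the proposal concrete, here is a minimal numpy sketch of what I mean; the layer sizes, window length, and names are arbitrary choices for illustration, not a worked-out design:

    # A minimal numpy sketch of the windowed-input ("delay the inputs") idea.
    # All names and sizes here are hypothetical; the point is only that
    # x(t), x(t-1), ..., x(t-k) feed one ordinary hidden layer instead of
    # feeding the hidden state back.
    import numpy as np

    rng = np.random.default_rng(0)

    window = 3        # k+1 delayed copies of the input
    input_dim = 4
    hidden_dim = 8

    # One weight block per delay: hidden units can "specialize" on x(t-n)
    # because each delay gets its own columns of W.
    W = rng.normal(scale=0.1, size=(hidden_dim, window * input_dim))
    b = np.zeros(hidden_dim)

    def hidden_from_window(x_window):
        """x_window: list of the last `window` inputs, newest first."""
        stacked = np.concatenate(x_window)   # [x(t); x(t-1); x(t-2)]
        return np.tanh(W @ stacked + b)      # ordinary feedforward layer

    # Run over a sequence by sliding the window; no state is fed back.
    xs = [rng.normal(size=input_dim) for _ in range(10)]
    for t in range(window - 1, len(xs)):
        h_t = hidden_from_window([xs[t - d] for d in range(window)])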

Now, since no one is doing this, I have to imagine they considered it and rejected it. Why?


1 Answer


Look at Learning Long-Term Dependencies with Gradient Descent is Difficult by Bengio et al. (http://www-dsi.ing.unifi.it/~paolo/ps/tnn-94-gradient.pdf): "Recurrent neural networks... have an internal state that can... keep information about past inputs for an amount of time that is not fixed a priori... In contrast, static networks (i.e., with no recurrent connection), even if they include delays (such as time delay neural networks), have a finite impulse response and can't store a bit of information for an indefinite time."

So, it seems that the scheme I propose is what Bengio calls a time delay neural network, and its major drawback is that it has a fixed, finite maximum memory. For example, there is no way to implement an accumulator (beyond the window size) in one. Contrast that with a true RNN, where it is possible (though perhaps hard to learn) for the weights to retain particular information indefinitely. An accumulator, for example, is easy to implement.
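
To make the accumulator point concrete, here is a minimal numpy sketch with hand-picked (not learned) weights and a hypothetical window size, contrasting what each scheme can represent:

    # Contrasting the two schemes on the accumulator example.
    # The recurrent weights are chosen by hand just to show what the
    # architecture *can* represent, not what training would find.
    import numpy as np

    xs = np.arange(1.0, 11.0)    # inputs 1, 2, ..., 10

    # Recurrent version: h(t) = 1.0 * h(t-1) + 1.0 * x(t) sums every input
    # seen so far, no matter how long the sequence is.
    h = 0.0
    for x in xs:
        h = 1.0 * h + 1.0 * x
    print(h)                     # 55.0, the full running sum

    # Windowed (time-delay) version: the output can only depend on the last
    # `window` inputs, so once the sequence outgrows the window the true
    # running sum is unrecoverable.
    window = 3
    partial = sum(xs[-window:])  # best it can do: sum of the last 3 inputs
    print(partial)               # 27.0, not 55.0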