For convolutional networks, one can view the convolutional part (convolution, max-pooling, etc.) as feature extraction, which then gets fed into feed-forward layers that do the classification (more or less).
Is the same true for recurrent networks (RNN, LSTM, etc.), i.e. do the recurrent layers create a representation of the data/features which then gets fed into feed-forward layers?
I was thinking in terms of sentiment analysis, i.e. a "sequence-to-one" model. Do you think that a network with one recurrent layer + one feed-forward layer would outperform a network with only one recurrent layer?
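To make the comparison concrete, here is a rough PyTorch sketch of the two variants I have in mind (the vocabulary size, hidden size, and class count are just placeholder values, and even the "recurrent only" variant still needs a final linear projection to turn the last hidden state into class scores):

```python
import torch
import torch.nn as nn

# Placeholder hyperparameters, purely for illustration
VOCAB_SIZE = 10_000
EMBED_DIM = 100
HIDDEN_DIM = 128
NUM_CLASSES = 2  # e.g. positive / negative sentiment


class RecurrentOnly(nn.Module):
    """Sequence-to-one model: the last LSTM hidden state is projected
    straight to the class scores (no extra feed-forward layer)."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.out = nn.Linear(HIDDEN_DIM, NUM_CLASSES)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        _, (h_n, _) = self.lstm(self.embed(tokens))
        return self.out(h_n[-1])                # (batch, NUM_CLASSES)


class RecurrentPlusFeedForward(nn.Module):
    """Same recurrent 'feature extractor', but with an additional hidden
    feed-forward layer before the final classification layer."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(HIDDEN_DIM, HIDDEN_DIM),
            nn.ReLU(),
            nn.Linear(HIDDEN_DIM, NUM_CLASSES),
        )

    def forward(self, tokens):
        _, (h_n, _) = self.lstm(self.embed(tokens))
        return self.ff(h_n[-1])


# Quick shape check with random token ids
x = torch.randint(0, VOCAB_SIZE, (4, 20))       # batch of 4 sequences of length 20
print(RecurrentOnly()(x).shape)                 # torch.Size([4, 2])
print(RecurrentPlusFeedForward()(x).shape)      # torch.Size([4, 2])
```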