I have a classifier with ~1100 features and 60k samples of training data. I create an RNN with 1100 LSTM cells, and it classifies all of my training data correctly but then underperforms on the test data.
If I had a very large feed-forward NN, I think it would behave similarly, and one would reduce the size of the hidden layer(s), add regularization, dropout, etc. to reduce overfitting.
How would I do the same for the RNN/LSTM? (I added dropout, but I don't see a way to add regularization or, especially, to control the LSTM state size, which seems to default to the input size and is probably too large.)
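For concreteness, here is a minimal sketch of what I'm hoping is possible, using the tf.nn.rnn_cell API; num_units, keep_prob, and l2_scale are hyperparameter names I made up, and the values are guesses:

```python
import tensorflow as tf

n_features = 1100   # width of each input vector (my feature count)
n_classes = 2       # placeholder; substitute the real number of classes
num_units = 128     # the knob I want: an LSTM state size much smaller than 1100
keep_prob = 0.5     # dropout keep probability
l2_scale = 1e-4     # L2 penalty weight

# inputs are [batch, time, features]; labels are one class id per sequence
inputs = tf.placeholder(tf.float32, [None, None, n_features])
labels = tf.placeholder(tf.int64, [None])

cell = tf.nn.rnn_cell.LSTMCell(num_units)
cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=keep_prob)

outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
last = outputs[:, -1, :]  # output of the last time step, [batch, num_units]

# plain softmax layer on top of the last output
W = tf.Variable(tf.truncated_normal([num_units, n_classes], stddev=0.1),
                name="softmax_w")
b = tf.Variable(tf.zeros([n_classes]), name="softmax_bias")
logits = tf.matmul(last, W) + b

xent = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

# L2 regularization over the trainable weights (biases excluded by name)
l2 = l2_scale * tf.add_n([tf.nn.l2_loss(v)
                          for v in tf.trainable_variables()
                          if "bias" not in v.name.lower()])
loss = xent + l2
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```

Is this the right approach, or is there a more idiomatic way to apply these controls to an LSTM?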
I see that there was an input_size parameter, but it is now deprecated and unused.
I see references in that doc to LSTMCell.__init__, LSTMCell.output_size, and LSTMCell.state_size, but how does one use them? The simple tutorial examples just use the defaults, which result in overfitting.
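My guess, which I'd like confirmed, is that the num_units constructor argument is what actually sets the state size, and that output_size and state_size are just read-only properties reflecting it, something like:

```python
import tensorflow as tf

cell = tf.nn.rnn_cell.LSTMCell(num_units=128)
print(cell.output_size)  # 128: width of the per-step output h
print(cell.state_size)   # LSTMStateTuple(c=128, h=128) with state_is_tuple=True
                         # (older versions default to a concatenated state of 256)
```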
If there is some other way to discover and tune hyperparameters, I'm not seeing it.
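Failing a built-in mechanism, the best I can think of is a brute-force grid search over the knobs above against a held-out validation split; build_and_eval below is a hypothetical helper that would train one model and return its validation accuracy:

```python
import itertools

best_acc, best_cfg = 0.0, None
for num_units, keep_prob, l2_scale in itertools.product(
        [64, 128, 256], [0.5, 0.8], [1e-5, 1e-4, 1e-3]):
    # build_and_eval is hypothetical: rebuild the graph with these
    # hyperparameters, train on the training split, and return accuracy
    # on a held-out validation split (never the test set)
    acc = build_and_eval(num_units=num_units,
                         keep_prob=keep_prob,
                         l2_scale=l2_scale)
    if acc > best_acc:
        best_acc, best_cfg = acc, (num_units, keep_prob, l2_scale)
print("best validation accuracy %.3f with %s" % (best_acc, best_cfg))
```

Is there something smarter than this built into TensorFlow?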