0
votes

I have a classifier with ~1100 features and 60k samples of training data. I create an RNN with 1100 LSTM cells; it classifies all my training data correctly, then underperforms on the test data.

If I had a very large feed-forward NN, I think it would behave similarly, and one would reduce the size of the hidden layer(s), add regularization, dropout, etc. to reduce overfitting.

How would I do the same for the RNN/LSTM? (I added dropout, but I don't see a way to add regularization or, especially, to control the LSTM state size; it seems to default to the input size, which is probably too large.)
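For concreteness, this is roughly my current setup (a minimal sketch using the tf.nn.rnn_cell API; the 1100-unit cell and the keep probability are just my current choices):

```python
import tensorflow as tf

num_features = 1100

# inputs: [batch_size, max_time, num_features]
inputs = tf.placeholder(tf.float32, [None, None, num_features])

# One cell with as many units as input features, plus dropout on the outputs.
cell = tf.nn.rnn_cell.LSTMCell(num_units=num_features)
cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=0.5)

outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
```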

I see that there was an input_size parameter, but it is now deprecated and unused.

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/api_docs/python/functions_and_classes/shard5/tf.nn.rnn_cell.LSTMCell.md

I see references in that doc to

{#LSTMCell.__init__}
{#LSTMCell.output_size}
{#LSTMCell.state_size}

but how does one use them? The simple tutorial examples just use the defaults, which result in overfitting.
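My reading of the constructor docs is that num_units in __init__ is the knob that sets the cell size, while output_size and state_size are read-only properties derived from it, not tunable parameters; a quick check (my interpretation, not from the tutorials):

```python
import tensorflow as tf

# num_units in the constructor sets the cell size; output_size and
# state_size are read-only properties derived from it.
cell = tf.nn.rnn_cell.LSTMCell(num_units=128, state_is_tuple=True)
print(cell.output_size)  # 128
print(cell.state_size)   # LSTMStateTuple(c=128, h=128)
```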

If there is some other way to discover and tune hyperparameters, I'm not seeing it.

1
I guess my thinking is to do dimensionality reduction on the data. Nevertheless, I'm surprised by the absence of tunable regularization parameters and state size, and I feel like I must be missing something. I do have a lot of predictors for the sample size, but logistic regression and a feed-forward NN perform pretty well on the input. – Rocky McNuts

1 Answer

0
votes

Batch normalization is now well accepted as a general learning facilitator and regularizer.

Here is a Tensorflow implementation of a batch normalized LSTM Cell: https://github.com/OlavHN/bnlstm/blob/master/lstm.py

This implementation is explained in this article: Batch normalized LSTM for Tensorflow

It applies the principles from the paper Recurrent Batch Normalization (arXiv).
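A usage sketch follows; note that the BNLSTMCell class name and its (num_units, training) signature are my reading of that repo's lstm.py (the training flag switches between batch statistics and population statistics), so check the repo for the exact interface:

```python
import tensorflow as tf
from lstm import BNLSTMCell  # from the linked repo; name assumed from lstm.py

# training is a bool tensor: batch statistics while training,
# population (moving-average) statistics at test time.
training = tf.placeholder(tf.bool)
inputs = tf.placeholder(tf.float32, [None, None, 1100])

cell = BNLSTMCell(128, training)  # constructor signature assumed from the repo
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
```

Note that picking a num_units well below the input dimension (128 here versus 1100 features) also addresses the state-size concern from the question, since the cell size is set by the constructor rather than defaulting to the input size.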