I'm trying to create a text generation neural network using an LSTM cell and TensorFlow. I'm training the network on sentences in time-major format [time_steps, batch_size, input_size], and I want each time step to predict the next word in the sequence. Sequences are padded with empty values up to time_steps, and a separate placeholder holds the actual length of each sequence in the batch.
There is a lot of information on the concept of backpropagation through time, but I can't find anything about the actual TensorFlow implementation of the cost calculation for variable-length sequences. Since the end of each sequence is padded, I assume I shouldn't calculate the cost on the padded parts, so I need a way to restrict the cost to the outputs between the first step and the end of the real sequence.
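For reference, my placeholders look roughly like this (the names and sizes are just examples from my setup, with the vocabulary size standing in for input_size):

import tensorflow as tf

time_steps = 50       # maximum (padded) sequence length
batch_size = 32
vocab_size = 10000    # input_size: one-hot encoded words

# time-major inputs and next-word targets: [time_steps, batch_size, input_size]
X = tf.placeholder(tf.float32, [time_steps, batch_size, vocab_size])
y = tf.placeholder(tf.float32, [time_steps, batch_size, vocab_size])

# actual (unpadded) length of each sequence in the batch
seq_length = tf.placeholder(tf.int32, [batch_size])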
Here's the code I currently have:
outputs = []
states = []
cost = 0
state = cell.zero_state(batch_size, tf.float32)  # initial LSTM state

for i in range(time_steps):
    # one LSTM step on the i-th time slice, shape [batch_size, input_size]
    output, state = cell(X[i], state)

    # two-layer decoder mapping the LSTM output to vocabulary logits
    z1 = tf.matmul(output, dec_W1) + dec_b1
    a1 = tf.nn.sigmoid(z1)
    z2 = tf.matmul(a1, dec_W2) + dec_b2
    a2 = tf.nn.softmax(z2)

    outputs.append(a2)
    states.append(state)

    #== calculate cost
    cost = cost + tf.nn.softmax_cross_entropy_with_logits(logits=z2, labels=y[i])

optimizer = tf.train.AdamOptimizer(0.001).minimize(cost)
This code works when the sequences are not of variable length. However, if padded values are added to the end, it also calculates the cost of the padded sections, which doesn't make much sense.
How can I calculate the cost only on the outputs that come before each sequence's actual length?
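What I have in mind is something along these lines (a rough sketch, assuming I collect the per-step cross-entropy tensors in a list step_xent inside the loop instead of summing them directly, and that seq_length holds the real lengths), but I'm not sure whether tf.sequence_mask is the right tool here or whether there is a more standard way:

# step_xent is a list of [batch_size] tensors, one per time step
step_losses = tf.stack(step_xent)   # shape [time_steps, batch_size]

# sequence_mask gives [batch_size, time_steps]; transpose to time-major
# so the mask lines up with step_losses
mask = tf.transpose(tf.sequence_mask(seq_length, time_steps, dtype=tf.float32))

# zero out the padded steps and average only over the real ones
cost = tf.reduce_sum(step_losses * mask) / tf.reduce_sum(mask)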