
I'm trying to create a text-generation neural network using an LSTM cell and TensorFlow. I'm training the network on sentences in time-major format [time_steps, batch_size, input_size], and I want each time step to predict the next word in the sequence. Sequences are padded with empty values up to the maximum number of time steps, and a separate placeholder holds the length of each sequence in the batch.

There is a lot of information on the concept of backpropagation through time; however, I can't find anything about the actual TensorFlow implementation of cost calculation for variable-length sequences. Since the end of each sequence is padded, I assume I don't want to calculate the cost on the padded parts. So I need a way to clip the outputs from the first time step to the end of each sequence.

Here's the code I currently have:

    outputs = []
    states = []
    cost = 0
    # Start from the cell's zero state ('state' was used before assignment)
    state = cell.zero_state(batch_size, tf.float32)
    for i in range(time_steps):  # range(time_steps + 1) would index past X
        output, state = cell(X[i], state)
        z1 = tf.matmul(output, dec_W1) + dec_b1
        a1 = tf.nn.sigmoid(z1)
        z2 = tf.matmul(a1, dec_W2) + dec_b2
        a2 = tf.nn.softmax(z2)
        outputs.append(a2)
        states.append(state)
        #== accumulate the per-example cross-entropy for this time step
        cost = cost + tf.nn.softmax_cross_entropy_with_logits(logits=z2, labels=y[i])
    # Reduce the summed per-example losses to a scalar before minimizing
    optimizer = tf.train.AdamOptimizer(0.001).minimize(tf.reduce_mean(cost))

This code works for fixed-length sequences. However, if padded values are added to the end, it calculates the cost of the padded sections as well, which doesn't make much sense.

How can I calculate the cost only for the outputs up to each sequence's length?


1 Answer


Worked it out!

After digging through a lot of examples (most in higher-level frameworks such as Keras, which was a pain), I discovered that you have to create a mask! It seems simple in retrospect.

Here's the code to create a mask of 1s and 0s and then element-wise multiply it against a matrix (which would hold the cost values):

    import tensorflow as tf

    x = tf.placeholder(tf.float32)
    seq = tf.placeholder(tf.int32)

    def mask_by_length(input_matrix, length):
        '''
        input_matrix is a 2-D tensor [batch_size, time_steps].
        length is a 1-D tensor of sequence lengths, one per batch entry,
        referring to axis 1 of input_matrix.
        '''
        # Shape [batch_size, 1] so it broadcasts against the row of indices
        length_transposed = tf.expand_dims(length, 1)

        # Create a row of time-step indices to compare the lengths against
        # (named time_range so it doesn't shadow Python's built-in range)
        time_range = tf.range(tf.shape(input_matrix)[1])
        range_row = tf.expand_dims(time_range, 0)  # shape [1, time_steps]

        # True wherever the time-step index falls inside the sequence
        mask = tf.less(range_row, length_transposed)

        # Cast boolean to float to finalize the mask of 1s and 0s
        mask_result = tf.cast(mask, dtype=tf.float32)

        # Element-wise multiplication zeroes out the padded values
        result = tf.multiply(mask_result, input_matrix)

        return result

    mask_values = mask_by_length(x, seq)
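
To sanity-check the op, you can run it in a session and feed it the example values shown below (a minimal sketch, assuming the usual TF 1.x Session API):

    sess = tf.Session()
    masked = sess.run(mask_values, feed_dict={
        x: [[0.71, 0.22, 1.42, -0.28, 0.99],
            [0.41, 2.24, 0.09, 0.74, 0.65]],
        seq: [2, 3],
    })
    print(masked)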

Input values, shape [batch_size, time_steps]:

    [[ 0.71, 0.22, 1.42, -0.28, 0.99],
     [ 0.41, 2.24, 0.09,  0.74, 0.65]]

Sequence lengths, shape [batch_size]:

    [2, 3]

Output, shape [batch_size, time_steps]:

    [[ 0.71, 0.22, 0,    0, 0],
     [ 0.41, 2.24, 0.09, 0, 0]]

Note that the function works on a batch-major matrix [batch_size, time_steps], as its docstring says, not the time-major layout from the question; if your cost matrix is time-major, transpose it before masking.
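
For what it's worth, TensorFlow also ships tf.sequence_mask, which builds the same boolean mask in one call. Here's a sketch of wiring it into per-time-step losses like the ones accumulated in the question's loop; step_losses and seq_len are hypothetical stand-ins for the stacked cross-entropy values and the sequence-length placeholder:

    import tensorflow as tf

    time_steps, batch_size = 5, 2
    # Hypothetical stand-ins: per-time-step losses (time-major, as in the
    # question's loop) and the true length of each sequence in the batch
    step_losses = tf.placeholder(tf.float32, [time_steps, batch_size])
    seq_len = tf.placeholder(tf.int32, [batch_size])

    losses = tf.transpose(step_losses)  # [batch_size, time_steps]
    mask = tf.sequence_mask(seq_len, maxlen=time_steps, dtype=tf.float32)
    masked = losses * mask              # zero out the padded steps
    # Average each example's loss over its true (unpadded) length
    cost = tf.reduce_mean(tf.reduce_sum(masked, axis=1)
                          / tf.cast(seq_len, tf.float32))

Dividing by seq_len keeps short sequences from contributing less to the loss than long ones.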