2 votes

I'm trying to use the tf.contrib.seq2seq.sequence_loss function in an RNN model to calculate the loss. According to the API documentation, this function requires at least three parameters: logits, targets and weights:

sequence_loss(
    logits,
    targets,
    weights,
    average_across_timesteps=True,
    average_across_batch=True,
    softmax_loss_function=None,
    name=None
)

logits: A Tensor of shape [batch_size, sequence_length, num_decoder_symbols] and dtype float. The logits correspond to the prediction across all classes at each timestep.
targets: A Tensor of shape [batch_size, sequence_length] and dtype int. The target represents the true class at each timestep. 
weights: A Tensor of shape [batch_size, sequence_length] and dtype float. weights constitutes the weighting of each prediction in the sequence. When using weights as masking, set all valid timesteps to 1 and all padded timesteps to 0, e.g. a mask returned by tf.sequence_mask.
average_across_timesteps: If set, sum the cost across the sequence dimension and divide the cost by the total label weight across timesteps.
average_across_batch: If set, sum the cost across the batch dimension and divide the returned cost by the batch size.
softmax_loss_function: Function (labels, logits) -> loss-batch to be used instead of the standard softmax (the default if this is None). Note that to avoid confusion, it is required for the function to accept named arguments.
name: Optional name for this operation, defaults to "sequence_loss".
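
For reference, here is a minimal sketch of how the three required tensors fit together. The shapes and variable names are made up for illustration, and it assumes TensorFlow 1.x, where tf.contrib lives:

import tensorflow as tf

batch_size, sequence_length, num_decoder_symbols = 4, 10, 50

# Decoder outputs: one score per class at each timestep.
logits = tf.placeholder(tf.float32, [batch_size, sequence_length, num_decoder_symbols])
# True class index at each timestep -- note this is 2d, not one-hot.
targets = tf.placeholder(tf.int32, [batch_size, sequence_length])

# Mask padded timesteps: 1.0 for real tokens, 0.0 for padding.
lengths = tf.placeholder(tf.int32, [batch_size])
weights = tf.sequence_mask(lengths, sequence_length, dtype=tf.float32)

loss = tf.contrib.seq2seq.sequence_loss(logits, targets, weights)  # scalar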

My understanding is that logits are my predictions after applying Xw+b, so their shape should be [batch_size, sequence_length, output_size]. Targets should then be my labels, but the shape required is [batch_size, sequence_length]. I had supposed my labels should have the same shape as the logits.

So how do I convert the 3d labels to 2d? Thanks in advance.


3 Answers

2 votes

Your targets (labels) don't need to have the same shape as the logits.
If we ignore batch_size (which is not relevant to your question) for a moment, this API simply calculates the loss between two sequences as a weighted sum of the loss at each word. Suppose vocab_size is 5 and the target word is 3; the logits provide a prediction for this target with a vector like [0.2, 0.1, 0.15, 0.4, 0.15].
To calculate the loss between the target and the prediction, the target does not need to be expanded to the same shape as the prediction, i.e. the one-hot vector [0, 0, 0, 1, 0]; TensorFlow does this internally.
You may refer to the distinction between the two APIs softmax_cross_entropy_with_logits and sparse_softmax_cross_entropy_with_logits, sketched below.
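
A small sketch of that distinction, reusing the numbers above (this only illustrates the two label formats, it is not part of sequence_loss itself):

import tensorflow as tf

logits = tf.constant([[0.2, 0.1, 0.15, 0.4, 0.15]])  # [batch=1, vocab_size=5]

# Dense version: labels must be one-hot, the same shape as logits.
dense = tf.nn.softmax_cross_entropy_with_logits(
    labels=tf.constant([[0., 0., 0., 1., 0.]]), logits=logits)

# Sparse version: labels are plain class indices, one dimension fewer.
sparse = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=tf.constant([3]), logits=logits)

# Both produce the same loss value; sequence_loss takes the sparse, integer
# form, which is why targets has shape [batch_size, sequence_length].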

1 vote

Your labels should be a 2d matrix of shape [batch_size, sequence_length], and your logits should be a 3d tensor of shape [batch_size, sequence_length, output_size]. Therefore you don't need to extend your labels' dimensions if your label variable is already of shape [batch_size, sequence_length].

In case you do want to extend the dimension, you can do it like this: expanded_variable = tf.expand_dims(the_variable_you_want_to_expand, axis=-1)
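
A quick illustration with made-up shapes:

import tensorflow as tf

labels = tf.zeros([32, 20], dtype=tf.int32)    # [batch_size, sequence_length]
expanded = tf.expand_dims(labels, axis=-1)     # [batch_size, sequence_length, 1]
print(labels.shape, expanded.shape)            # (32, 20) (32, 20, 1)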

0 votes

tf.contrib.seq2seq.sequence_loss is deprecated; use tfa.seq2seq.sequence_loss from TensorFlow Addons instead:

import tensorflow as tf
import tensorflow_addons as tfa

tfa.seq2seq.sequence_loss(
    logits: tfa.types.TensorLike,
    targets: tfa.types.TensorLike,
    weights: tfa.types.TensorLike,
    average_across_timesteps: bool = True,
    average_across_batch: bool = True,
    sum_over_timesteps: bool = False,
    sum_over_batch: bool = False,
    softmax_loss_function: Optional[Callable] = None,
    name: Optional[str] = None
) -> tf.Tensor

https://www.tensorflow.org/addons/api_docs/python/tfa/seq2seq/sequence_loss
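
A minimal usage sketch against the Addons API (the shapes and lengths are made up; in practice the logits come from your decoder):

import tensorflow as tf
import tensorflow_addons as tfa

batch_size, sequence_length, vocab_size = 4, 10, 50

logits = tf.random.normal([batch_size, sequence_length, vocab_size])
targets = tf.random.uniform(
    [batch_size, sequence_length], maxval=vocab_size, dtype=tf.int32)

# Mask padded positions so they don't contribute to the loss.
lengths = tf.constant([10, 7, 9, 5])
weights = tf.sequence_mask(lengths, sequence_length, dtype=tf.float32)

loss = tfa.seq2seq.sequence_loss(logits, targets, weights)  # scalar tensor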