
This is what TensorFlow has to say about the logits and labels arguments of tf.nn.sparse_softmax_cross_entropy_with_logits:

Args: _sentinel: Used to prevent positional parameters. Internal, do not use.

labels: Tensor of shape [d_0, d_1, ..., d_{r-1}] (where r is rank of labels and result) and dtype int32 or int64. Each entry in labels must be an index in [0, num_classes). Other values will raise an exception when this op is run on CPU, and return NaN for corresponding loss and gradient rows on GPU.

logits: Unscaled log probabilities of shape [d_0, d_1, ..., d_{r-1}, num_classes] and dtype float32 or float64. name: A name for the operation (optional).

I have worked on my fair share of machine learning and deep learning classification problems, and the only output shapes I have come across (or can even think of) are [None, 1] or [None, number_of_classes] (the latter when the sparse implementation is not used).

Please shed some light on where a labels tensor of shape [d_0, d_1, ..., d_{r-1}] and a logits tensor of unscaled log probabilities of shape [d_0, d_1, ..., d_{r-1}, num_classes] come from. Maybe just an example would suffice.


1 Answer


I have worked on my fair share of machine learning and deep learning classification problems, and the only output shapes I have come across (or can even think of) are [None, 1] or [None, number_of_classes]

This is exactly what you are describing.

labels: Tensor of shape [d_0, d_1, ..., d_{r-1}] (where r is rank of labels and result) and dtype int32 or int64. Each entry in labels must be an index in [0, num_classes).

d_0, d_1, ..., d_{r-1} index the examples in your batch. In the usual single-label case this is a tensor of shape [batch_size], holding one integer class index per example (note that the r in the docs is the rank of the labels tensor, not the batch size).

logits: Unscaled log probabilities of shape [d_0, d_1, ..., d_{r-1}, num_classes]

Same thing here: this corresponds to a tensor of shape [batch_size, num_classes], holding one row of unscaled scores per example.
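
As a minimal sketch of that usual single-label case (assuming TensorFlow 2.x with eager execution; the sizes and values here are made up purely for illustration):

    import tensorflow as tf

    batch_size, num_classes = 4, 3

    # labels: one integer class index per example -> shape [batch_size]
    labels = tf.constant([0, 2, 1, 2], dtype=tf.int64)

    # logits: unscaled scores, one row per example -> shape [batch_size, num_classes]
    logits = tf.random.normal([batch_size, num_classes])

    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    print(loss.shape)  # (4,) -- one loss value per example, same shape as labels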

When we define the model, the batch dimension is set to None because we don't want the graph to depend on a particular batch size.
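
The higher-rank shapes [d_0, d_1, ..., d_{r-1}] appear when each example contains more than one thing to classify, for instance per-pixel semantic segmentation. A sketch under the same assumptions (TensorFlow 2.x, made-up sizes):

    import tensorflow as tf

    batch_size, height, width, num_classes = 2, 8, 8, 5

    # labels: one class index per pixel -> shape [batch_size, height, width]
    labels = tf.random.uniform([batch_size, height, width],
                               maxval=num_classes, dtype=tf.int64)

    # logits: one unscaled score per class per pixel
    # -> shape [batch_size, height, width, num_classes]
    logits = tf.random.normal([batch_size, height, width, num_classes])

    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    print(loss.shape)  # (2, 8, 8) -- same shape as labels: [d_0, d_1, ..., d_{r-1}]

In both sketches the returned loss has the same shape as labels, which is what the docs mean by "(where r is rank of labels and result)".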