
I am using the TFRobertaForSequenceClassification class from the Hugging Face transformers library to create a classifier. According to the documentation, the logits output should have a shape of (batch_size, num_labels). However, I get (batch_size, seq_length, num_labels) and I don't understand why.

To reproduce this:

from transformers import TFRobertaForSequenceClassification, RobertaConfig
import numpy as np

seq_len = 512

classifier = TFRobertaForSequenceClassification(RobertaConfig())

#create random inputs for demo
input_ids = np.random.randint(0,10000, size=(seq_len,))
attention_mask = np.random.randint(0,2, size=(seq_len,))
token_type_ids = np.random.randint(0,2, size=(seq_len,))

#make a prediction with batch_size of 1
output = classifier.predict([input_ids, attention_mask, token_type_ids])

print(output.logits.shape)

This outputs logits with shape (512, 2), but I am expecting (1, 2), i.e. (batch_size, num_labels). Can anyone shed light on why it behaves like this?


1 Answer


I created an issue on GitHub about this and got an answer. The inputs have to be batched, i.e. each array needs an explicit leading batch dimension (wrapping unbatched arrays in a list is not enough). Also, 510 is the maximum sequence length, since we have to account for the beginning and end tokens. Further discussed here:

https://github.com/huggingface/transformers/issues/9102
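
For reference, here is a minimal sketch of the corrected reproduction, assuming the same random demo inputs as in the question. The key changes are giving each array a leading batch dimension of 1 and capping the sequence length at 510; the token_type_ids are kept all-zero here since RoBERTa does not use segment embeddings.

from transformers import TFRobertaForSequenceClassification, RobertaConfig
import numpy as np

seq_len = 510   # 512 minus the two special tokens at the start and end
batch_size = 1

classifier = TFRobertaForSequenceClassification(RobertaConfig())

# random demo inputs, now with an explicit leading batch dimension
input_ids = np.random.randint(0, 10000, size=(batch_size, seq_len))
attention_mask = np.random.randint(0, 2, size=(batch_size, seq_len))
token_type_ids = np.zeros((batch_size, seq_len), dtype=int)  # RoBERTa ignores segment ids

output = classifier.predict([input_ids, attention_mask, token_type_ids])

print(output.logits.shape)  # (1, 2), i.e. (batch_size, num_labels)

With the batch dimension in place, predict returns one row of logits per sequence rather than one per token.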