I am using the TFRobertaForSequenceClassification class from the Hugging Face transformers library to build a classifier. According to the documentation, the logits output should have shape (batch_size, num_labels). However, I get (seq_length, num_labels) instead, and I don't understand why.
To reproduce this:
from transformers import TFRobertaForSequenceClassification, RobertaConfig
import numpy as np
seq_len = 512
classifier = TFRobertaForSequenceClassification(RobertaConfig())  # default config, num_labels=2
# create random inputs for the demo (each a 1-D array of shape (seq_len,))
input_ids = np.random.randint(0, 10000, size=(seq_len,))
attention_mask = np.random.randint(0, 2, size=(seq_len,))
token_type_ids = np.random.randint(0, 2, size=(seq_len,))
# make a prediction, intending a batch size of 1
output = classifier.predict([input_ids, attention_mask, token_type_ids])
print(output.logits.shape)
This prints (512, 2), but I expected (1, 2), i.e. (batch_size, num_labels). Can anyone shed light on why it behaves this way?
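For what it's worth, if I reshape the inputs to add an explicit leading batch dimension, the output shape matches the documentation, which makes me suspect the 1-D inputs are confusing the batch handling (this is just my own experiment, not something from the docs):

# same arrays as above, reshaped to (1, seq_len) so that axis 0 is the batch axis
input_ids_2d = input_ids[np.newaxis, :]
attention_mask_2d = attention_mask[np.newaxis, :]
token_type_ids_2d = token_type_ids[np.newaxis, :]
output_2d = classifier.predict([input_ids_2d, attention_mask_2d, token_type_ids_2d])
print(output_2d.logits.shape)  # (1, 2), i.e. (batch_size, num_labels)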