1 vote

This is the example given in the documentation of the Hugging Face transformers (PyTorch) library:

from transformers import BertTokenizer, BertForTokenClassification
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForTokenClassification.from_pretrained('bert-base-uncased',
                                                   output_hidden_states=True,
                                                   output_attentions=True)

input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", 
                         add_special_tokens=True)).unsqueeze(0)  # Batch size 1
labels = torch.tensor([1] * input_ids.size(1)).unsqueeze(0)  # Batch size 1
outputs = model(input_ids, labels=labels)

loss, scores, hidden_states, attentions = outputs  # transformers 2.x returns a plain tuple

Here, hidden_states is a tuple of length 13 containing the hidden states of the model at the output of each of the 12 layers, plus the initial embedding output. I would like to know whether hidden_states[0] or hidden_states[12] represents the final hidden-state vectors.
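
For reference, this is a quick way to inspect the tuple (a sketch, assuming the snippet above has been run; shapes assume bert-base):

print(len(hidden_states))       # 13 = 1 embedding output + 12 layer outputs
print(hidden_states[0].shape)   # (batch_size, seq_len, 768)
print(hidden_states[12].shape)  # same shape, so the shape alone doesn't disambiguate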

1
Can you please specify which version of huggingface's transformers you are using? For version 2.6 I'm only getting two outputs for your sample code. – dennlinger
I forgot to specify output_hidden_states=True and output_attentions=True while loading the model, I'm sorry. When you include these two arguments, it returns four outputs. @dennlinger – Mr. NLP

1 Answer

2 votes

If you check the source code, specifically BertEncoder, you can see that the returned hidden states are initialized as an empty tuple, and the current hidden state is appended to it on each layer's iteration.
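
In simplified form, the loop looks roughly like this (a sketch of BertEncoder.forward in modeling_bert.py for transformers 2.x, not the exact source):

all_hidden_states = ()
for layer_module in self.layer:                # 12 BertLayer modules for bert-base
    if self.output_hidden_states:
        # append the state *entering* this layer: embeddings first, then layers 1..11
        all_hidden_states = all_hidden_states + (hidden_states,)
    layer_outputs = layer_module(hidden_states, attention_mask)
    hidden_states = layer_outputs[0]

# after the loop, the output of the last (12th) layer is appended as element 12
if self.output_hidden_states:
    all_hidden_states = all_hidden_states + (hidden_states,)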

The output of the final layer is appended as the last element after this loop, see here, so we can safely conclude that hidden_states[12] contains the final hidden-state vectors, while hidden_states[0] is the initial embedding output.
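
If you want to confirm this empirically, you can compare against BertModel, whose first output is the final layer's hidden state (a sketch, assuming transformers 2.x tuple outputs and reusing input_ids from your snippet):

from transformers import BertModel

bert = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True)
bert.eval()
sequence_output, pooled_output, all_states = bert(input_ids)[:3]

print(torch.equal(all_states[12], sequence_output))  # True: last element == final layer output
print(torch.equal(all_states[0], sequence_output))   # False: element 0 is the embedding output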