This is the example given in the documentation of the transformers PyTorch library:
from transformers import BertTokenizer, BertForTokenClassification
import torch
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForTokenClassification.from_pretrained('bert-base-uncased',
    output_hidden_states=True, output_attentions=True)
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute",
    add_special_tokens=True)).unsqueeze(0)  # Batch size 1
labels = torch.tensor([1] * input_ids.size(1)).unsqueeze(0) # Batch size 1
outputs = model(input_ids, labels=labels)
loss, scores, hidden_states, attentions = outputs
Here hidden_states is a tuple of length 13 and contains the hidden states of the model at the output of each layer, plus the initial embedding outputs. I would like to know whether hidden_states[0] or hidden_states[12] represents the final hidden state vectors.
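One way to check the ordering empirically is sketched below. It rests on several assumptions that are not part of the original example: it uses a small, randomly initialised BertModel built from a hypothetical BertConfig (so nothing is downloaded) rather than the pretrained BertForTokenClassification, and it assumes a recent transformers release in which model outputs expose attributes such as hidden_states and last_hidden_state. The idea is to compare hidden_states[0] with the output of the embedding layer and hidden_states[-1] with the final encoder output.

```python
import torch
from transformers import BertConfig, BertModel

# Hypothetical tiny configuration (an assumption for illustration):
# 12 layers like bert-base, but small and randomly initialised.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=12,
    num_attention_heads=4,
    intermediate_size=64,
)
model = BertModel(config)
model.eval()  # disable dropout so the two forward computations agree

# Arbitrary token ids, batch size 1
input_ids = torch.tensor([[1, 5, 9, 12, 7, 2]])

with torch.no_grad():
    outputs = model(input_ids, output_hidden_states=True)
    # Output of the embedding layer alone, before any encoder layer
    embedding_output = model.embeddings(input_ids=input_ids)

hidden_states = outputs.hidden_states
num_states = len(hidden_states)

# Does index 0 hold the embedding output?
first_is_embeddings = torch.allclose(hidden_states[0], embedding_output)
# Does the last index hold the final encoder output?
last_is_final = torch.equal(hidden_states[-1], outputs.last_hidden_state)

print(num_states, first_is_embeddings, last_is_final)
```

If these assumptions hold, the check should report 13 states, with hidden_states[0] matching the embedding output and hidden_states[-1] (that is, hidden_states[12]) matching the final layer's hidden states. The same comparison should carry over to the pretrained model, although that is not verified here.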
output_hidden_states=True and output_attentions=True while loading the model. I'm sorry. When you include these two arguments, it will return four outputs. @dennlinger – Mr. NLP