BERT pre-trained model giving random output each time

Question

I was trying to add an additional layer after the Hugging Face BERT transformer, so I used BertForSequenceClassification inside my nn.Module network. But the model gives me random outputs compared to loading the model directly.

Model 1:

from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels = 5) # as we have 5 classes

import torch
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

input_ids = torch.tensor(tokenizer.encode(texts[0], add_special_tokens=True, max_length = 512)).unsqueeze(0)  # Batch size 1

print(model(input_ids))

Out:

(tensor([[ 0.3610, -0.0193, -0.1881, -0.1375, -0.3208]],
        grad_fn=<AddmmBackward>),)

Model 2:

import torch
from torch import nn

class BertClassifier(nn.Module):
    def __init__(self):
        super(BertClassifier, self).__init__()
        self.bert = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels = 5)
        # as we have 5 classes

        # we want our output as probability so, in the evaluation mode, we'll pass the logits to a softmax layer
        self.softmax = torch.nn.Softmax(dim = 1) # last dimension
    def forward(self, x):
        print(x.shape)
        x = self.bert(x)

        if not self.training:  # in evaluation mode
            pass
            # x = self.softmax(x[0])  # x is a tuple; the logits are x[0]

        return x

# create our model

bertclassifier = BertClassifier()

print(bertclassifier(input_ids))

torch.Size([1, 512])
torch.Size([1, 5])
(tensor([[-0.3729, -0.2192,  0.1183,  0.0778, -0.2820]],
        grad_fn=<AddmmBackward>),)

They should be the same model, right? I found a similar issue here, but it has no satisfactory explanation: https://github.com/huggingface/transformers/issues/2770

Does BERT have some randomized parameter? If so, how do I get reproducible output?
Why do the two models give me different outputs? Is there something I'm doing wrong?

Zabir Al Nazi · Accepted Answer · 2020-05-09T01:43:28

The cause is the random initialization of BERT's classifier layer. If you print your model, you'll see:

    (pooler): BertPooler(
      (dense): Linear(in_features=768, out_features=768, bias=True)
      (activation): Tanh()
    )
  )
  (dropout): Dropout(p=0.1, inplace=False)
  (classifier): Linear(in_features=768, out_features=5, bias=True)
)

The last layer is a classifier that is added on top of bert-base; it is not part of the pre-trained checkpoint. The expectation is that you'll train this layer for your downstream task.

If you want to get more insight:

model, li = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels = 5, output_loading_info=True) # as we have 5 classes
print(li)

{'missing_keys': ['classifier.weight', 'classifier.bias'], 'unexpected_keys': ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias'], 'error_msgs': []}

You can see that classifier.weight and classifier.bias are missing from the checkpoint, so these parameters are randomly initialized each time you call BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels = 5).
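
To illustrate the effect without downloading anything, here is a minimal sketch that uses a plain nn.Linear(768, 5) as a stand-in for the missing classifier head. The helper make_head and the seed value are hypothetical names chosen for this example, not part of transformers. It shows that two freshly created heads differ, while fixing the seed right before construction makes them identical.

```python
import torch
from torch import nn


def make_head(seed=None):
    """Build a fresh 768 -> 5 linear layer (stand-in for the classifier head).

    make_head is a hypothetical helper for this sketch only.
    """
    if seed is not None:
        torch.manual_seed(seed)  # fix the RNG state before the layer is created
    return nn.Linear(768, 5)


# Two heads created one after the other draw different random weights.
unseeded_a, unseeded_b = make_head(), make_head()
heads_differ = not torch.equal(unseeded_a.weight, unseeded_b.weight)

# Seeding with the same value right before each construction gives equal weights.
seeded_a, seeded_b = make_head(seed=0), make_head(seed=0)
heads_match = torch.equal(seeded_a.weight, seeded_b.weight)

print(heads_differ, heads_match)
```

Under the same assumption, calling torch.manual_seed with a fixed value immediately before each from_pretrained call should give the classifier the same initial weights in both models. Reusing a single loaded model instance inside the wrapper avoids the second random initialization altogether.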
