I'm trying to fine-tune ReformerModelWithLMHead (google/reformer-enwik8) for NER. I pad each batch to the length of its longest string, just as the encode method on the model card does (max_length = max([len(string) for string in list_of_strings])), and pass the corresponding attention_masks. I then get this error:
ValueError: If training, make sure that config.axial_pos_shape factors: (128, 512) multiply to sequence length. Got prod((128, 512)) != sequence_length: 2248. You might want to consider padding your sequence length to 65536 or changing config.axial_pos_shape.
- When I padded the sequences to length 65536, my Colab session crashed because every input in the batch became 65536 tokens long.
- As for the second option (changing config.axial_pos_shape), I have not been able to change it.
What I would like to know: is there any way to change config.axial_pos_shape while fine-tuning this model? Or am I missing something in how the input strings should be encoded for reformer-enwik8?
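For reference, this is (roughly, copied from the model card) the encode helper I use to build input_ids and attention_masks; each byte value is shifted by 2 and everything is padded to the longest string in the batch:

import torch

def encode(list_of_strings, pad_token_id=0):
    max_length = max([len(string) for string in list_of_strings])
    # empty tensors, padded to the longest string in the batch
    attention_masks = torch.zeros((len(list_of_strings), max_length), dtype=torch.long)
    input_ids = torch.full((len(list_of_strings), max_length), pad_token_id, dtype=torch.long)
    for idx, string in enumerate(list_of_strings):
        # make sure the string is in byte format
        if not isinstance(string, bytes):
            string = str.encode(string)
        # byte values are shifted by 2 because ids 0 and 1 are reserved
        input_ids[idx, :len(string)] = torch.tensor([x + 2 for x in string])
        attention_masks[idx, :len(string)] = 1
    return input_ids, attention_masks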
Thanks!
Question Update: I have tried the following methods:
- Passing the parameters at model instantiation time:
model = transformers.ReformerModelWithLMHead.from_pretrained(
    "google/reformer-enwik8",
    num_labels=9,
    max_position_embeddings=1024,
    axial_pos_shape=[16, 64],
    axial_pos_embds_dim=[32, 96],
    hidden_size=128,
)
It gives me the following error:
RuntimeError: Error(s) in loading state_dict for ReformerModelWithLMHead:
    size mismatch for reformer.embeddings.word_embeddings.weight: copying a param with shape torch.Size([258, 1024]) from checkpoint, the shape in current model is torch.Size([258, 128]).
    size mismatch for reformer.embeddings.position_embeddings.weights.0: copying a param with shape torch.Size([128, 1, 256]) from checkpoint, the shape in current model is torch.Size([16, 1, 32]).
The full error lists many more size mismatches of the same kind.
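One thing I considered but have not tried: I believe from_pretrained also accepts ignore_mismatched_sizes=True, which (as far as I understand) simply skips loading the mismatched tensors and re-initializes them randomly; with hidden_size=128 that would throw away most of the pretrained weights, so it does not look like a real solution. A sketch of what I mean:

model = transformers.ReformerModelWithLMHead.from_pretrained(
    "google/reformer-enwik8",
    num_labels=9,
    max_position_embeddings=1024,
    axial_pos_shape=[16, 64],
    axial_pos_embds_dim=[32, 96],
    hidden_size=128,
    ignore_mismatched_sizes=True,  # mismatched weights get re-initialized, not reused
)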
- Then I tried this code to update the config:
import torch
import transformers

model1 = transformers.ReformerModelWithLMHead.from_pretrained('google/reformer-enwik8', num_labels=9)

# Reshape the axial position embeddings layer to match the desired max seq length
model1.reformer.embeddings.position_embeddings.weights[1] = torch.nn.Parameter(
    model1.reformer.embeddings.position_embeddings.weights[1][0][:128]
)

# Update the config to match the custom max seq length
model1.config.axial_pos_shape = 16, 128
model1.config.max_position_embeddings = 16 * 128  # 2048
model1.config.axial_pos_embds_dim = 32, 96
model1.config.hidden_size = 128

output_model_path = "model"
model1.save_pretrained(output_model_path)
With this implementation I get this error:
RuntimeError: The expanded size of the tensor (512) must match the existing size (128) at non-singleton dimension 2. Target sizes: [1, 128, 512, 768]. Tensor sizes: [128, 768]
This happens because the updated sizes/shapes don't match the original config parameters of the pretrained model, which are:
- axial_pos_shape = (128, 512)
- max_position_embeddings = 128 * 512 # 65536
- axial_pos_embds_dim = (256, 768)
- hidden_size = 1024
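For clarity, this is the kind of consistent resizing I assume would be needed instead (an untested sketch: keep the pretrained axial_pos_embds_dim = (256, 768) and hidden_size = 1024, and shrink both axial grid factors together with their weight tensors, e.g. to (16, 128), i.e. 2048 positions; the output directory name is just an example):

import torch
import transformers

model1 = transformers.ReformerModelWithLMHead.from_pretrained("google/reformer-enwik8")
emb = model1.reformer.embeddings.position_embeddings

# checkpoint shapes: weights[0] -> [128, 1, 256], weights[1] -> [1, 512, 768]
emb.weights[0] = torch.nn.Parameter(emb.weights[0][:16])      # -> [16, 1, 256]
emb.weights[1] = torch.nn.Parameter(emb.weights[1][:, :128])  # -> [1, 128, 768]

# shrink only the axial grid; axial_pos_embds_dim and hidden_size stay untouched
model1.config.axial_pos_shape = (16, 128)
model1.config.max_position_embeddings = 16 * 128  # 2048

# save and reload so the embedding module is rebuilt from the updated config
model1.save_pretrained("reformer-enwik8-seq2048")
model1 = transformers.ReformerModelWithLMHead.from_pretrained("reformer-enwik8-seq2048")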
Am I changing the config parameters the right way, or do I have to do something differently?
Is there any example where a ReformerModelWithLMHead('google/reformer-enwik8') model is fine-tuned?
My main code implementation is as follows:
import torch
import transformers

class REFORMER(torch.nn.Module):
    def __init__(self):
        super(REFORMER, self).__init__()
        self.l1 = transformers.ReformerModelWithLMHead.from_pretrained("google/reformer-enwik8", num_labels=9)

    def forward(self, input_ids, attention_masks, labels):
        # pass everything by keyword so the underlying model's argument order doesn't matter
        output_1 = self.l1(input_ids=input_ids, attention_mask=attention_masks, labels=labels)
        return output_1

model = REFORMER()

def train(epoch):
    model.train()
    for _, data in enumerate(training_loader, 0):  # training_loader is my NER DataLoader
        # input_ids produced by the encode helper from the model card
        # (https://huggingface.co/google/reformer-enwik8)
        ids = data['input_ids'][0]
        input_shape = ids.size()
        targets = data['tags']
        print("tags: ", targets, targets.size())

        # pad the input up to the next multiple of 65536
        least_common_mult_chunk_length = 65536
        padding_length = least_common_mult_chunk_length - input_shape[-1] % least_common_mult_chunk_length
        (input_ids, inputs_embeds, attention_mask, position_ids, input_shape) = model.l1.reformer._pad_to_mult_of_chunk_length(
            input_ids=ids,
            inputs_embeds=None,
            attention_mask=None,
            position_ids=None,
            input_shape=input_shape,
            padding_length=padding_length,
            padded_seq_length=None,
            device=None,
        )

        # send the padded inputs to the forward method
        outputs = model(input_ids, attention_mask, labels=targets)
        print(outputs)
        loss = outputs.loss
        logits = outputs.logits
        if _ % 500 == 0:
            print(f'Epoch: {epoch}, Loss: {loss}')

for epoch in range(1):
    train(epoch)
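Finally, the quick sanity check I run before calling train(), based on my reading of the original ValueError that the padded sequence length must equal the product of config.axial_pos_shape during training:

# check one batch: raw length, padded length, and the length the config requires
shape0, shape1 = model.l1.config.axial_pos_shape
required_len = shape0 * shape1  # 65536 for the stock (128, 512) checkpoint
ids = next(iter(training_loader))['input_ids'][0]
padding_length = required_len - ids.size(-1) % required_len
print("raw:", ids.size(-1), "padded:", ids.size(-1) + padding_length, "required:", required_len)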