HuggingFace - GPT2 Tokenizer configuration in config.json

Question

I uploaded a fine-tuned GPT2 model to the Hugging Face model hub to use it for inference.

The following error occurs during inference:

Can't load tokenizer using from_pretrained, please update its configuration: Can't load tokenizer for 'bala1802/model_1_test'. Make sure that: - 'bala1802/model_1_test' is a correct model identifier listed on 'https://huggingface.co/models' - or 'bala1802/model_1_test' is the correct path to a directory containing relevant tokenizer files

Below is the config.json file of the fine-tuned model:

{
  "_name_or_path": "gpt2",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "gradient_checkpointing": false,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.3.2",
  "use_cache": true,
  "vocab_size": 50257
}

Should I configure the GPT2 tokenizer in the config.json file, just like "model_type": "gpt2"?

cronoik · Accepted Answer · 2021-02-19T13:25:37

Your repository does not contain the files required to create a tokenizer. It seems you have only uploaded the model files. The tokenizer is not configured in config.json; it is loaded from its own files. Create an instance of the tokenizer you used to train the model, save its files with save_pretrained(), and upload them to the repository alongside the model files:

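from transformers import GPT2Tokenizer

# Load the tokenizer that was used for fine-tuning (here the stock "gpt2" one)
t = GPT2Tokenizer.from_pretrained("gpt2")
# Write the tokenizer files to a local folder; returns the saved file paths
t.save_pretrained('/SOMEFOLDER/')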

Output:

('/SOMEFOLDER/tokenizer_config.json',
 '/SOMEFOLDER/special_tokens_map.json',
 '/SOMEFOLDER/vocab.json',
 '/SOMEFOLDER/merges.txt',
 '/SOMEFOLDER/added_tokens.json')
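
Before uploading, it can help to check that the folder really contains the tokenizer files. The following is a minimal sketch using only the standard library. The helper name missing_tokenizer_files is hypothetical, and treating vocab.json and merges.txt as the required pair is an assumption based on the file names in the output above and on the slow GPT2Tokenizer, which is built from a vocabulary file and a merges file.

```python
from pathlib import Path

# Assumption: the slow GPT2Tokenizer needs these two files; the remaining
# files listed in the save_pretrained() output are treated as optional here.
REQUIRED_FILES = ("vocab.json", "merges.txt")


def missing_tokenizer_files(folder):
    """Return the names of required tokenizer files not found in `folder`."""
    root = Path(folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling missing_tokenizer_files('/SOMEFOLDER/') after save_pretrained() should return an empty list; a non-empty list names the files that still need to be saved and uploaded.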
