0
votes

I've downloaded the pretrained BERT model 'bert-base-cased' and I'm unable to load its tokenizer with BertTokenizer. In the bert-pretrained-model folder I have config.json and pytorch_model.bin.

tokenizer = BertTokenizer.from_pretrained(r'C:\Downloads\bert-pretrained-model')

I'm getting this error:

OSError                                   Traceback (most recent call last)
<ipython-input-17-bd4c0051c48e> in <module>
----> 1 tokenizer = BertTokenizer.from_pretrained(r'\Downloads\bert-pretrained-model')

~\sentiment_analysis\lib\site-packages\transformers\tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
   1775                 f"- or '{pretrained_model_name_or_path}' is the correct path to a directory containing relevant tokenizer files\n\n"
   1776             )
-> 1777             raise EnvironmentError(msg)
   1778 
   1779         for file_id, file_path in vocab_files.items():

OSError: Can't load tokenizer for 'C:\Downloads\bert-pretrained-model'. Make sure that:

- 'C:\Downloads\bert-pretrained-model' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'C:\Downloads\bert-pretrained-model' is the correct path to a directory containing relevant tokenizer files

When I load the same folder with BertModel, it works. But when I try it with BertTokenizer, it doesn't load.

1
You are missing vocab.json. – cronoik
I have config.json; where can I get vocab.json? I'm trying sentiment analysis with Hugging Face, Torch and BERT. – Nithin Reddy
Should I download only vocab.txt and place it in the model folder, or do I need any extra files? – Nithin Reddy
You also need the tokenizer_config.json. – cronoik

1 Answer

0
votes

What's the version of transformers you are using? I had a similar issue, and the solution was to upgrade transformers to the latest version (4.3.3 at the time of writing). I was on an old 2.x version because I had to make older code run, and the upgrade fixed it. It looks like older versions of transformers have this issue with loading a language model from a local path.

I would suggest upgrading transformers in a separate virtual environment so you don't break your other code. If you don't use virtual environments yet, it's highly recommended that you start now; this link has a good, simple guide to creating one (it covers Windows, which matters in your case).
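Roughly, that boils down to something like the following (Windows commands; `bert-env` is just a placeholder name):

```shell
# Create and activate a fresh virtual environment, then upgrade transformers in it
py -m venv bert-env
bert-env\Scripts\activate
pip install --upgrade transformers
python -c "import transformers; print(transformers.__version__)"
```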

Alternative recommendation: this may not be a direct answer to your question, but I would suggest loading the pretrained language model by name instead of downloading it and pointing to its local path; at least, that's the way Hugging Face recommends. The only downside is that with a slow internet connection it might take a while to download on first use. Other than that, using this line instead of yours should solve your problem:

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")