I cannot download the BERT model directly because of a connection restriction (my company's privacy policy), so I downloaded BertTokenizer from https://github.com/huggingface/transformers/blob/master/src/transformers/tokenization_bert.py
and got my model's tokenizer txt file from the URL listed there: "bert-base-multilingual-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-vocab.txt",
But when I load the tokenizer, I get an error. My code:
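from transformers import BertTokenizer

# Load the tokenizer from the local directory that holds the downloaded vocab file
tokenizer = BertTokenizer.from_pretrained("My BERT MODEL DIRECTORY", do_lower_case=False)
# sentences is assumed to be a list of input strings defined elsewhere
tokenized_texts = [tokenizer.tokenize(sent) for sent in sentences]
print(sentences[0])
print(tokenized_texts[0])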
Error message:
'utf-8' codec can't decode bytes in position 7526-7527: invalid continuation byte
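One way to narrow this down might be to check whether the downloaded vocab file itself is valid UTF-8, independent of transformers. The following is only a minimal sketch using the Python standard library; the helper names `find_invalid_utf8` and `check_file` and the path in the commented-out call are hypothetical and not part of any library.

```python
from pathlib import Path


def find_invalid_utf8(data: bytes):
    """Return (start, end, reason) for the first invalid UTF-8 sequence, or None if data decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start and err.end are byte offsets, like the positions in the error message
        return err.start, err.end, err.reason
    return None


def check_file(path):
    """Read a file as raw bytes and report the first invalid UTF-8 sequence, if any."""
    return find_invalid_utf8(Path(path).read_bytes())


# Demo on in-memory bytes: 0xC7 0xD1 is a cp949/EUC-KR encoded Hangul syllable,
# which is not a valid UTF-8 sequence.
sample = b"abc\xc7\xd1"
print(find_invalid_utf8(sample))

# Hypothetical usage on the downloaded file (path is an assumption):
# print(check_file("My BERT MODEL DIRECTORY/vocab.txt"))
```

If the check reports a position, the file on disk was probably saved or converted in another encoding (or is not the plain vocab file at all), which would explain why the error appears regardless of the arguments passed to the tokenizer.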
I tried adding encoding='utf-8' and encoding='cp949', like this:
tokenizer = BertTokenizer.from_pretrained("My BERT MODEL DIRECTORY", encoding='utf-8', do_lower_case=False)
but it doesn't work. Thank you in advance for any comments.