
I am trying to use the Hugging Face Transformers API to load a locally downloaded M-BERT model, but it is throwing an exception. I cloned this repo: https://huggingface.co/bert-base-multilingual-cased

bert = TFBertModel.from_pretrained("input/bert-base-multilingual-cased")

The directory structure is:

[screenshot of the directory structure]

But I am getting this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_tf_utils.py", line 1277, in from_pretrained
    missing_keys, unexpected_keys = load_tf_weights(model, resolved_archive_file, load_weight_prefix)
  File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_tf_utils.py", line 467, in load_tf_weights
    with h5py.File(resolved_archive_file, "r") as f:
  File "/usr/local/lib/python3.7/dist-packages/h5py/_hl/files.py", line 408, in __init__
    swmr=swmr)
  File "/usr/local/lib/python3.7/dist-packages/h5py/_hl/files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (file signature not found)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 81, in <module>
    __main__()
  File "train.py", line 59, in __main__
    model = create_model(num_classes)
  File "/content/drive/My Drive/msc-project/code/model.py", line 26, in create_model
    bert = TFBertModel.from_pretrained("input/bert-base-multilingual-cased")
  File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_tf_utils.py", line 1280, in from_pretrained
    "Unable to load weights from h5 file. "
OSError: Unable to load weights from h5 file. If you tried to load a TF 2.0 model from a PyTorch checkpoint, please set from_pt=True. 

Where am I going wrong? Need help! Thanks in advance.

Try local_files_only=True - ML_Engine
Also check that the directory you're running the script from is at the same level as input/ - ML_Engine

1 Answer


As was already pointed out in the comments, the from_pretrained argument should be either the id of a model hosted on huggingface.co or a local path:

A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.

See documentation

Looking at your stack trace, your code runs from:

/content/drive/My Drive/msc-project/code/model.py, and a relative path like "input/bert-base-multilingual-cased" is resolved against the current working directory, not the location of the script. So unless your model is in /content/drive/My Drive/msc-project/code/input/bert-base-multilingual-cased/ (or the directory you launch train.py from contains input/), it won't load.
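You can verify where the relative path actually points before calling from_pretrained. A minimal sketch using only the standard library (the path is the one from your code, assumed relative):

```python
import os
from pathlib import Path

# A relative path such as "input/bert-base-multilingual-cased" is resolved
# against the current working directory, not against the file that calls
# from_pretrained().
rel = "input/bert-base-multilingual-cased"
resolved = Path(os.getcwd()) / rel

print("from_pretrained will look in:", resolved)
print("directory exists:", resolved.is_dir())
```

If "directory exists" prints False, that is the problem, independent of anything inside the model directory.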

I would also write the path the way the documentation example does, i.e.:

bert = TFBertModel.from_pretrained("./input/bert-base-multilingual-cased/")