I currently have the task of converting a Keras BERT-based model for an arbitrary text classification problem to a .pb file. I already have a function that takes in the Keras model, but the problem is that any pre-trained version of BERT I download comes without top layers for classification, so I have to manually add tf.keras.layers.Input layers in front and some neural network architecture on top of BERT (after the [CLS] embedding). My ultimate goal is to escape the need for fine-tuning and get a ready-made model that has already been fine-tuned. I've found that the transformers library might be useful for this, since it offers BERT-based models fine-tuned on various datasets. However, the following code from their documentation gives back a tensor of shape (1, number of tokens, hidden dimensionality):
from transformers import BertTokenizer, TFBertModel
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
model = TFBertModel.from_pretrained("bert-large-uncased")
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
So it seems I eventually have to find a dataset and do the fine-tuning myself. Even using a model like distilbert-base-uncased-finetuned-sst-2-english still produces an embedding for each input token. Is there a way to get a ready-to-use model?
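For reference, this is roughly what I tried with that checkpoint (I'm assuming TFAutoModel here, which appears to load only the base transformer without the classification head, since the output is per-token):

```python
from transformers import AutoTokenizer, TFAutoModel

name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = TFAutoModel.from_pretrained(name)

encoded = tokenizer("Replace me by any text you'd like.", return_tensors="tf")
output = model(encoded)

# Still per-token embeddings of shape (batch, num_tokens, hidden_dim), no class logits
print(output.last_hidden_state.shape)
```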