
I am trying to summarize text with Hugging Face T5. Similar code runs for BART without issue.

With T5 I receive:

Exception: Impossible to guess which tokenizer to use. Please provide a PreTrainedTokenizer class or a path/identifier to a pretrained tokenizer.

It seems I have already provided the tokenizer: t5-small. I have tried other T5 tokenizers as well but receive the same error. I'm not having this issue with other transformers models or tokenizers.

What might be causing this?

from transformers import pipeline
from transformers import T5Tokenizer, TFT5ForConditionalGeneration

# T5Tokenizer needs sentencepiece; installed beforehand in a shell with:
#   pip install sentencepiece

mt5 = TFT5ForConditionalGeneration.from_pretrained('t5-small')
tokenizer = T5Tokenizer.from_pretrained('t5-small')

t5_summarizer = pipeline(
    task="summarization",
    model=mt5,
    tokenizer=tokenizer,
    framework='pt'
)

conversation_summary = []
for row_index, row in text.iterrows():
    conversation = text.iloc[row_index, 1]
    t5_summary = t5_summarizer(conversation, max_length=100)
    print(t5_summary)
    conversation_summary.append(t5_summary)

text["conversation_t5"] = conversation_summary
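
One thing I noticed while double-checking the code above: the model is loaded with a TensorFlow class (`TFT5ForConditionalGeneration`), while the pipeline is asked for `framework='pt'`. Whether this mismatch is what triggers the error is only a guess on my part. The sketch below is a minimal sanity check that relies on the transformers naming convention that TensorFlow classes start with `TF` and Flax classes with `Flax`. The helpers `framework_from_class_name` and `matches_requested_framework` are hypothetical names of my own, not part of the library.

```python
def framework_from_class_name(class_name: str) -> str:
    """Guess the framework from a model class name (hypothetical helper).

    Assumes the transformers naming convention: TensorFlow classes are
    prefixed with "TF", Flax classes with "Flax", PyTorch classes have
    no prefix.
    """
    if class_name.startswith("TF"):
        return "tf"
    if class_name.startswith("Flax"):
        return "flax"
    return "pt"


def matches_requested_framework(class_name: str, requested: str) -> bool:
    """Return True when the class name agrees with the requested framework."""
    return framework_from_class_name(class_name) == requested


# The class and framework value used in the question.
model_class = "TFT5ForConditionalGeneration"
requested_framework = "pt"

print(framework_from_class_name(model_class))
print(matches_requested_framework(model_class, requested_framework))
```

With a real model object, the same check could be run on `type(mt5).__name__` instead of a hard-coded string.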