2
votes

Good day, I have 11 GB of GPU memory and I run into a CUDA out-of-memory error with pretrained lemmatization.

I used this code:

import stanza
from spacy_stanza import StanzaLanguage

snlp = stanza.Pipeline(lang="en", use_gpu=True)  # tried different batch_size / lemma_batch_size - did not help
nlp = StanzaLanguage(snlp)

def tokenize(text):
    doc = nlp(text)
    doc_l = [token.lemma_ for token in doc]
    lower_tokens = [t.lower() for t in doc_l]
    alpha_only = [t for t in lower_tokens if t.isalpha()]
    no_stops = [t for t in alpha_only if t not in stopwords]
    # torch.cuda.empty_cache()  # tried this - did not work
    return no_stops

tfidf = TfidfVectorizer(tokenizer=tokenize, min_df=0.1, max_df=0.9)
# Construct the TF-IDF matrix
tfidf_matrix = tfidf.fit_transform(texts)

RuntimeError: CUDA out of memory. Tried to allocate 978.00 MiB (GPU 0; 11.00 GiB total capacity; 6.40 GiB already allocated; 439.75 MiB free; 6.53 GiB reserved in total by PyTorch).

I tried

[tokenize(t) for t in test]

It only got through 12 texts, each about 200 words on average. Based on the error message ('Tried to allocate 978.00 MiB') and that data, does Stanza use about 1 GiB of GPU memory per step?

  1. This behavior seems strange to me (probably because I don't understand how the library works): the model is already pretrained, so it should not grow when transforming new texts, right? Why does it need so much GPU memory?
  2. Is there any way to clear memory after each run of lemma_ for each text? (torch.cuda.empty_cache() does not work, and batch_size does not help either.)
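For what it's worth, only the lemmatization itself needs the GPU; the later steps in tokenize are plain Python and can be factored out so they never touch CUDA. A minimal sketch (filter_lemmas is a hypothetical helper; stopwords is assumed to be a set of strings):

```python
def filter_lemmas(lemmas, stopwords):
    """CPU-only post-processing: lowercase, keep alphabetic tokens, drop stopwords."""
    lower_tokens = [t.lower() for t in lemmas]
    alpha_only = [t for t in lower_tokens if t.isalpha()]
    return [t for t in alpha_only if t not in stopwords]

# filter_lemmas(["Run", "the", "42", "Dog"], {"the"}) keeps ["run", "dog"]
```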

It works on CPU, although it allocates all of the available memory (32 GB of RAM) and is much slower. I need to make it work on CUDA.

1
What is StanzaLanguage doing? - StanfordNLPHelp
That is not part of Stanza. Also, I'm not sure how much this will help, but you should reduce the pipeline to only running "tokenize,pos,lemma". If you don't specify, I think you're running a bunch of other processors as well. - StanfordNLPHelp
Sorry, you can actually reduce it to "tokenize,lemma". - StanfordNLPHelp
Oops, that's wrong, you do need "tokenize,pos,lemma". - StanfordNLPHelp
Also, what language are you trying to run this on? - StanfordNLPHelp
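The suggestion in the comments can be sketched as a pipeline configuration; restricting processors is a documented stanza.Pipeline option, though how much GPU memory it saves here is an assumption:

```python
# Sketch of the comment thread's suggestion: limit the pipeline to the
# processors the lemmatizer actually needs (lemma depends on pos output).
# By default the English pipeline also loads depparse, ner, etc., each of
# which holds its own model on the GPU.
pipeline_config = {
    "lang": "en",
    "processors": "tokenize,pos,lemma",  # instead of the full default set
    "use_gpu": True,
}
# With stanza installed and the English models downloaded:
# snlp = stanza.Pipeline(**pipeline_config)
```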

1 Answer

0
votes

If you check the full stack trace, there may be a hint about which processor runs into the memory issue. For instance, I recently ran into a similar issue with this stack trace:

...
File "stanza/pipeline/depparse_processor.py", line 42, in process
    preds += self.trainer.predict(b)
File "stanza/models/depparse/trainer.py", line 74, in predict
    _, preds = self.model(word, word_mask, wordchars,
        wordchars_mask, upos, xpos, ufeats, pretrained, lemma, head, deprel,
        word_orig_idx, sentlens, wordlens)
...
RuntimeError: CUDA out of memory.
Tried to allocate 14.87 GiB (GPU 0; 14.76 GiB total capacity; 460.31 MiB already
allocated; 13.39 GiB free; 490.00 MiB reserved in total by PyTorch)

This pointed me to the fact that I needed to set depparse_batch_size when calling stanza.Pipeline(...). There are other settings, such as batch_size and the lemma_batch_size you mentioned, as well as pos_batch_size, ner_batch_size, etc. Lowering these should really help resolve this issue.
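A hedged sketch of that fix; the batch-size values below are guesses, not recommendations, to be halved until the OOM disappears:

```python
# Smaller per-processor batch sizes trade throughput for a lower peak GPU
# footprint. The values here are assumed starting points; tune them down
# until the pipeline fits on your card.
oom_safe_config = dict(
    lang="en",
    processors="tokenize,pos,lemma,depparse",
    use_gpu=True,
    batch_size=500,           # hypothetical value
    pos_batch_size=500,       # hypothetical value
    lemma_batch_size=500,     # hypothetical value
    depparse_batch_size=500,  # the setting that fixed the traceback above
)
# With stanza installed and models downloaded:
# snlp = stanza.Pipeline(**oom_safe_config)
```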