Good day, I have 11 GB of GPU memory and I run into a CUDA out-of-memory error with the pretrained lemmatization pipeline.
I used this code:
import stanza
from spacy_stanza import StanzaLanguage
from sklearn.feature_extraction.text import TfidfVectorizer

snlp = stanza.Pipeline(lang="en", use_gpu=True)  # tried different batch_size / lemma_batch_size - did not help
nlp = StanzaLanguage(snlp)

def tokenize(text):
    doc = nlp(text)
    doc_l = [token.lemma_ for token in doc]          # lemmatize
    lower_tokens = [t.lower() for t in doc_l]        # lowercase
    alpha_only = [t for t in lower_tokens if t.isalpha()]
    no_stops = [t for t in alpha_only if t not in stopwords]  # stopwords is a set defined elsewhere
    # torch.cuda.empty_cache()  # Tried this - did not work
    return no_stops

tfidf = TfidfVectorizer(tokenizer=tokenize, min_df=0.1, max_df=0.9)
# Construct the TF-IDF matrix (texts is the corpus, a list of strings)
tfidf_matrix = tfidf.fit_transform(texts)
Running fit_transform fails with:

RuntimeError: CUDA out of memory. Tried to allocate 978.00 MiB (GPU 0; 11.00 GiB total capacity; 6.40 GiB already allocated; 439.75 MiB free; 6.53 GiB reserved in total by PyTorch).
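For completeness, these are the kinds of batch-size settings I experimented with (a sketch; the per-processor *_batch_size keyword arguments are from the stanza Pipeline options, and the values here are just examples, smaller than the defaults):

import stanza

snlp = stanza.Pipeline(
    lang="en",
    use_gpu=True,
    tokenize_batch_size=16,   # example values only
    pos_batch_size=500,
    lemma_batch_size=10,
    depparse_batch_size=500,
)

None of these stopped the memory growth.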
I also tried tokenizing the texts one at a time:

[tokenize(t) for t in test]

It only lasted for 12 texts of about 200 words each before running out of memory. Based on the error message ('Tried to allocate 978.00 MiB') and that observation, stanza seems to use about 1 GiB of GPU memory per call?
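To put a number on the per-call growth, this is roughly how I measured it (a diagnostic sketch; test is assumed to be a list of strings):

import torch

for i, t in enumerate(test):
    before = torch.cuda.memory_allocated()
    tokenize(t)
    after = torch.cuda.memory_allocated()
    print(f"text {i}: +{(after - before) / 2**20:.0f} MiB still allocated")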
- This behavior seems strange to me (probably because I don't understand how the library works): the model is already pretrained, so it should not grow when transforming new texts, right? Why does it need so much GPU memory?
- Is there any way to clear the memory after the lemmas are extracted for each text? torch.cuda.empty_cache() does not work, and lowering batch_size does not work either.
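For reference, this is the per-text cleanup I attempted (a sketch; it did not free the memory in my case):

import gc
import torch

for t in test:
    doc = nlp(t)
    lemmas = [token.lemma_ for token in doc]
    del doc                   # drop the only reference to the document
    gc.collect()              # force Python garbage collection
    torch.cuda.empty_cache()  # return cached, unused blocks to the driver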
It works on CPU, but there it allocates all of the available RAM (32 GB) and is much slower. I need to make it work on CUDA.
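The CPU variant that does run (slowly, eating the 32 GB of RAM) is just the same pipeline with the GPU turned off:

snlp = stanza.Pipeline(lang="en", use_gpu=False)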