I am working on lemmatizing the words in a CSV file. Before that step, I convert all words to lowercase, remove all punctuation, and split the column into words.
I use only two columns of the CSV. Output of analyze.info():
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4637 entries, 0 to 4636
Data columns (total 2 columns):
 #   Column          Non-Null Count  Dtype
 0   Comments        4637 non-null   object
 1   Classification  4637 non-null   object
import string
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
analyze = pd.read_csv('C:/Users/(..)/Talk London/ALL_dataset.csv', delimiter=';', low_memory=False, encoding='cp1252', usecols=['Comments', 'Classification'])
lower_case = analyze['Comments'].str.lower()
cleaned_text = lower_case.str.translate(str.maketrans('', '', string.punctuation))
tokenized_words = cleaned_text.str.split()
final_words = []
for word in tokenized_words:
    if word not in stopwords.words('english'):
        final_words.append(word)
wnl = WordNetLemmatizer()
lemma_words = []
lem = ' '.join([wnl.lemmatize(word) for word in tokenized_words])
lemma_words.append(lem)
When I run the code, it returns this error:
Traceback (most recent call last):
  File "C:/Users/suiso/PycharmProjects/SA_working/SA_Main.py", line 52, in <module>
    lem = ' '.join([wnl.lemmatize(word) for word in tokenized_words])
  File "C:/Users/suiso/PycharmProjects/SA_working/SA_Main.py", line 52, in <listcomp>
    lem = ' '.join([wnl.lemmatize(word) for word in tokenized_words])
  File "C:\Users\suiso\PycharmProjects\SA_working\venv\lib\site-packages\nltk\stem\wordnet.py", line 38, in lemmatize
    lemmas = wordnet._morphy(word, pos)
  File "C:\Users\suiso\PycharmProjects\SA_working\venv\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1897, in _morphy
    if form in exceptions:
TypeError: unhashable type: 'list'
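The traceback can be reproduced without NLTK or pandas. The sketch below is only an illustration under stated assumptions: plain Python lists stand in for the Series that str.split() produces, and the small dict named exceptions is a hypothetical stand-in for the lookup table used inside _morphy. It shows that each element being iterated is a whole list of tokens, and that testing a list for membership in a dict raises the same kind of TypeError.

```python
# Plain lists stand in for the pandas Series (assumption for illustration).
rows = ["the cats are running", "dogs barked loudly"]

# Mirrors cleaned_text.str.split(): every row becomes a list of tokens.
tokenized_words = [row.split() for row in rows]

# Hypothetical stand-in for the exceptions dict consulted in _morphy.
exceptions = {"geese": ["goose"]}

first = tokenized_words[0]
element_type = type(first)  # a list, not a str

try:
    # Dict membership hashes its operand, and lists are not hashable.
    first in exceptions
    message = "no error"
except TypeError as exc:
    message = str(exc)

print(element_type)
print(message)
```

Iterating over tokenized_words therefore yields one list per comment, so the lemmatizer receives a list where it expects a single string.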
What is the type of word when you call wnl.lemmatize(word)? Do print(type(word)). - balderman
print(type(word)) returns: <class 'list'> - Sérgio Meireles
You are passing a list to lemmatize, and it tries to add it to a dict or set. This is the core of the issue you are facing. Read the docs of WordNetLemmatizer and see which values you can use. See machinelearningplus.com/nlp/lemmatization-examples-python/… - balderman
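Following the comments, one possible fix is to lemmatize each token inside each row rather than the row itself. This is a minimal sketch, not the asker's final code: lemmatize_rows and toy_lemmatize are hypothetical names introduced here, plain lists stand in for the tokenized Series, and toy_lemmatize is a crude stand-in so the example runs without the WordNet data.

```python
def lemmatize_rows(token_rows, lemmatize):
    """Lemmatize every token of every row and re-join each row into a string.

    token_rows: iterable of lists of str (one list per comment).
    lemmatize:  callable taking a single str and returning a str.
    """
    return [
        " ".join(lemmatize(token) for token in tokens)
        for tokens in token_rows
    ]


def toy_lemmatize(word):
    # Hypothetical stand-in for wnl.lemmatize: strips a trailing "s"
    # from words longer than three characters. Not a real lemmatizer.
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


# Each row is a list of tokens, as str.split() would produce.
token_rows = [["the", "cats", "sat"], ["dogs", "bark"]]
lemma_words = lemmatize_rows(token_rows, toy_lemmatize)
print(lemma_words)
```

With the objects from the question, the same idea would presumably be tokenized_words.apply(lambda tokens: ' '.join(wnl.lemmatize(t) for t in tokens)), which passes one string at a time to lemmatize and yields one lemmatized string per comment.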