
How can I lemmatize words with quanteda, for example turning "makes" into "make"?

In Python this can be done with the NLTK WordNet lemmatizer.




Stemming can be done with tokens_wordstem or dfm_wordstem, but lemmatizing has to be done with tokens_replace. Note the difference between the two: when lemmatizing, "am" is changed into "be" because that is its lemma, whereas stemming leaves "am" unchanged and cuts "lemmatize" down to "lemmat".

There is no built-in lemmatization function in quanteda, but the lexicon package contains a table called hash_lemmas (with columns token and lemma) that you can use as a lookup dictionary.

txt <- c("I am going to lemmatize makes into make, but not maker")


# stemming
tokens_wordstem(tokens(txt))
Tokens consisting of 1 document.
text1 :
 [1] "I"      "am"     "go"     "to"     "lemmat" "make"   "into"   "make"   ","      "but"    "not"    "maker" 

# lemmatizing using lemma table
tokens_replace(tokens(txt), pattern = lexicon::hash_lemmas$token, replacement = lexicon::hash_lemmas$lemma)
Tokens consisting of 1 document.
text1 :
 [1] "I"         "be"        "go"        "to"        "lemmatize" "make"      "into"      "make"      ","         "but"       "not"      
[12] "maker"    
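The same lookup table can also be applied after building a document-feature matrix, using dfm_replace instead of tokens_replace. The sketch below assumes quanteda (version 3 or later) and lexicon are installed; the object names dfmat and dfmat_lemma are only illustrative.

```r
library(quanteda)  # tokens(), dfm(), dfm_replace(), featnames()

txt <- c("I am going to lemmatize makes into make, but not maker")

# Build a document-feature matrix; dfm() lowercases features by default
dfmat <- dfm(tokens(txt))

# Replace every feature found in hash_lemmas$token with its lemma;
# features without an entry in the table (e.g. "maker") are left unchanged
dfmat_lemma <- dfm_replace(dfmat,
                           pattern = lexicon::hash_lemmas$token,
                           replacement = lexicon::hash_lemmas$lemma)

featnames(dfmat_lemma)
```

Working on the tokens object is usually preferable when you still need the word order (e.g. for n-grams); dfm_replace is convenient when you only have the matrix.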

Another option is to use spacyr in combination with quanteda; see the spacyr tutorial.
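A minimal sketch of the spacyr route, assuming spacyr is installed together with a working Python spaCy setup and an English model; the model name "en_core_web_sm" is an assumption, so use whichever model you have installed. spacy_parse returns one row per token with a lemma column, and quanteda's as.tokens can turn that parse into a tokens object that uses the lemmas instead of the original word forms.

```r
library(quanteda)  # as.tokens(), is.tokens(), ndoc()
library(spacyr)    # spacy_initialize(), spacy_parse(), spacy_finalize()

# Start the spaCy backend (assumed model name; requires Python + spaCy)
spacy_initialize(model = "en_core_web_sm")

txt <- c("I am going to lemmatize makes into make, but not maker")

# Parse the text; lemma = TRUE adds a 'lemma' column to the result
parsed <- spacy_parse(txt, lemma = TRUE)

# Convert the parse to quanteda tokens, taking the lemma for each token
toks_lemma <- as.tokens(parsed, use_lemma = TRUE)
print(toks_lemma)

# Shut down the Python backend when done
spacy_finalize()
```

Unlike a fixed lookup table, spaCy assigns lemmas in context (using part-of-speech information), so the exact lemmas depend on the spaCy model version.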

Alternatively, you can first use udpipe to get the lemmas and then use quanteda's tokens_replace or dfm_replace functions.
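A minimal sketch of the udpipe route, assuming the udpipe package is installed and that internet access is available the first time, because passing the language name "english" makes udpipe download a pretrained model into the working directory. The name lemma_table is only illustrative. udpipe's output has one row per token with token and lemma columns, which can then serve as the lookup table for tokens_replace.

```r
library(quanteda)  # tokens(), tokens_replace(), is.tokens()
library(udpipe)    # udpipe()

txt <- c("I am going to lemmatize makes into make, but not maker")

# Annotate the text; "english" downloads a pretrained model on first use
anno <- udpipe(txt, "english")

# Keep one token -> lemma pair per word form (first occurrence wins), so
# the lookup table has no duplicated patterns
lemma_table <- anno[!duplicated(anno$token), c("token", "lemma")]

# Replace each quanteda token with the lemma that udpipe assigned to it;
# tokens that udpipe split differently are simply left unchanged
toks_lemma <- tokens_replace(tokens(txt),
                             pattern = lemma_table$token,
                             replacement = lemma_table$lemma)
print(toks_lemma)
```

Collapsing udpipe's annotations into a word-form table gives up some context sensitivity (a form always gets the lemma of its first occurrence), but it keeps the rest of the workflow in quanteda. The same two columns can be passed to dfm_replace instead if you are working with a dfm.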