Lemmatize using quanteda

Question

How can I lemmatize words using quanteda, for example turning "makes" into "make"?

In Python this can be done with the NLTK WordNet Lemmatizer.

phiver · Accepted Answer · 2020-06-11T17:52:46

Stemming can be done with tokens_wordstem or dfm_wordstem, but lemmatizing needs to be done with tokens_replace. Note the difference between the two: stemming only strips word endings, whereas lemmatizing maps each word to its dictionary form, so "am" is changed into "be" because that is its lemma.

In the lexicon package there is a table called hash_lemmas that you can use as a dictionary. There is no default lemma function in quanteda.

txt <- c("I am going to lemmatize makes into make, but not maker")

library(quanteda)

# stemming
tokens_wordstem(tokens(txt))
Tokens consisting of 1 document.
text1 :
 [1] "I"      "am"     "go"     "to"     "lemmat" "make"   "into"   "make"   ","      "but"    "not"    "maker" 

# lemmatizing using lemma table
tokens_replace(tokens(txt), pattern = lexicon::hash_lemmas$token, replacement = lexicon::hash_lemmas$lemma)
Tokens consisting of 1 document.
text1 :
 [1] "I"         "be"        "go"        "to"        "lemmatize" "make"      "into"      "make"      ","         "but"       "not"      
[12] "maker"
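
As a minimal sketch of how the replacement works, the example below swaps lexicon::hash_lemmas for a tiny hand-written table. The name lemma_table and its three rows are made up for illustration. It also applies the same table to a dfm with dfm_replace, in case you prefer to lemmatize after building the document-feature matrix.

```r
library(quanteda)

# Hypothetical hand-written lemma table, for illustration only;
# a real table such as lexicon::hash_lemmas is far larger.
lemma_table <- data.frame(
  token = c("am", "going", "makes"),
  lemma = c("be", "go", "make"),
  stringsAsFactors = FALSE
)

toks <- tokens("I am going to lemmatize makes into make, but not maker")

# valuetype = "fixed" matches whole tokens literally,
# so "maker" is left alone because it is not in the table.
toks_lemma <- tokens_replace(
  toks,
  pattern = lemma_table$token,
  replacement = lemma_table$lemma,
  valuetype = "fixed"
)
print(toks_lemma)

# The same table applied at the dfm level (dfm() lowercases by default).
dfm_lemma <- dfm_replace(
  dfm(toks),
  pattern = lemma_table$token,
  replacement = lemma_table$lemma
)
print(featnames(dfm_lemma))
```

Any token that is missing from the table passes through unchanged, so coverage depends entirely on the table you supply.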

Another option is to use spacyr in combination with quanteda. See the tutorial with spacyr.
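
A rough sketch of the spacyr route is shown here. It assumes spaCy and the English model en_core_web_sm are already installed on your machine (for example via spacyr::spacy_install()); the model name is my assumption, so substitute whichever model you have.

```r
library(quanteda)
library(spacyr)

# Assumption: spaCy and the en_core_web_sm model are already installed.
spacy_initialize(model = "en_core_web_sm")

txt <- c(text1 = "I am going to lemmatize makes into make, but not maker")

# Parse the text and request lemmas; POS tags and entities are not needed here.
parsed <- spacy_parse(txt, lemma = TRUE, pos = FALSE, entity = FALSE)

# Convert the parsed result into a quanteda tokens object,
# using the lemma column instead of the surface token.
toks_lemma <- as.tokens(parsed, use_lemma = TRUE)
print(toks_lemma)

# Shut down the background Python process when done.
spacy_finalize()
```

Because spaCy lemmatizes each word in context, the result can differ from a plain lookup table such as hash_lemmas.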

Or you can first use udpipe to get the lemmas and then use quanteda's tokens_replace or dfm_replace functions.
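
The udpipe route could look roughly like the sketch below. It downloads the English model on first use, so it needs internet access. Keeping only the first lemma seen for each token is a simplification of mine: tokens_replace works as a context-free lookup, so a word that has different lemmas in different sentences gets a single one here.

```r
library(quanteda)
library(udpipe)

txt <- c(text1 = "I am going to lemmatize makes into make, but not maker")

# Download and load the English model (requires internet on first run).
model_info <- udpipe_download_model(language = "english")
model <- udpipe_load_model(file = model_info$file_model)

# Annotate the text; the resulting data frame has token and lemma columns.
annotated <- as.data.frame(udpipe_annotate(model, x = txt, doc_id = names(txt)))

# Build a token -> lemma lookup table.
# Simplification: keep the first lemma seen for each distinct token.
lemma_table <- annotated[!is.na(annotated$lemma), c("token", "lemma")]
lemma_table <- lemma_table[!duplicated(lemma_table$token), ]

# Feed the table into quanteda's tokens_replace.
toks_lemma <- tokens_replace(
  tokens(txt),
  pattern = lemma_table$token,
  replacement = lemma_table$lemma,
  valuetype = "fixed"
)
print(toks_lemma)
```

Note that udpipe and quanteda tokenize slightly differently, so a token that quanteda splits in another way than udpipe will simply not match and stays as it is.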
