
I use the following command to apply stemming with quanteda:

myDfm <- dfm(tokens_remove(tokens(df2, remove_punct = TRUE, stem = TRUE, remove_numbers = TRUE, remove_symbols = TRUE), stopwords(source = "smart")), 
                          ngrams = c(1,2))

However, I receive this warning at the end:

Warning message:
Argument stem not used. 

Is there any different option to implement stemming with quanteda?

Please be more pedagogical and include some context, e.g. explain what quanteda is, and share your data or a subset of it, i.e. of data frame df2. – plant
It's just a warning. – IRTFM

1 Answer


Yes, you want tokens_wordstem(). In your example, you are supplying stem = TRUE to the tokens() call, not to the dfm() call. tokens() does not have stem as an argument, which is what the warning is telling you.
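To make the distinction concrete, here is a minimal sketch of the two places where stemming can happen: on the tokens with tokens_wordstem(), or on an already built dfm with dfm_wordstem(). The toy text `txt` is a hypothetical input, not the asker's df2, and I am assuming the default English Snowball stemmer; treat it as an illustration rather than the only way to do it.

```r
library("quanteda")

# Hypothetical toy input (all lower case), standing in for the real data
txt <- c(d1 = "running runners run quickly through the running track")

toks <- tokens(txt, remove_punct = TRUE)

# Route 1: stem at the tokens stage, then build the dfm
toks_stemmed <- tokens_wordstem(toks, language = "english")
dfm_from_tokens <- dfm(toks_stemmed)

# Route 2: build the dfm first, then stem its features
dfm_stemmed <- dfm_wordstem(dfm(toks), language = "english")

# For this unigram example both routes should give the same feature set
same_features <- setequal(featnames(dfm_from_tokens), featnames(dfm_stemmed))
same_features
```

Stemming at the tokens stage is usually the better fit when you also want ngrams, because the stems are then what get combined into the ngrams.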

For clarity I suggest using the pipe operator %>% to see the sequence of operations more clearly.

library("quanteda")
## Package version: 1.4.0
## Parallel computing: 2 of 12 threads used.
## See https://quanteda.io for tutorials and examples.
## Attaching package: 'quanteda'
## The following object is masked from 'package:utils':
##     View

df2 <- data_char_sampletext
quanteda_options(verbose = TRUE)

df2 %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  tokens_remove(stopwords(source = "smart")) %>%
  tokens_wordstem() %>%
  tokens_ngrams(n = c(1, 2)) %>%
  dfm()
## removed 0 features
## removed 72 features
## Creating a dfm from a tokens input...
##    ... lowercasing
##    ... found 1 document, 375 features
##    ... created a 1 x 375 sparse dfm
##    ... complete. 
## Elapsed time: 0.038 seconds.
## Document-feature matrix of: 1 document, 375 features (0.0% sparse).