quanteda dfm() Error: groups must have length ndoc(x)

Question

I'm trying to run a keyness analysis. Everything worked at first, but then, for no reason I can identify, the same code started throwing an error. I'm using data_corpus_inaugural, the corpus of US presidents' inaugural addresses that ships with the quanteda package.

My code:

> corpus_pres <- corpus_subset(data_corpus_inaugural, 
+                             President %in% c("Obama", "Trump"))
> dtm_pres <- dfm(corpus_pres, groups = "President", 
+                remove = stopwords("english"), remove_punct = TRUE)
Error: groups must have length ndoc(x)
In addition: Warning messages:
1: 'dfm.corpus()' is deprecated. Use 'tokens()' first. 
2: '...' should not be used for tokens() arguments; use 'tokens()' first. 
3: 'groups' is deprecated; use dfm_group() instead 
>

Is it possible that this is some kind of quanteda issue? Even though quanteda is loaded, it cannot find textstat_keyness:
> keyness = textstat_keyness(dtm_pres, target = "Trump")
Error in textstat_keyness(dtm_pres, target = "Trump") : could not find function "textstat_keyness"
— Maayan Klimenko Feinstein
See github.com/quanteda/quanteda/blob/master/…, Should be groups = President in quanteda v3. — Ken Benoit

user16299130 · Accepted Answer · 2021-06-23T14:33:46

In quanteda v3 "dfm() constructs a document-feature matrix from a tokens object" - https://tutorials.quanteda.io/basic-operations/dfm/dfm/

Try this:

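# corpus_pres is the subsetted corpus created in the question.
# %>% is the magrittr pipe; load magrittr (or dplyr) if it is not already available.
toks_pres <- tokens(corpus_pres, remove_punct = TRUE) %>% 
    tokens_remove(pattern = stopwords("en")) %>%
    tokens_group(groups = President)  # unquoted docvar name in quanteda v3

# dfm() now takes the tokens object rather than the corpus
pres_dfm <- dfm(toks_pres)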
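
On the "could not find function" error from the comments: in quanteda v3 the textstat_*() functions were moved into the separate quanteda.textstats package, so that package has to be installed and loaded as well. Below is a minimal sketch of the whole workflow, assuming quanteda v3 or later and quanteda.textstats are both installed. It uses dfm_group(), the function named in the deprecation warning, as an alternative to tokens_group(); the object names dfm_pres and keyness are only illustrative.

```r
library(quanteda)
library(quanteda.textstats)  # textstat_keyness() lives here as of quanteda v3

# Same subset as in the question
corpus_pres <- corpus_subset(data_corpus_inaugural,
                             President %in% c("Obama", "Trump"))

# Tokenise first, then drop stopwords (no pipe needed)
toks_pres <- tokens(corpus_pres, remove_punct = TRUE)
toks_pres <- tokens_remove(toks_pres, pattern = stopwords("en"))

# Alternative to tokens_group(): build the dfm, then group it by the
# President docvar (unquoted, as in the comment above)
dfm_pres <- dfm_group(dfm(toks_pres), groups = President)

# After grouping, the document names are the group labels,
# so "Trump" can be used directly as the target
docnames(dfm_pres)

keyness <- textstat_keyness(dfm_pres, target = "Trump")
head(keyness)
```

Grouping at the tokens stage or at the dfm stage should give the same grouped matrix here; which one to use is mostly a matter of preference.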
