I am new to R and used the quanteda package in R to create a corpus of newspaper articles. From this I have created a dfm
:
dfmatrix <- dfm(corpus, remove = stopwords("english"),stem = TRUE, remove_punct=TRUE, remove_numbers = FALSE)
I am trying to extract bigrams (e.g. "climate change", "global warming") but keep getting an error message when I type the following, saying the ngrams argument is not used.
dfmatrix <- dfm(corpus, remove = stopwords("english"),stem = TRUE, remove_punct=TRUE, remove_numbers = FALSE, ngrams = 2)
I have installed the tokenizer, tidyverse, dplyr, ngram, readtext, quanteda and stm libraries. Below is a screenshot of my corpus. Doc_iD is the article titles. I need the bigrams to be extracted from the "texts" column.
Do I need to extract the ngrams from the corpus first or can I do it from the dfm? Am I missing some piece of code that allows me to extract the bigrams?