I'm trying to work out what percentage of documents contain a feature using quanteda. I know dfm_weight() is available, but I believe its 'prop' scheme weights by feature frequency within a document, not across documents.
My goal is to avoid the ifelse statement below and keep everything in quanteda, but I'm not sure this is possible. The output I'm looking for is a side-by-side bar chart grouped by year, with features along the y-axis and % occurrence in documents along the x-axis. The interpretation would then be "In 20% of all comments in 2018, people mention the word X, compared to 24% in 2019."
library(quanteda)
library(reshape2)
library(dplyr)
df$rownum = 1:nrow(df) # unique document ID
# build the corpus from the comment text
dfCorp19 = df %>%
  corpus(text_field = 'WhatPromptedYourSearch', docid_field = 'rownum')
# document frequency of each feature, grouped by year
x = dfm(dfCorp19,
        remove = c(stopwords(), toRemove),
        remove_numbers = TRUE,
        remove_punct = TRUE) %>%
  textstat_frequency(groups = 'year')
# share of documents containing each feature; document counts per year are hard-coded
x = x %>% group_by(group) %>% mutate(prop = ifelse(group == '2019', docfreq/802, docfreq/930))
x = dcast(x, feature ~ group, value.var = 'prop')
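
To pin down the quantity I'm after, here is a minimal sketch on made-up data where the denominators come from the data rather than from hard-coded counts. The toy data frame and its column names (text, year) are hypothetical, and this is only one possible approach, not necessarily the recommended quanteda idiom. It uses docfreq() and ndoc() on each year's rows of the dfm.

```r
library(quanteda)

# Hypothetical toy data: three comments per year
toy <- data.frame(
  text = c("price was high", "great price", "slow delivery",
           "price again", "delivery ok", "nice staff"),
  year = c("2018", "2018", "2018", "2019", "2019", "2019"),
  stringsAsFactors = FALSE
)

corp  <- corpus(toy, text_field = "text")
dfmat <- dfm(tokens(corp, remove_punct = TRUE, remove_numbers = TRUE))

years <- sort(unique(docvars(dfmat, "year")))

# For each year: number of documents containing each feature,
# divided by the number of documents in that year
prop <- sapply(years, function(y) {
  sub <- dfmat[docvars(dfmat, "year") == y, ]
  docfreq(sub) / ndoc(sub)
})

# prop is a matrix with one row per feature and one column per year
round(prop, 2)
```

The result has the same shape as the dcast() output above (features in rows, years in columns), so it could be reshaped for the bar chart in the same way.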

Where is df coming from? Could you include a link (maybe Dropbox/Google Docs)? - Nate