I'm using this solution (get what percent of documents contain a feature - quanteda) to find the number of documents that contain any one of a group of features in my dataset. As long as a document contains any one of the words, I want it to return TRUE.
I got it to work, but only some of the time, and I can't figure out why. Removing or adding words to the list sometimes fixes it and sometimes doesn't. This is the code I used (the compound phrases were already joined with tokens_compound before the dfm was built):
thetarget <- c("testing", "test", "example words", "example")
df <- data.frame(docname = docnames(dfm),
                 Year = docvars(dfm, c("Year")),
                 contains_target = rowSums(dfm[, thetarget]) > 0,
                 row.names = NULL)
And this is the error I sometimes get:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'rowSums':
Subscript out of bounds
TIA
edit (script to create table showing a year and number of documents containing any of the target words):
df2 <- df %>%
  mutate_if(is.logical, as.character) %>%
  filter(!str_detect(contains_target, "FALSE")) %>%
  group_by(Year) %>%
  summarise(n = n())
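As a side note, the summary step above can probably be written more directly, since contains_target is already logical. The following is only a sketch on made-up data (the toy df below is hypothetical, not the real dataset); it assumes dplyr is loaded and filters on the logical column instead of converting it to text first:

```r
library(dplyr)

# Hypothetical stand-in for the df built above
df <- data.frame(docname = paste0("doc", 1:5),
                 Year = c(2019, 2019, 2020, 2020, 2021),
                 contains_target = c(TRUE, FALSE, TRUE, TRUE, FALSE))

# Keep only rows where contains_target is TRUE, then count rows per Year
df2 <- df %>%
  filter(contains_target) %>%
  count(Year)
```

As with the original script, years in which no document matches do not appear in the result.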
dfm[, thetarget] is not defined; I don't understand the packages you're using, but does that help? - walter
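One likely reading of the error: indexing columns by name fails with "subscript out of bounds" as soon as any name in thetarget is not a column of the object, which would explain why adding or removing words changes whether it works. The sketch below reproduces this on a plain base R matrix with made-up counts (m is a hypothetical stand-in for the dfm) and subsets only on the names that are actually present. It assumes colnames() behaves the same way on a dfm; quanteda's featnames() should be the equivalent accessor there.

```r
# Toy document-feature counts standing in for the dfm (hypothetical data)
m <- matrix(c(1, 0, 0,
              0, 0, 2,
              0, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("doc1", "doc2", "doc3"),
                            c("testing", "other", "example_words")))

thetarget <- c("testing", "test", "example words", "example")

# m[, thetarget] would stop with "subscript out of bounds",
# because some of the target names are not column names of m.

# Split the targets into those that exist as columns and those that do not
present <- intersect(thetarget, colnames(m))
missing_feats <- setdiff(thetarget, colnames(m))

# Subset only on existing columns; drop = FALSE keeps a matrix
# even when a single column is left, so rowSums() still works
contains_target <- rowSums(m[, present, drop = FALSE]) > 0
```

Printing missing_feats shows which targets never matched. If the phrases were compounded with the default concatenator of tokens_compound, the feature would be named "example_words" rather than "example words", so it may be worth checking the targets against the feature names for that kind of mismatch.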