Not directly yet, but it can be done. ANEW differs from other dictionaries in that it does not use a key: value pair format, but rather assigns a numerical score to each term. This means you are not counting matches of values against a key, but rather selecting features and then scoring them using weighted counts.
This could be done in quanteda as follows:
1. Get the ANEW features into a character vector.
2. Use dfm(yourtext, select = ANEWfeatures) to create a dfm with just the ANEW features.
3. Multiply each count by the valence of its ANEW term, applied column-wise so that every count for a feature gets multiplied by that feature's ANEW value.
4. Use rowSums() on the weighted matrix to get document-level valence scores.

Or alternatively:
- File an issue and we will add this functionality to quanteda.
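The multiply-and-sum steps above can be sketched in base R on a plain count matrix. This is only an illustration: the counts are made up, and the valence numbers are hypothetical placeholders rather than real ANEW values.

```r
# Toy document-feature counts (rows = documents, columns = terms).
counts <- matrix(c(2, 0, 1,
                   1, 3, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2"), c("life", "day", "time")))

# Hypothetical valence scores, NOT real ANEW values.
valence <- c(life = 7, day = 6, time = 5)

# Multiply every column by the valence of its term. sweep() over margin 2
# avoids R's default recycling, which would run down each column instead.
weighted <- sweep(counts, 2, valence[colnames(counts)], `*`)

# Document-level valence scores.
scores <- rowSums(weighted)
print(scores)
```

Indexing the valence vector by colnames(counts) keeps the weights in the same order as the columns, which matters once the dictionary and the matrix are built separately.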
Note also that tidytext does its sentiment scoring with numerically scored lexicons such as AFINN, which work in much the same way as ANEW, if you want to convert your dfm into their object and use that approach (which is basically a version of what I've suggested above).
Updated:
It turns out I already built the feature into quanteda that you need, and had simply not realised it!
This will work. First, load in the ANEW dictionary. (You have to supply the ANEW file yourself.)
# read in the ANEW data
df_anew <- read.delim("ANEW2010All.txt", stringsAsFactors = FALSE)
# construct a vector of weights with the term as the name
vector_anew <- df_anew$ValMn
names(vector_anew) <- df_anew$Word
Now that we have a named vector of weights, we can apply that using dfm_weight(). Below, I've first normalised the dfm by relative frequency, so that the document aggregate score is not dependent on the document length in tokens. If you don't want that, just remove the line indicated below.
library("quanteda")
dfm_anew <- dfm(data_corpus_inaugural, select = df_anew$Word)
# weight by the ANEW weights
dfm_anew_weighted <- dfm_anew %>%
    dfm_weight(scheme = "prop") %>% # remove if you don't want normalized scores
    dfm_weight(weights = vector_anew)
## Warning message:
## dfm_weight(): ignoring 1,427 unmatched weight features
tail(dfm_anew_weighted)[, c("life", "day", "time")]
## Document-feature matrix of: 6 documents, 3 features (5.56% sparse).
## 6 x 3 sparse Matrix of class "dfm"
##               features
## docs                 life        day       time
##   1997-Clinton 0.07393220 0.06772881 0.21600000
##   2001-Bush    0.10004587 0.06110092 0.09743119
##   2005-Bush    0.09380645 0.12890323 0.11990323
##   2009-Obama   0.06669725 0.10183486 0.09743119
##   2013-Obama   0.08047970 0          0.19594096
##   2017-Trump   0.06826291 0.12507042 0.04985915
# total scores
tail(rowSums(dfm_anew_weighted))
## 1997-Clinton    2001-Bush    2005-Bush   2009-Obama   2013-Obama   2017-Trump
##     5.942169     6.071918     6.300318     5.827410     6.050216     6.223944
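One way to read these totals: because the counts were converted to proportions before the weights were applied, each document score works out to the frequency-weighted mean valence of the ANEW terms found in that document, which is why the values sit on ANEW's 1 to 9 valence scale instead of growing with document length. The base R sketch below checks that equivalence on made-up counts; the valence numbers are again hypothetical, not real ANEW values.

```r
# Toy counts and hypothetical valence scores (NOT real ANEW values).
counts <- matrix(c(2, 0, 2,
                   1, 3, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2"), c("life", "day", "time")))
valence <- c(life = 7, day = 6, time = 5)

# Relative frequencies within each document, the same idea as scheme = "prop".
props <- counts / rowSums(counts)

# Weight each column by its valence, then total per document.
scores <- rowSums(sweep(props, 2, valence[colnames(props)], `*`))

# The same numbers computed directly as count-weighted mean valences.
means <- apply(counts, 1, function(n) weighted.mean(valence, w = n))

print(scores)
print(means)
```

If the normalisation line is removed from the pipeline, the totals become sums of valence over all matched tokens instead, so longer documents will tend to score higher.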