I’m trying to do sentiment analysis in quanteda using the 2015 Lexicoder Sentiment Dictionary and have encountered an error I can’t solve. The dictionary has four keys: negative, positive, neg_positive (a positive word preceded by a negation, used to convey negative sentiment) and neg_negative (a negative word preceded by a negation, used to convey positive sentiment).
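For reference, the four keys and a few of the negated entries can be inspected directly. This is only a minimal sketch, assuming quanteda is installed and that data_dictionary_LSD2015 is the bundled dictionary object; the names keys and lsd_list are just illustrative.

```r
library(quanteda)

# List the top-level keys of the bundled 2015 Lexicoder Sentiment Dictionary.
keys <- names(data_dictionary_LSD2015)
print(keys)

# Peek at a few entries under each negated key; as.list() turns the
# dictionary into a plain named list of patterns.
lsd_list <- as.list(data_dictionary_LSD2015)
print(head(lsd_list$neg_positive))
print(head(lsd_list$neg_negative))
```

Looking at the raw entries shows exactly which multi-word patterns the two negated keys are meant to match.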
I can’t get the final two categories to activate when I use the dictionary.
The script I’m using is included below, after the description of the problem.
The package LexisNexisTools converts the file into a quanteda corpus. While experimenting with the error, I wasn’t getting any neg_positive or neg_negative hits, so I added the example sentence “This aggressive policy will not win friends” - which has one neg_positive bigram ('will not') - from the reference on the quanteda page to the first line of the first document. This is registered in the first dfm and can be seen in the toks_dict tokens list. However, there are more instances of the exact same bigram (will not) in the corpus that are not counted. Moreover, there are other neg_positive and neg_negative phrases in the corpus which are not registered at all.
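To make the comparison concrete, here is a small sketch of how the occurrences of the bigram could be located and set against the dictionary lookup. It uses two made-up texts rather than my real corpus (the second sentence is hypothetical), and the names txt, hits and looked_up are only illustrative; kwic(), phrase() and tokens_lookup() are the quanteda functions I am assuming apply here.

```r
library(quanteda)

# Toy texts standing in for the real corpus (hypothetical content).
txt <- c(
  d1 = "This aggressive policy will not win friends.",
  d2 = "They will not support it, and we will not either."
)
toks <- tokens(txt)

# Locate every occurrence of the bigram, document by document.
hits <- kwic(toks, pattern = phrase("will not"))
print(hits)

# Apply the dictionary to the same tokens, keeping unmatched tokens,
# so the replaced keys can be compared against the kwic hits.
looked_up <- tokens_lookup(toks, dictionary = data_dictionary_LSD2015,
                           exclusive = FALSE)
print(looked_up)
```

Comparing the two printouts side by side shows which occurrences of the bigram sit next to a token that the dictionary actually replaces.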
I’m not sure how to resolve this. Curiously, in the third dfm, dfm_dict, the initial ‘will not’ is not registered as a neg_positive at all. The overall counts for the categories negative and positive are not changed, so this isn’t a case of the missing values being counted elsewhere. I’m really at a loss as to what I’m doing wrong - any help would be greatly appreciated!
rm(list=ls())
library(quanteda)
library(quanteda.corpora)
library(readtext)
library(LexisNexisTools)
library(tidyverse)
library(RColorBrewer)
# Use the LexisNexisTools package to read the file and build the corpus
LNToutput <- lnt_read("word_labour.docx")
corp <- lnt_convert(LNToutput, to = "quanteda")
dfm <- dfm(corp, dictionary = data_dictionary_LSD2015)
dfm
toks_dict <- tokens_lookup(tokens(corp), dictionary = data_dictionary_LSD2015, exclusive = FALSE)
toks_dict
dfm_dict <- dfm(toks_dict, dictionary = data_dictionary_LSD2015, exclusive = FALSE)
dfm_dict
https://www.dropbox.com/s/qdwetdn8bt9fdrd/word_labour.DOCX?dl=0
This is a link to the Word document that forms the raw text for the corpus.