I have a dataset of social media posts that looks like the example below, but it's in Farsi, and I could not find a readily available R/Python sentiment analysis package for it.
post/tweet
"we are tired of this regime and need to make a change happen now"
Ideally, I want to classify each statement as having a negative, positive, or neutral sentiment. I therefore created a small dictionary that classifies words as either negative or positive.
# install.packages("tidyverse")
# install.packages("stringr")
# install.packages("readxl")
library(tidyverse)
library(stringr)
library(readxl)
My current approach, with three categories (positive, negative, and neutral):
Raw_data_on_posts %>%
  mutate(p_count = str_count(post, str_c(Dictionary$positive, collapse = '|')),
         n_count = str_count(post, str_c(Dictionary$negative, collapse = '|'))) %>%
  mutate(label = case_when(p_count > n_count ~ 'positive',
                           p_count < n_count ~ 'negative',
                           TRUE ~ 'neutral')) %>%
  select(post, label)
Most statements ended up labelled neutral, although from my reading, the social media posts are either pro or anti the Iranian regime. I believe this occurs because words that I classified as neither negative nor positive are effectively treated as neutral. Is it possible to instead only compare whether a statement has more negative or more positive words?
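To make the question concrete, here is a minimal sketch of what I mean, using made-up English stand-in data. `posts` and `dict` are hypothetical objects, not my real `Raw_data_on_posts` and `Dictionary`. It separates "no dictionary word matched" from a genuine tie, and drops `NA` entries before building the pattern, on the assumption that dictionary columns of unequal length may be padded with `NA` (in which case `str_c()` returns `NA` and every comparison falls through to the default branch).

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for the real posts and dictionary
posts <- tibble(post = c("we are tired of this regime and need to make a change happen now",
                         "proud of this great victory",
                         "the meeting is on monday"))
dict <- tibble(positive = c("proud", "great", "victory"),
               negative = c("tired", "corrupt", NA))  # NA mimics padding in a shorter column

# Remove NA entries first: str_c() yields NA if any element is NA
pos_pattern <- str_c(dict$positive[!is.na(dict$positive)], collapse = "|")
neg_pattern <- str_c(dict$negative[!is.na(dict$negative)], collapse = "|")

scored <- posts %>%
  mutate(p_count = str_count(post, pos_pattern),
         n_count = str_count(post, neg_pattern),
         label = case_when(p_count > n_count ~ "positive",
                           p_count < n_count ~ "negative",
                           p_count + n_count == 0 ~ "no dictionary match",
                           TRUE ~ "tie"))

scored %>% select(post, p_count, n_count, label)
```

Keeping `p_count` and `n_count` in the output would at least show whether the neutral labels come from ties or from posts where nothing in the dictionary matched at all.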
post/tweet                                                            sentiment
"we are tired of this regime and need to make a change happen now"    neutral
Furthermore, I wonder whether it makes sense to use a binary rather than a multi-class sentiment label in this case?
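For reference, this is roughly what I imagine the binary version would look like. It is only a sketch: the `counts` table is hypothetical and stands in for the per-post counts from the dictionary matching step. Ties (including posts with no matches) cannot be forced into either class, so I leave them as `NA` here rather than guess.

```r
library(dplyr)

# Hypothetical per-post counts, as if produced by the dictionary matching step
counts <- tibble(post = c("post A", "post B", "post C"),
                 p_count = c(0L, 3L, 1L),
                 n_count = c(1L, 0L, 1L))

binary <- counts %>%
  mutate(label = case_when(p_count > n_count ~ "positive",
                           p_count < n_count ~ "negative",
                           TRUE ~ NA_character_))  # ties stay unlabelled

binary
```

With this rule, any post left as `NA` would need either a larger dictionary or manual labelling.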