I am currently working on sentiment analysis of Twitter data for a telecom company. I am loading the data into HDFS and using Mahout's Naive Bayes classifier to predict each tweet's sentiment as positive, negative, or neutral.
Here is what I am doing:
I provide training data to the machine as key-value pairs (key: sentiment, value: text).
Using the Mahout library, I create feature vectors by computing the TF-IDF (term frequency-inverse document frequency) weights of the text:
mahout seq2sparse -i /user/root/new_model/dataseq --maxDFPercent 1000000 --minSupport 4 --maxNGramSize 2 -a org.apache.lucene.analysis.WhitespaceAnalyzer -o /user/root/new_model/predicted
I split the data into a training set and a test set.
I pass the training feature vectors to the Naive Bayes algorithm to build a model:
mahout trainnb -i /user/root/new_model/train-vectors -el -li /user/root/new_model/labelindex -o /user/root/new_model/model -ow -c
Using this model, I predict the sentiment of new data.
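To make the steps above concrete, here is a minimal, self-contained Python sketch of the same idea on a few made-up tweets: whitespace tokenization, TF-IDF weighting, a train/test split helper, and a multinomial Naive Bayes classifier with Laplace smoothing. The toy data and all function names (`tokenize`, `split`, `idf_table`, `tfidf`, `train_nb`, `predict`) are hypothetical, and the smoothed IDF formula and the smoothing constant `alpha` are textbook choices that I am assuming for illustration; Mahout's internal weighting and training may well differ.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy data as (sentiment, text) pairs; not real tweets.
TWEETS = [
    ("positive", "great fast network"),
    ("positive", "love the fast support"),
    ("negative", "terrible slow network"),
    ("negative", "hate the slow support"),
]


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def idf_table(docs):
    """Smoothed inverse document frequency for every term in the documents."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    """TF-IDF weights for one tokenized document; unseen terms are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}


def train_nb(examples, idf):
    """Accumulate per-label document counts and per-label TF-IDF weight sums."""
    label_counts = Counter()
    weights = defaultdict(Counter)
    for label, text in examples:
        label_counts[label] += 1
        for term, w in tfidf(tokenize(text), idf).items():
            weights[label][term] += w
    return label_counts, weights


def predict(text, idf, label_counts, weights, alpha=1.0):
    """Return the label with the highest smoothed log-probability score."""
    vec = tfidf(tokenize(text), idf)
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        # Laplace smoothing: add alpha to every term weight for this label.
        total = sum(weights[label].values()) + alpha * len(idf)
        score = math.log(count / n_docs)  # log prior of the label
        for term, w in vec.items():
            score += w * math.log((weights[label][term] + alpha) / total)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Train on all toy tweets here; with real data, train only on the training split.
idf = idf_table([tokenize(text) for _, text in TWEETS])
label_counts, weights = train_nb(TWEETS, idf)
print(predict("fast data plan", idf, label_counts, weights))
```

This is only an illustration of what the steps compute; the actual job runs through the Mahout commands shown above, on the real data in HDFS.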
This Mahout pipeline is a very simple implementation, and it gives me very low accuracy even though I have a good training set. So I was thinking of switching to logistic regression or an SVM, because they often give better results for this kind of problem.
So my question is: how can I use these two algorithms to build my model and predict the sentiment of tweets? What steps do I need to follow to achieve this?