I'm trying to create a sentiment analysis tool that analyses tweets about Manchester United football club over a three-day period and determines whether people view the club positively or negatively. I'm working in Java and following this guide:
http://cavajohn.blogspot.co.uk/2013/05/how-to-sentiment-analysis-of-tweets.html
I am using Apache Flume to download my tweets into Apache Hadoop, and I intend to use Apache Hive to query the tweets. I may also use Apache Oozie to schedule the jobs that partition the tweets.
The link I posted above mentions that I need a training dataset to train the classifier that will analyse the tweets. The sample classifier provided uses some 5000 tweets. As I am doing this as a summer project for university, I feel I should probably create my own dataset.
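To make concrete what "training a classifier" on hand-labelled tweets involves, here is a minimal sketch of a multinomial Naive Bayes classifier in Java. This is my own illustration, not the classifier from the guide: the class name, the toy training tweets, and the whitespace tokenizer are all placeholders, and a real dataset would need far more labelled examples than the four shown.

```java
import java.util.*;

// Sketch of a multinomial Naive Bayes sentiment classifier.
// Training data and tokenization here are illustrative placeholders only.
public class TweetSentiment {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> classCounts = new HashMap<>();
    private final Set<String> vocab = new HashSet<>();

    // Record word frequencies for one hand-labelled tweet.
    public void train(String label, String tweet) {
        classCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts =
                wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : tokenize(tweet)) {
            counts.merge(w, 1, Integer::sum);
            vocab.add(w);
        }
    }

    // Pick the label with the highest log-probability for the tweet.
    public String classify(String tweet) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        int totalDocs = classCounts.values().stream().mapToInt(Integer::intValue).sum();
        for (String label : classCounts.keySet()) {
            double score = Math.log((double) classCounts.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int totalWords = counts.values().stream().mapToInt(Integer::intValue).sum();
            for (String w : tokenize(tweet)) {
                // Laplace smoothing so unseen words don't zero out the probability.
                score += Math.log((counts.getOrDefault(w, 0) + 1.0)
                        / (totalWords + vocab.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    private static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().split("\\W+"));
    }

    public static void main(String[] args) {
        TweetSentiment clf = new TweetSentiment();
        // Toy hand-labelled examples; a usable training set needs far more.
        clf.train("positive", "great win for united brilliant performance");
        clf.train("positive", "love this team fantastic goal");
        clf.train("negative", "terrible defending awful result");
        clf.train("negative", "worst performance this season dreadful");
        System.out.println(clf.classify("what a brilliant goal"));
        System.out.println(clf.classify("awful dreadful defending"));
    }
}
```

The point of the sketch is that every manually labelled tweet just adds counts to these frequency tables, so the classifier's accuracy grows with the amount (and balance) of labelled data it sees.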
What is the minimum number of tweets I should use to make this classifier effective? Is there a recommended number? For example, if I manually labelled a hundred tweets, or five hundred, or a thousand, would that be enough?