Need guidance on sentiment analysis of music-related tweets on Spark
I am trying to perform sentiment analysis on Twitter data, specifically tweets related to music. After a lot of searching, I understand how to fetch tweets using the 'tweepy' Python library, and I plan to use a Naive Bayes classifier to classify them. I do not want to use an existing sentiment API such as 'textblob'. What confuses me is how to define the features for this classifier; I am supposed to define at least 500 of them. Here are my questions:
1) Can anyone give some examples of features that can be used for classifying music-related tweets? (Can tweets containing a happy smiley serve as the positive training set? If so, are the words in those tweets the features for my classifier?)
2) How do we generate the training set for this classifier?
3) If I want to keep only music-related tweets, can I use a Bloom filter to do the filtering?
4) How much data can I get through the tweepy API?
Please correct me if there is something wrong with my understanding.
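To make question 1 concrete, here is a minimal sketch in plain Python of the idea as I currently understand it: tweets are labeled by the emoticons they contain, and the most frequent words across the labeled tweets become binary "contains word" features. The emoticon lists, the function names, and the `max_features=500` cap are my own assumptions for illustration, not an established recipe, and tweets with mixed or no emoticons are simply skipped.

```python
import re
from collections import Counter

# Hypothetical emoticon lists; a real project would need a larger, vetted set.
POSITIVE_EMOTICONS = (":)", ":-)", ":D", "=)")
NEGATIVE_EMOTICONS = (":(", ":-(", ":'(")

# Alphabetic tokens, optionally with one internal apostrophe (e.g. "don't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def label_by_emoticon(text):
    """Return 'pos', 'neg', or None when emoticons are absent or mixed."""
    has_pos = any(e in text for e in POSITIVE_EMOTICONS)
    has_neg = any(e in text for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:
        return None
    return "pos" if has_pos else "neg"


def tokenize(text):
    """Strip the emoticons used as labels, lowercase, and split into words."""
    for e in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        text = text.replace(e, " ")
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(labeled_tweets, max_features=500):
    """Pick the words that appear in the most tweets as the feature set."""
    counts = Counter()
    for text, _label in labeled_tweets:
        counts.update(set(tokenize(text)))
    return [word for word, _ in counts.most_common(max_features)]


def extract_features(text, vocabulary):
    """Map a tweet to a dict of binary 'contains(word)' features."""
    tokens = set(tokenize(text))
    return {"contains(%s)" % w: (w in tokens) for w in vocabulary}


if __name__ == "__main__":
    raw = [
        "Loving the new album :)",
        "This concert was awful :(",
        "new single out today",
    ]
    labeled = [(t, label_by_emoticon(t)) for t in raw]
    labeled = [(t, lab) for t, lab in labeled if lab is not None]
    vocab = build_vocabulary(labeled)
    for text, lab in labeled:
        print(lab, extract_features(text, vocab))
```

The emoticons are removed before tokenizing so that the label does not leak into the features. Whether labels produced this way are clean enough to train on is part of what I am asking.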