I'm trying to apply SVM from Scikit learn to classify the tweets I collected. So, there will be two categories, name them A and B. For now, I have all the tweets categorized in two text file, 'A.txt' and 'B.txt'. However, I'm not sure what type of data inputs the Scikit Learn SVM is asking for. I have a dictionary with labels (A and B) as its keys and a dictionary of features (unigrams) and their frequencies as values. Sorry, I'm really new to machine learning and not sure what I should do to get the SVM work. And I found that SVM is using numpy.ndarray as the type of its data input. Do I need to create one based on my own data? Should it be something like this?
Labels features frequency
A 'book' 54
B 'movies' 32
Any help is appreciated.