I am developing a news classification system where each news item is assigned to an organization or company name. For instance, a news item titled "Apple to launch new iPhone in September 2012" is categorized as "Apple" news. So far, after training the classifier on a set of topics such as Apple news, Google news, Microsoft news, Samsung news, Bank of America news, etc., it has worked very well: I get almost 99% correctly classified instances from a single trained model. Now the problem is to classify a news item such as "Samsung and Google prep attack against Apple" into three topics: "Apple", "Samsung" and "Google".
My question is: how can I use Mahout's classification to assign a single item to multiple classes? I saw a similar question in this thread: http://mail-archives.apache.org/mod_mbox/mahout-user/201206.mbox/%[email protected]%3E.
Ted Dunning gave an interesting answer: make a separate category for each combination of topics. In my case, however, there are far too many combinations. I have to classify news into almost 15,000 companies, and realistically any news item can mention any mixture of those 15,000 companies, so making each combination a separate category is ruled out. A second suggestion was to arrange the topics in a hierarchy, which also does not apply here because the company names don't converge to any common base category.
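To put a number on why one category per combination is ruled out, here is a quick count (a Python sketch using the standard library's `math.comb`; the 15,000 figure is the company count from above) of how many combined categories would be needed just for news that mentions exactly two or exactly three companies:

```python
from math import comb

NUM_COMPANIES = 15_000  # number of companies (labels) from the question

# If every combination of companies became its own category, news mentioning
# exactly k companies would need comb(NUM_COMPANIES, k) distinct categories.
pairs = comb(NUM_COMPANIES, 2)
triples = comb(NUM_COMPANIES, 3)

print(f"pairs:   {pairs:,}")    # pairs:   112,492,500
print(f"triples: {triples:,}")  # triples: 562,387,505,000
```

That is over a hundred million categories for pairs alone and more than half a trillion for triples, before even considering news that mentions four or more companies.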
Having 15,000 models for 15,000 topics would do it, but that does not sound very practical either!
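For concreteness, the "one model per topic" option (often called binary relevance or one-vs-rest) looks roughly like the Python sketch below. This is not Mahout's API: `classify_multi`, `keyword_scorer` and the 0.5 threshold are hypothetical, and the keyword scorer is only a toy stand-in for a trained binary classifier per company.

```python
from typing import Callable, Dict, List

# Hypothetical: one binary scorer per company, each returning an estimated
# probability that the text is about that company.
Scorer = Callable[[str], float]


def classify_multi(text: str, scorers: Dict[str, Scorer],
                   threshold: float = 0.5) -> List[str]:
    """Binary relevance: query every per-label model independently and
    keep every label whose score reaches the threshold."""
    return sorted(label for label, score in scorers.items()
                  if score(text) >= threshold)


def keyword_scorer(keyword: str) -> Scorer:
    # Toy stand-in for a trained binary model: 1.0 if the keyword appears.
    return lambda text: 1.0 if keyword.lower() in text.lower() else 0.0


scorers = {name: keyword_scorer(name)
           for name in ["Apple", "Google", "Samsung", "Microsoft"]}
print(classify_multi("Samsung and Google prep attack against Apple", scorers))
# ['Apple', 'Google', 'Samsung']
```

Each label gets its own independent yes/no decision, so an item can receive zero, one or many labels; the catch is that prediction cost grows linearly with the number of labels, which is what makes 15,000 separate models feel heavy.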
So what is the correct way to classify multi-topic news?
Thanks!