I have looked at and tried out scikit-learn's tutorial on its multinomial naive Bayes classifier.
I want to use it to classify text documents. The catch with NB is that it treats P(document|label) as a product over all of the document's features (words), which it assumes are independent. Right now, I need to try out a trigram classifier, where P(document|label) = P(wordX | wordX-1, wordX-2, label) * P(wordX-1 | wordX-2, wordX-3, label) * ...
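To make the factorization above concrete, here is a minimal sketch in plain Python (not scikit-learn) of the kind of classifier I have in mind. The class name `TrigramNB`, the `<s>` padding tokens, whitespace tokenization, and add-one smoothing are all assumptions made for illustration, not part of any library.

```python
import math
from collections import Counter, defaultdict


def trigrams(text):
    """Yield (w_prev2, w_prev1, w) tuples from whitespace tokens, padded with <s>."""
    tokens = ["<s>", "<s>"] + text.lower().split()
    for i in range(2, len(tokens)):
        yield tokens[i - 2], tokens[i - 1], tokens[i]


class TrigramNB:
    """Hypothetical per-label trigram language model classifier."""

    def fit(self, docs, labels):
        self.tri = defaultdict(Counter)  # label -> counts of (w_prev2, w_prev1, w)
        self.ctx = defaultdict(Counter)  # label -> counts of (w_prev2, w_prev1)
        self.label_counts = Counter(labels)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            for w2, w1, w in trigrams(doc):
                self.tri[label][(w2, w1, w)] += 1
                self.ctx[label][(w2, w1)] += 1
                self.vocab.add(w)
        return self

    def log_score(self, doc, label):
        # log P(label) + sum over positions of log P(w | w_prev1, w_prev2, label).
        # Add-one smoothing (an assumption) keeps unseen trigrams from zeroing the product.
        v = len(self.vocab) + 1  # the extra 1 reserves mass for unseen words
        total = sum(self.label_counts.values())
        score = math.log(self.label_counts[label] / total)
        for w2, w1, w in trigrams(doc):
            num = self.tri[label][(w2, w1, w)] + 1
            den = self.ctx[label][(w2, w1)] + v
            score += math.log(num / den)
        return score

    def predict(self, doc):
        # Pick the label whose trigram model gives the document the highest score.
        return max(self.label_counts, key=lambda label: self.log_score(doc, label))


# Toy data, made up purely for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stocks fell on the news",
    "stocks rose on the report",
]
labels = ["pets", "pets", "finance", "finance"]
clf = TrigramNB().fit(docs, labels)
print(clf.predict("the cat sat on the log"))
print(clf.predict("stocks fell on the report"))
```

Scores are summed in log space so that long documents do not underflow. A real model would probably need better smoothing or backoff than add-one.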
Does scikit-learn support anything that would let me implement this language model and extend the NB classifier to perform classification based on it?