I am working on a Naive Bayes classifier for filtering mails. I have achieved an accuracy of 95% in SPAM detection and 94% in HAM detection, but I believe it can be further improved with association rule mining. I calculate the likelihoods and prior probabilities of the words in mails from the training data set, and map each test mail to either the SPAM or HAM class as given below,
c = argmax_c p(d|c) * p(c) = argmax_c p(f1, f2, ..., fn | c) * p(c)

where,
p(d|c) denotes the probability of document d being in class c,
p(c) denotes the prior probability of a particular class (SPAM or HAM in my case),
p(f1, f2, ..., fn | c) denotes the likelihood of words f1, f2, ..., fn being in class c.
But while arriving at equation no. 2.7, we make the bag-of-words and conditional-independence assumptions (assumed for the sake of simplicity), which only approximate the true probabilities:

p(f1, f2, ..., fn | c) ≈ p(f1|c) * p(f2|c) * ... * p(fn|c)
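For concreteness, here is a minimal sketch of the classifier described above (not my exact implementation), using Laplace smoothing and log probabilities to avoid underflow; the function and variable names are my own:

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (word_list, label) pairs, with labels 'SPAM' or 'HAM'."""
    class_counts = Counter()
    word_counts = {"SPAM": Counter(), "HAM": Counter()}
    vocab = set()
    for words, label in docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def classify(words, class_counts, word_counts, vocab):
    total_docs = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for c in class_counts:
        # log of the prior p(c)
        score = math.log(class_counts[c] / total_docs)
        total_words = sum(word_counts[c].values())
        for w in words:
            # Laplace-smoothed log likelihood p(f_i | c),
            # multiplied together via summed logs (independence assumption)
            score += math.log((word_counts[c][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best
```

With a toy training set such as `[(["win", "lottery"], "SPAM"), (["project", "report"], "HAM")]`, `classify` picks the class maximizing p(c) * prod p(fi|c), exactly the quantity in the equation above.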
E.g., the likelihood of the word lottery appearing in a SPAM mail should be greater in the presence of the word lucky than in the presence of the word my_name (mahesh). So the presence of other words, and their positions, do affect the probabilities.
Therefore, there should be some associative model used alongside Naive Bayes to further improve accuracy.
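One simple way I am considering to combine the two (a sketch of the idea, not a tested method): mine frequent word pairs from the training mails with an association-rule miner such as Apriori, then append a synthetic token for each mined pair found in a mail, so that associations like (lottery, lucky) get their own likelihood term in the Naive Bayes product. The function name and token format here are hypothetical:

```python
def add_pair_features(words, frequent_pairs):
    """Append a synthetic token 'a&b' for each mined word pair (a, b)
    whose words both occur in the mail; frequent_pairs is assumed to
    come from an association-rule miner run on the training set."""
    feats = list(words)
    present = set(words)
    for a, b in frequent_pairs:
        if a in present and b in present:
            feats.append(f"{a}&{b}")
    return feats
```

The augmented feature list would then be passed to the same Naive Bayes training and classification steps, so a co-occurrence the bag-of-words model ignores contributes its own probability.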