Some months ago, I opened an issue on GitHub about this topic. It is possible to add the respective code to the current master branch of scikit-learn.
The user @larsmans added an experimental class SemisupervisedNB to the file sklearn/naive_bayes.py around a year ago. This code resides in the branch emnb of his forked scikit-learn repository and can be accessed here.
The essential code resides in two files:
The file naive_bayes.py in the current master branch has to be replaced by the older one from the emnb branch.
The class LabelBinarizer, which can be found in the file sklearn/preprocessing.py in the master branch, has to be edited as well. The entire class has to be replaced by its definition in @larsmans' emnb branch, where it resides in the file sklearn/preprocessing/__init__.py.
Even though the code for the Naive Bayes classifiers has not changed much in a year, some bug fixes were added to it. Therefore it makes sense to keep the current versions of the file naive_bayes.py and the class LabelBinarizer and to give the experimental versions different names instead.
I've just created my own fork of the scikit-learn repository and added the experimental files on top of the current stable branch 0.13.X. This branch is called 0.13.X-emnb and can be accessed here. If you look at my three recent commits (1 and 2 and 3), you can see which files I've changed and which ones I've newly created.
Since SemisupervisedNB does not work together with the most recent versions of the other classifiers, I've just added a new module next to naive_bayes.py called semisupervised_naive_bayes.py. In there, you find the older versions of the classifiers in renamed versions, e.g. SemiMultinomialNB instead of MultinomialNB, so that they don't clash with the most recent versions of the classifiers. Likewise, I've added a class SemisupervisedLabelBinarizer next to LabelBinarizer (the choice of the name is a bit unfortunate but at least it's clear what it should be used for).
So, if you want to use the semisupervised versions of the classifiers, use the module sklearn.semisupervised_naive_bayes. For the current versions, use the module sklearn.naive_bayes.
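To make the choice between the two modules concrete, here is a minimal sketch of how the import could be selected at runtime. The module names and the class names MultinomialNB and SemiMultinomialNB are taken from the description above; the helper load_multinomial_nb is a hypothetical name used only for illustration, and the experimental module is assumed to be importable only when the 0.13.X-emnb branch is installed.

```python
import importlib

# Module names as described in the answer. The experimental module is
# assumed to exist only in the 0.13.X-emnb branch of the fork.
EXPERIMENTAL_MODULE = "sklearn.semisupervised_naive_bayes"
CURRENT_MODULE = "sklearn.naive_bayes"


def load_multinomial_nb(semisupervised=False):
    """Return the multinomial Naive Bayes class from the requested module.

    Hypothetical helper: returns None when the module (or scikit-learn
    itself) is not installed, or when the class is missing from it.
    """
    if semisupervised:
        module_name, class_name = EXPERIMENTAL_MODULE, "SemiMultinomialNB"
    else:
        module_name, class_name = CURRENT_MODULE, "MultinomialNB"
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, class_name, None)


if __name__ == "__main__":
    for flag in (False, True):
        cls = load_multinomial_nb(semisupervised=flag)
        label = cls.__name__ if cls is not None else "not available"
        print("semisupervised=%s -> %s" % (flag, label))
```

On a stock scikit-learn installation the second lookup is expected to report that the class is not available, since the renamed classifiers live only in the fork.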
But please keep in mind that this is highly experimental. It's just a setting for getting this old code working. I haven't searched for bugs.
SemiSupervisedNB working. However, I haven't tested so far whether the described changes to the master branch affect other classifiers or code. Try it with caution! – pemistahl