1
votes

Is it possible (and how if it is) to dynamically train sklearn MultinomialNB Classifier? I would like to train(update) my spam classifier every time I feed an email in it.

I want this (does not work):

x_train, x_test, y_train, y_test = tts(features, labels, test_size=0.2)
clf = MultinomialNB()
for i in range(len(x_train)):
    clf.fit([x_train[i]], [y_train[i]])
preds = clf.predict(x_test)

to have similar result as this (works OK):

x_train, x_test, y_train, y_test = tts(features, labels, test_size=0.2)
clf = MultinomialNB()
clf.fit(x_train, y_train)
preds = clf.predict(x_test)
1

1 Answers

2
votes

Scikit-learn supports incremental learning for multiple algorithms, including MultinomialNB. Check the docs here

You'll need to use the method partial_fit() instead of fit(), so your example code would look like:

x_train, x_test, y_train, y_test = tts(features, labels, test_size=0.2)
clf = MultinomialNB()
for i in range(len(x_train)):
    if i == 0:
        clf.partial_fit([x_train[i]], [y_train[I]], classes=numpy.unique(y_train))
    else:
        clf.partial_fit([x_train[i]], [y_train[I]])
preds = clf.predict(x_test)

Edit: added the classes argument to partial_fit, as suggested by @BobWazowski