I trained a Naive Bayes model with scikit-learn to classify articles in my web application. To avoid retraining the model repeatedly, I want to save it and deploy it to the application later. When I searched for this problem, many people recommended the pickle library.
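For reference, the basic pickle round trip those recommendations describe looks roughly like this. It is a minimal sketch on a tiny, made-up count matrix (not the question's data), and `model.pkl` is a hypothetical file name. It only shows that a fitted `MultinomialNB` can be written once and read back for prediction without retraining.

```python
import pickle

from sklearn.naive_bayes import MultinomialNB

# Tiny, made-up term-count matrix and labels, only for illustration.
X = [[2, 0, 1], [0, 3, 0], [1, 0, 2], [0, 2, 1]]
y = ["sports", "politics", "sports", "politics"]

clf = MultinomialNB().fit(X, y)

# Serialize the fitted estimator; "model.pkl" is a hypothetical path.
with open("model.pkl", "wb") as f:
    pickle.dump(clf, f)

# Later (e.g. when the web app starts), load it instead of retraining.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored.predict([[3, 0, 1]]))
```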
I have this model:
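import os
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# vect_tokenizer and lemmatizer are defined elsewhere (not shown)
def custom_tokenizer(doc):
    tokens = vect_tokenizer(doc)
    return [lemmatizer.lemmatize(token) for token in tokens]

tfidf = TfidfVectorizer(tokenizer=custom_tokenizer, stop_words="english")
clf = MultinomialNB()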
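One way to end up with a single object to save is to wrap the vectorizer and the classifier in a scikit-learn `Pipeline`. The sketch below assumes a simplified, hypothetical `simple_tokenizer` (lowercase regex tokens, no lemmatizer) in place of `custom_tokenizer`, made-up training sentences, and a hypothetical `pipeline.pkl` path. Because the reload happens in the same script, where `simple_tokenizer` is still defined, it does not by itself avoid the loading error discussed further down.

```python
import pickle
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def simple_tokenizer(doc):
    # Hypothetical stand-in for custom_tokenizer: lowercase word tokens,
    # without the lemmatizer used in the question.
    return re.findall(r"[a-z]+", doc.lower())


docs = ["the team won the match", "parliament passed the bill",
        "a late goal won the game", "the senate debated the bill"]
labels = ["sports", "politics", "sports", "politics"]

# One object holds both the fitted vectorizer and the fitted classifier.
# token_pattern=None because the custom tokenizer replaces it.
model = Pipeline([
    ("tfidf", TfidfVectorizer(tokenizer=simple_tokenizer,
                              token_pattern=None, stop_words="english")),
    ("clf", MultinomialNB()),
])
model.fit(docs, labels)

with open("pipeline.pkl", "wb") as f:
    pickle.dump(model, f)

with open("pipeline.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored.predict(["who won the match"]))
```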
I have already run tfidf.fit_transform() and trained clf. Finally, I got a model and saved the clf classifier using this code:
dest = os.path.join('classifier', 'pkl_object')
with open(os.path.join(dest, 'classifier.pkl'), 'wb') as f:
    pickle.dump(best_classifier, f, protocol=4)
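scikit-learn's model persistence documentation also mentions `joblib`, which is installed as a scikit-learn dependency and is commonly used for fitted estimators. A sketch of the same save step with `joblib.dump`/`joblib.load`, assuming a toy classifier in place of `best_classifier` and a hypothetical `classifier.joblib` file name. It also creates the target folder first, since `open(..., 'wb')` fails if `classifier/pkl_object` does not exist. joblib pickles under the hood, so it stores functions such as `custom_tokenizer` by reference, just as `pickle` does.

```python
import os

import joblib
from sklearn.naive_bayes import MultinomialNB

# Toy stand-in for best_classifier.
X = [[2, 0, 1], [0, 3, 0], [1, 0, 2], [0, 2, 1]]
y = ["sports", "politics", "sports", "politics"]
best_classifier = MultinomialNB().fit(X, y)

dest = os.path.join("classifier", "pkl_object")
os.makedirs(dest, exist_ok=True)  # writing fails if the folder is missing

path = os.path.join(dest, "classifier.joblib")  # hypothetical file name
joblib.dump(best_classifier, path)

clf = joblib.load(path)
print(clf.predict([[0, 4, 0]]))
```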
I also tried to save my vectorizer to a file the same way:
with open(os.path.join(dest, 'vect.pkl'), 'wb') as f:
    # two objects go into the same file: the tokenizer function, then the vectorizer
    pickle.dump(custom_tokenizer, f, protocol=4)
    pickle.dump(best_vector, f, protocol=4)
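Writing two objects into one file works, but `pickle.load` reads them back one at a time, in the order they were dumped. With the code above, a single `pickle.load` on `vect.pkl` would return `custom_tokenizer`, not the vectorizer. A small standard-library sketch, with an in-memory buffer standing in for the file:

```python
import io
import pickle

buf = io.BytesIO()  # stands in for the open 'vect.pkl' file
pickle.dump("first object", buf, protocol=4)
pickle.dump({"second": "object"}, buf, protocol=4)

buf.seek(0)
first = pickle.load(buf)   # returns only the first object written
second = pickle.load(buf)  # a second call is needed for the next one
print(first, second)
```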
There was no error, but when I tried to load the files, this error message popped up:
import os
import pickle

with open(os.path.join('pkl_object', 'classifier.pkl'), 'rb') as file:
    clf = pickle.load(file)

with open(os.path.join('pkl_vect', 'vect.pkl'), 'rb') as file:
    vect = pickle.load(file)
Error message:
AttributeError                            Traceback (most recent call last)
<ipython-input-55-d4b562870a02> in <module>()
     11
     12 with open(os.path.join('pkl_vect','vect.pkl'),'rb') as file:
---> 13     vect = pickle.load(file)
     14
     15 '''

AttributeError: Can't get attribute 'custom_tokenizer' on <module '__main__'>
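This error is consistent with how pickle handles functions: it stores a function by reference (its module name and qualified name), not its code, and looks that name up again when loading. Assuming `best_vector` is the fitted `tfidf`, pickling it records `__main__.custom_tokenizer`, and the loading session has no `custom_tokenizer` in its `__main__`. The sketch below reproduces this with the standard library only; the throwaway module `tokenizers_demo` is hypothetical and plays the role of `__main__`.

```python
import pickle
import sys
import types

# Build a throwaway module "tokenizers_demo" (hypothetical) holding a tokenizer.
mod = types.ModuleType("tokenizers_demo")
exec("def custom_tokenizer(doc):\n    return doc.lower().split()\n", mod.__dict__)
sys.modules["tokenizers_demo"] = mod

payload = pickle.dumps(mod.custom_tokenizer, protocol=4)
# Only the module name and the function name are stored, not the code.
print(b"tokenizers_demo" in payload, b"custom_tokenizer" in payload)

# Simulate a session where the module exists but the function is not defined.
del mod.custom_tokenizer
error = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    error = exc
print(type(error).__name__, error)
```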
I think the pickle library cannot store functions properly. How can I serialize my custom TfidfVectorizer to a file?
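Because only the reference is stored, the usual remedy is to make `custom_tokenizer` resolvable wherever the pickle is loaded: either define it under the same name in the loading script before calling `pickle.load`, or, more robustly, move it into a module that both the training code and the web application import. The sketch below follows the second route under several assumptions: the module name `text_utils` is hypothetical, a simple regex tokenizer stands in for the lemmatizing one, and a fresh interpreter started with `subprocess` plays the role of the web application.

```python
import os
import pickle
import subprocess
import sys
import tempfile
import textwrap

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical layout: the tokenizer lives in its own module, text_utils.py,
# instead of in the notebook's __main__.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "text_utils.py"), "w") as f:
    f.write(textwrap.dedent("""
        import re

        def custom_tokenizer(doc):
            # stand-in for the lemmatizing tokenizer in the question
            return re.findall(r"[a-z]+", doc.lower())
    """))
sys.path.insert(0, workdir)
from text_utils import custom_tokenizer

tfidf = TfidfVectorizer(tokenizer=custom_tokenizer, token_pattern=None)
tfidf.fit(["the team won the match", "parliament passed the bill"])

vect_path = os.path.join(workdir, "vect.pkl")
with open(vect_path, "wb") as f:
    pickle.dump(tfidf, f, protocol=4)

# Load in a brand-new interpreter, as the web app would. It only needs
# text_utils to be importable; custom_tokenizer is never redefined there.
loader = textwrap.dedent(f"""
    import pickle, sys
    sys.path.insert(0, {workdir!r})
    with open({vect_path!r}, "rb") as f:
        vect = pickle.load(f)
    print(sorted(vect.vocabulary_))
""")
result = subprocess.run([sys.executable, "-c", loader],
                        capture_output=True, text=True, check=True)
print(result.stdout.strip())
```

The pickle now records `text_utils.custom_tokenizer` instead of `__main__.custom_tokenizer`, so any process that can import `text_utils` can load the vectorizer.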