2
votes

I trained a Naive Bayes model with scikit-learn to classify articles in my web application. To avoid retraining the model repeatedly, I want to save the model and deploy it to the application later. When I searched for this problem, many people recommended the pickle library.

I have this model:

import pickle
import os

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def custom_tokenizer(doc):
    tokens = vect_tokenizer(doc)
    return [lemmatizer.lemmatize(token) for token in tokens]

tfidf = TfidfVectorizer(tokenizer=custom_tokenizer, stop_words="english")
clf = MultinomialNB()

I have already executed tfidf.fit_transform() and trained clf. Finally, I saved the trained clf classifier using this code:

dest = os.path.join('classifier', 'pkl_object')
f = open(os.path.join(dest, 'classifier.pkl'), 'wb')
pickle.dump(best_classifier, f, protocol=4)
f.close()

I also tried to save my vectorizer to a file in the same way:

f = open(os.path.join(dest, 'vect.pkl'), 'wb')
pickle.dump(custom_tokenizer, f, protocol=4)
pickle.dump(best_vector, f, protocol=4)
f.close()

There was no error, but when I tried to load the files, this error message popped up:

import pickle
import os

with open(os.path.join('pkl_object','classifier.pkl'),'rb') as file :
    clf = pickle.load(file)

with open(os.path.join('pkl_vect','vect.pkl'),'rb') as file:
    vect = pickle.load(file)

Error message:

AttributeError                            Traceback (most recent call last)
<ipython-input-55-d4b562870a02> in <module>()
     11 
     12 with open(os.path.join('pkl_vect','vect.pkl'),'rb') as file:
---> 13     vect = pickle.load(file)
     14 
     15 '''

AttributeError: Can't get attribute 'custom_tokenizer' on <module '__main__'>

I think the pickle library is not able to store functions properly. How can I serialize my custom TfidfVectorizer to a file?

1
Is this on the same computer? If not, verify that the versions of scikit-learn are the same on both machines. – pault
@pault These are on the same computer. – Antenna_
In the file where you are loading the pickle, have you defined custom_tokenizer? Functions need to be defined for the pickle to load properly; in your case it needs to be in the global scope, too. – Robert F. Dickerson

1 Answer

2
votes

In the second program, also include:

def custom_tokenizer(doc):
    tokens = vect_tokenizer(doc)
    return [lemmatizer.lemmatize(token) for token in tokens]

because pickle doesn't actually store a function's code, only a reference to its name. As this line in your error log says:

AttributeError: Can't get attribute 'custom_tokenizer' on <module '__main__'>

the unpickler has no idea what custom_tokenizer is. Refer to this for a better understanding.
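To make this concrete, here is a minimal, self-contained sketch of the behavior. The simple split-based tokenizer is a hypothetical stand-in for your vect_tokenizer/lemmatizer setup; the point is that pickle records only the name (e.g. __main__.custom_tokenizer), so loading succeeds only when a function of that name is defined in the loading script:

```python
import io
import pickle

# Stand-in for the asker's tokenizer. pickle stores only the *reference*
# __main__.custom_tokenizer, not the function body, so this definition
# must exist in whatever script calls pickle.load().
def custom_tokenizer(doc):
    return doc.lower().split()

# Dump the function (this is what happens implicitly when you pickle a
# TfidfVectorizer whose tokenizer= argument is a custom function).
buf = io.BytesIO()
pickle.dump(custom_tokenizer, buf, protocol=4)

# Loading works here because custom_tokenizer is defined in this module.
# In a separate script that lacks the definition, this line raises
# AttributeError: Can't get attribute 'custom_tokenizer' on <module '__main__'>
buf.seek(0)
loaded = pickle.load(buf)
print(loaded("Hello World"))  # ['hello', 'world']
```

A common pattern is to move custom_tokenizer into its own module (say, a tokenizers.py shared by both projects) and import it in both the training and the deployment script, so pickle can always resolve the name. Alternatively, a library such as dill can serialize the function body itself rather than just its name.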