1
votes

I get an error when I pass a DataFrame column directly as the stop_words argument. How can I resolve this?

    stop_words_corpus = pd.DataFrame(word_dictionary_corpus.Word.unique(), columns=feature_names)

    cv = CountVectorizer(max_features=200, analyzer='word', stop_words=stop_words_corpus)
    cv_txt = cv.fit_transform(data.pop('Clean_addr'))

**Updated error**

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
    867 
    868         vocabulary, X = self._count_vocab(raw_documents,
--> 869                                           self.fixed_vocabulary_)
    870 
    871         if self.binary:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in _count_vocab(self, raw_documents, fixed_vocab)
    783             vocabulary.default_factory = vocabulary.__len__
    784 
--> 785         analyze = self.build_analyzer()
    786         j_indices = []
    787         indptr = _make_int_array()

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in build_analyzer(self)
    260 
    261         elif self.analyzer == 'word':
--> 262             stop_words = self.get_stop_words()
    263             tokenize = self.build_tokenizer()
    264 


I fixed the earlier error, but I am still having this issue.

2 Answers

1
votes

Try this:

cv = CountVectorizer(max_features = 200,
                     analyzer='word',
                     stop_words=stop_words_corpus.stack().unique())
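
To see what that expression hands to the vectorizer, here is a small sketch. The frame contents and the 'Word' column name are made up for illustration. stack() flattens the DataFrame into a single Series and unique() returns its distinct values as a NumPy array; adding .tolist() gives a plain list, which is the type the scikit-learn documentation names for custom stop words.

```python
import pandas as pd

# Hypothetical stand-in for the question's stop_words_corpus DataFrame.
stop_words_corpus = pd.DataFrame({"Word": ["street", "road", "street", "avenue"]})

# stack() collapses all columns into one Series; unique() drops duplicates
# and returns a NumPy array in order of first appearance.
unique_words = stop_words_corpus.stack().unique()

# Convert the array to a plain Python list of strings.
stop_word_list = unique_words.tolist()
print(stop_word_list)
```

Either unique_words or stop_word_list can then be tried as the stop_words argument; the list form is the safer choice if your scikit-learn version validates parameter types.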
0
votes

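CountVectorizer's stop_words parameter expects a collection of strings, not a DataFrame. Convert the column to a NumPy array first (if your scikit-learn version rejects an array, use .tolist() to get a plain list, which is the documented type):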

stop_word = stop_words_corpus['Word'].values

cv = CountVectorizer(max_features = 200,
                     analyzer='word',
                     stop_words=stop_word)
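
As an end-to-end check, the following sketch fits the vectorizer with a custom stop-word list and confirms the stop words are left out of the vocabulary. The sample addresses and stop words are invented for illustration, and the list conversion with .tolist() is my assumption about the safest input type, not something the question confirms.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical data standing in for the question's objects.
word_dictionary_corpus = pd.DataFrame({"Word": ["street", "road", "street"]})
data = pd.DataFrame({"Clean_addr": ["12 main street", "45 oak road", "7 pine lane"]})

# Distinct stop words as a plain list of strings (not a DataFrame).
stop_word = word_dictionary_corpus["Word"].unique().tolist()

cv = CountVectorizer(max_features=200, analyzer="word", stop_words=stop_word)
cv_txt = cv.fit_transform(data["Clean_addr"])

# The learned vocabulary should not contain any of the stop words.
vocab = sorted(cv.vocabulary_)
print(vocab)
print(cv_txt.shape)
```

Indexing with data["Clean_addr"] is used here instead of data.pop('Clean_addr') so the column stays in the frame and the snippet can be re-run.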