import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

word_vectorizer = CountVectorizer(ngram_range=(2,2), analyzer='word')
for each in (train_incidents_word_issue["Summary"].index):
    text_issue_list = [data_word_issue["Summary"][each]]
    sparse_matrix = word_vectorizer.fit_transform(text_issue_list)
    frequencies = sum(sparse_matrix).toarray()[0]
    bi_grams_issue_df = pd.DataFrame(frequencies, index=word_vectorizer.get_feature_names(), columns=['frequency'])
    data_word_issue["data_issue_count"][each] = bi_grams_issue_df[bi_grams_issue_df.index.str.contains("^data issue$")]["frequency"].sum()

I am getting the below error:

ValueError in
      5 for each in (train_incidents_word_issue["Summary"].index):
      6     text_issue_list = [data_word_issue["Summary"][each]]
----> 7     sparse_matrix = word_vectorizer.fit_transform(text_issue_list)
      8     frequencies = sum(sparse_matrix).toarray()[0]
      9     bi_grams_issue_df = pd.DataFrame(frequencies, index=word_vectorizer.get_feature_names(), columns=['frequency'])

ValueError: Empty vocabulary; perhaps the documents only contain stop words

Please help me understand the error and the recommended solution. I have just started with Python.

I used the code below:

word_vectorizer.fit_transform(text_issue_list.split('\n'))

and got this error:

AttributeError: 'list' object has no attribute 'split'

1 Answer

Try using this to build your vocabulary:

word_vectorizer.fit_transform(text_issue_list.split('\n'))
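
A note on why the error likely shows up, with a sketch of one way around it. The loop refits the vectorizer on a single summary at a time. With ngram_range=(2,2) and the default tokenizer (which keeps tokens of two or more characters), a summary with fewer than two such tokens produces no bigrams at all, so the learned vocabulary is empty and fit_transform raises the ValueError. Note also that split() is a string method, so calling it on text_issue_list (a list) fails with the AttributeError mentioned in the comment. The sketch below assumes the only bigram of interest is "data issue" and passes it as a fixed vocabulary, so nothing has to be learned from the data. The summaries list is made-up sample data standing in for data_word_issue["Summary"].

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample data standing in for data_word_issue["Summary"].
summaries = [
    "data issue in report and another data issue",
    "login",               # a single word: no bigram can be formed from it
    "",                    # an empty summary
    "printer is offline",  # bigrams exist, but none is "data issue"
]

# With a fixed vocabulary the vectorizer does not learn anything from the
# text, so short or empty rows give a count of 0 instead of raising
# "ValueError: Empty vocabulary".
word_vectorizer = CountVectorizer(
    ngram_range=(2, 2),
    analyzer="word",
    vocabulary=["data issue"],
)

# One call for all rows instead of one fit per row.
counts = word_vectorizer.fit_transform(summaries)

# The matrix has a single column (for "data issue"); flatten it to plain ints.
data_issue_count = counts.toarray().ravel().tolist()
print(data_issue_count)
```

If the summaries live in a DataFrame column, the resulting list can be assigned back in one step, for example data_word_issue["data_issue_count"] = data_issue_count, provided the rows are passed in the same order. This also avoids the row-by-row chained assignment in the original loop.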