0
votes

Error Message:

Traceback (most recent call last):
  File "/Users/ABHINAV/Documents/test2.py", line 58, in <module>
    classifier = NaiveBayesClassifier.train(trainfeats)
  File "/Library/Python/2.7/site-packages/nltk/classify/naivebayes.py", line 194, in train
    for featureset, label in labeled_featuresets:
ValueError: too many values to unpack
[Finished in 17.0s with exit code 1]

I get this error when I try to train a Naive Bayes classifier on a small set of data. Here is the code:

import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews

def word_feats(words):
    return dict([(word, True) for word in words])

negids = movie_reviews.fileids('neg')
posids = movie_reviews.fileids('pos')

negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids]

negcutoff = len(negfeats)*3/4
poscutoff = len(posfeats)*3/4


trainfeats=[('good'),('pos'),
('quick'),('pos'),
('easy'),('pos'),
('big'),('pos'),
('interested'),('pos'),
('important'),('pos'),
('new'),('pos'),
('patient'),('pos'),
('few'),('neg'),
('bad'),('neg'),

]

test=[
('general'),('pos'),
('many'),('pos'),
('efficient'),('pos'),
('great'),('pos'),
('interested'),('pos'),
('top'),('pos'),
('easy'),('pos'),
('big'),('pos'),
('new'),('pos'),
('wonderful'),('pos'),
('important'),('pos'),
('best'),('pos'),
('more'),('pos'),
('patient'),('pos'),
('last'),('pos'),
('worse'),('neg'),
('terrible'),('neg'),
('awful'),('neg'),
('bad'),('neg'),
('minimal'),('neg'),
('incomprehensible'),('neg'),
]

classifier = NaiveBayesClassifier.train(trainfeats)
print 'accuracy:', nltk.classify.util.accuracy(classifier, test)
classifier.show_most_informative_features()

2 Answers

2
votes

TLDR

You need to have this:

trainfeats=[('good','pos'),
('quick','pos'),
...

Instead of this:

trainfeats=[('good'),('pos'),
('quick'),('pos'),
...

Explanation

The crucial error is ValueError: too many values to unpack, raised inside NaiveBayesClassifier.train, which you call on this line:

classifier = NaiveBayesClassifier.train(trainfeats)

'Too many values to unpack' means the program is expecting a certain number of values inside an iterable, and it's receiving more than that number. For example, from your error message that error is thrown on this line:

for featureset, label in labeled_featuresets: 

This for loop expects pairs of things to be in 'labeled_featuresets', and it's going to assign one member of the pair to featureset, and one member to label. If labeled_featuresets actually has triplets, e.g. [(1,2,3), (1,2,3)...] then the program doesn't know what to do with that third element, so it throws the error.
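To see the same failure in isolation, here is a minimal sketch in plain Python (no NLTK needed). The helper name try_unpack is made up for illustration; it just performs the same two-name unpacking that the for loop does on each item.

```python
def try_unpack(item):
    """Unpack item into two names, as `for featureset, label in ...` does."""
    try:
        featureset, label = item
    except ValueError as err:
        return "ValueError: %s" % err
    return "ok"

# A real pair unpacks cleanly.
pair_result = try_unpack(('good', 'pos'))
# A triplet has one value too many.
triplet_result = try_unpack((1, 2, 3))
# A bare string is unpacked character by character, and 'good' has four.
string_result = try_unpack('good')

print(pair_result)
print(triplet_result)
print(string_result)
```

Note that a string counts as an iterable here, so a plain word of more than two characters fails in exactly the same way as a triplet does.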

Here's what you're passing into that function, which I assume is ending up as labeled_featuresets:

trainfeats=[('good'),('pos'),
('quick'),('pos'),
('easy'),('pos'),
...

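It looks like you're trying to make a list of tuples (which would prevent the error you're getting) by laying the items out as pairs on each line, but that's not enough. Python doesn't infer tuples from line layout, and parentheses around a single value don't create a tuple either: ('good') is just the string 'good'. So trainfeats is a flat list of strings, and the loop tries to unpack the four characters of 'good' into two names. I think this is what you're going for: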

trainfeats=[('good','pos'),
('quick','pos'),
('easy','pos'),
...

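That puts each pair inside one set of parentheses with a comma between the members, creating a list of 2-tuples rather than a flat list of strings. Note that this only fixes the unpacking error: NLTK also expects the first member of each pair to be a feature dict such as {'good': True}, not a bare string.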
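If it helps, the difference between parentheses and tuples can be checked directly. The sketch below also shows one possible way to regroup an existing flat list into pairs; pair_up is a hypothetical helper name, not something from NLTK.

```python
# Parentheses around a single value do not make a tuple; the comma does.
not_a_tuple = ('good')      # just the string 'good'
one_tuple = ('good',)       # a 1-tuple
pair = ('good', 'pos')      # a 2-tuple

# So a list written like the one in the question is a flat list of strings.
flat = [('good'), ('pos'), ('quick'), ('pos'), ('few'), ('neg')]

def pair_up(items):
    """Group a flat [word, label, word, label, ...] list into 2-tuples."""
    if len(items) % 2 != 0:
        raise ValueError("expected an even number of items")
    # Even positions are words, odd positions are labels.
    return list(zip(items[0::2], items[1::2]))

pairs = pair_up(flat)
print(type(not_a_tuple), type(one_tuple), type(pair))
print(pairs)
```

Rewriting the literal by hand is probably clearer for a list this short; regrouping is mainly useful if the flat list comes from somewhere else.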

0
votes

The trainfeats variable should be:

trainfeats = [
    ({'good': True, 'quick': True, 'easy': True,
      'big': True, 'interested': True, 'important': True,
      'new': True, 'patient': True}, 'pos'),
    ({'few': True, 'bad': True}, 'neg'),
]

This is the format NLTK expects for labelled feature sets: a list of (feature dict, label) pairs.

Similarly, the test variable should be:

test = [
    ({'general': True, 'many': True, 'efficient': True, 'great': True,
      'interested': True, 'top': True, 'easy': True, 'big': True,
      'new': True, 'wonderful': True, 'important': True, 'best': True,
      'more': True, 'patient': True, 'last': True}, 'pos'),
    ({'worse': True, 'terrible': True, 'awful': True, 'bad': True,
      'minimal': True, 'incomprehensible': True}, 'neg'),
]
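If you would rather keep your data as (word, label) pairs, a small conversion step can build the feature dicts for you. This is only a sketch: to_labeled_featuresets is a hypothetical helper, and it makes one featureset per word rather than one per label as above, which is a modelling choice you may or may not want.

```python
def to_labeled_featuresets(pairs):
    """Turn (word, label) pairs into ({word: True}, label) pairs."""
    return [({word: True}, label) for word, label in pairs]

word_label_pairs = [('good', 'pos'), ('quick', 'pos'), ('bad', 'neg')]
trainfeats = to_labeled_featuresets(word_label_pairs)
print(trainfeats)

# Assumed usage, mirroring the call in the question (requires NLTK):
# classifier = NaiveBayesClassifier.train(trainfeats)
```

The same helper could be applied to the test data before calling nltk.classify.util.accuracy.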