
I'm trying to retrieve documents from multiple directories and classify them. The NLTK book shows an example of categorizing the files in two folders of the movie_reviews corpus, 'pos' and 'neg':

from nltk.corpus import movie_reviews
documents = [(list(movie_reviews.words(fileid)), category)
              for category in movie_reviews.categories()
              for fileid in movie_reviews.fileids(category)]

I attempted to do something similar for a couple of folders within the same directory:

reviews = r"C:\Users\Alpine\Documents\Reviews"  # Folders: Good, Bad (raw string so the backslashes are not read as escapes)
documents = [(list(reviews.words(fileid)), category)
              for category in reviews.categories()
              for fileid in reviews.fileids(category)]

However, I get AttributeError: 'str' object has no attribute 'categories' on the line for category in reviews.categories().

Does this approach only work for the corpora that ship with NLTK? Is there an alternative?


Answer


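The problem is that you are confusing movie_reviews with reviews.

movie_reviews is imported from nltk.corpus and is an NLTK corpus reader object, so it has methods such as categories, fileids and words.

reviews is a variable to which you have assigned a plain string (the folder path). A string has none of those methods, and that is what the error message is telling you: categories is simply the first of them that Python tries to call, because the outer loop of the comprehension is evaluated first.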
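To get the same interface for your own folders, you can build a corpus reader over them instead of using a bare path. NLTK's CategorizedPlaintextCorpusReader can derive each file's category from its path via the cat_pattern regex. The sketch below assumes the reviews are .txt files placed directly in the Good and Bad subfolders; the helper names load_reviews and build_documents, the file name review1.txt and the demo texts are hypothetical and only for illustration.

```python
import os
import tempfile

from nltk.corpus.reader import CategorizedPlaintextCorpusReader


def load_reviews(root):
    """Hypothetical helper: a categorized reader over root/Good/*.txt and root/Bad/*.txt."""
    return CategorizedPlaintextCorpusReader(
        root,
        r"(Good|Bad)/.*\.txt",         # fileids: every .txt file under Good/ or Bad/ (assumed layout)
        cat_pattern=r"(Good|Bad)/.*",  # group(1) of this regex becomes each file's category
        encoding="utf8",
    )


def build_documents(reviews):
    # Same comprehension as the NLTK book, now applied to a corpus reader, not a str.
    return [(list(reviews.words(fileid)), category)
            for category in reviews.categories()
            for fileid in reviews.fileids(category)]


if __name__ == "__main__":
    # Demo on a throwaway folder; in practice pass r"C:\Users\Alpine\Documents\Reviews".
    with tempfile.TemporaryDirectory() as root:
        for folder, text in [("Good", "A great film."), ("Bad", "Dull and slow.")]:
            os.makedirs(os.path.join(root, folder))
            with open(os.path.join(root, folder, "review1.txt"), "w", encoding="utf8") as f:
                f.write(text)
        reviews = load_reviews(root)
        print(reviews.categories())  # ['Bad', 'Good']
        print(build_documents(reviews))
```

Words are tokenized by the reader's default word tokenizer, so calling words() does not need any extra NLTK data downloads. If your files use another extension or the folders have different names, adjust both regexes accordingly.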