I'm trying to load documents from multiple directories and classify them. The NLTK book shows an example that categorizes the files in the two folders of the movie_reviews corpus, 'pos' and 'neg':
```python
from nltk.corpus import movie_reviews

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
```
I tried to do the same for two folders inside one parent directory:
```python
reviews = r"C:\Users\Alpine\Documents\Reviews"  # Folders: Good, Bad

documents = [(list(reviews.words(fileid)), category)
             for category in reviews.categories()
             for fileid in reviews.fileids(category)]
```
However, I get `AttributeError: 'str' object has no attribute 'categories'` on the line `for category in reviews.categories()`.
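The traceback follows from `reviews` being a plain `str`: `categories()`, `fileids()` and `words()` are methods of NLTK corpus reader objects such as `movie_reviews`, not of strings. A minimal sketch of wrapping a local folder in such a reader, assuming Python 3, NLTK installed, and one `.txt` file per review stored under a folder named after its category. `load_documents` is a hypothetical helper name. It uses `CategorizedPlaintextCorpusReader`, the reader class NLTK itself uses for `movie_reviews`, and derives each file's category from its first path component via `cat_pattern`.

```python
from nltk.corpus.reader import CategorizedPlaintextCorpusReader


def load_documents(root):
    """Hypothetical helper: return (tokens, category) pairs for every .txt
    file under root, where category is the file's top-level subfolder."""
    reader = CategorizedPlaintextCorpusReader(
        root,
        r"(?!\.).*\.txt",           # fileids regex: .txt files, skipping hidden ones
        cat_pattern=r"([^/]+)/.*",  # group 1 = first folder in the relative path
    )
    # Same comprehension as the NLTK book, now run against a reader object
    return [(list(reader.words(fileid)), category)
            for category in reader.categories()
            for fileid in reader.fileids(category)]
```

Usage would then be `documents = load_documents(r"C:\Users\Alpine\Documents\Reviews")`; the raw string stops Python from treating the backslashes in the Windows path as escape sequences.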
Does this approach only work for the corpora bundled with NLTK? Is there an alternative?
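If you would rather not go through a corpus reader at all, the same `(words, category)` list can be built with the standard library alone. This is a sketch, not NLTK's own code path: it assumes Python 3, the same folder-per-category layout, and UTF-8 `.txt` files. `load_documents_plain` is a hypothetical name, and tokenization is a plain regex that mirrors the pattern of NLTK's `WordPunctTokenizer`, the default word tokenizer of `PlaintextCorpusReader`.

```python
import os
import re

# Runs of word characters, or runs of non-space punctuation
# (the same pattern NLTK's WordPunctTokenizer uses).
TOKEN_RE = re.compile(r"\w+|[^\w\s]+")


def load_documents_plain(root, extension=".txt", encoding="utf-8"):
    """Hypothetical helper: return (tokens, category) pairs, where category
    is the name of the immediate subfolder of root holding the file."""
    documents = []
    for category in sorted(os.listdir(root)):
        folder = os.path.join(root, category)
        if not os.path.isdir(folder):
            continue  # ignore loose files sitting directly in root
        for name in sorted(os.listdir(folder)):
            if not name.endswith(extension):
                continue
            with open(os.path.join(folder, name), encoding=encoding) as f:
                documents.append((TOKEN_RE.findall(f.read()), category))
    return documents
```

Unlike the reader-based version, this only looks one folder deep, so files in nested subfolders are skipped.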