I have created my own corpus, similar to the movie_reviews corpus in NLTK, with files categorized as neg or pos.
The neg and pos folders each contain .txt files.
Code:
from nltk.corpus import CategorizedPlaintextCorpusReader
mr = CategorizedPlaintextCorpusReader(r'C:\mycorpus', r'(?!\.).*\.txt',
                                      cat_pattern=r'(neg|pos)/.*')
When I try to read or interact with these files, I get nothing back. For example,
len(mr.categories())
runs without error but does not display anything:
>>>
I have read multiple documents and questions on here about custom categorized corpora, but I still cannot get mine to work.
Full code:
import nltk
from nltk.corpus import CategorizedPlaintextCorpusReader
mr = CategorizedPlaintextCorpusReader(r'C:\mycorpus', r'(?!\.).*\.txt',
                                      cat_pattern=r'(neg|pos)/.*')
len(mr.categories())
I eventually want to run a Naive Bayes classifier on my data, but I am unable to read the content.
Paths:
C:\mycorpus\pos
C:\mycorpus\neg
The pos folder contains 'cv.txt' and the neg folder contains 'example.txt'.
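
For reference, here is a minimal, self-contained sketch of the same layout, which may help narrow the problem down. It is a sketch under assumptions, not the original setup: a temporary directory stands in for C:\mycorpus, the file contents are made up, and the helper names build_demo_corpus and load_corpus are hypothetical. It also wraps the calls in print(), because a bare expression such as len(mr.categories()) is only echoed when typed at the interactive prompt, not when it runs from a script file.

```python
import os
import tempfile

from nltk.corpus import CategorizedPlaintextCorpusReader


def build_demo_corpus(root):
    """Create pos/cv.txt and neg/example.txt under root (made-up contents)."""
    samples = {
        ("pos", "cv.txt"): "A great film.",
        ("neg", "example.txt"): "A dull film.",
    }
    for (category, name), text in samples.items():
        folder = os.path.join(root, category)
        os.makedirs(folder, exist_ok=True)
        with open(os.path.join(folder, name), "w", encoding="utf-8") as handle:
            handle.write(text)


def load_corpus(root):
    """Build the reader with the same file and category patterns as above."""
    return CategorizedPlaintextCorpusReader(
        root, r'(?!\.).*\.txt', cat_pattern=r'(neg|pos)/.*')


# Assumption: a temporary directory replaces C:\mycorpus for this demo.
with tempfile.TemporaryDirectory() as root:
    build_demo_corpus(root)
    mr = load_corpus(root)
    categories = mr.categories()
    pos_files = mr.fileids(categories='pos')
    # Explicit print() calls, so the values appear even when run as a script.
    print(len(categories))
    print(categories)
    print(pos_files)
```

If this prints the categories but the real corpus does not, the difference is likely in the folder layout or the root path rather than in the patterns.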