I am new to NLTK (http://www.nltk.org/), and to Python for that matter. I want to use the NLTK Python library with the BNC (British National Corpus) as the corpus. I do not believe this corpus is distributed through the NLTK Data download. Is there a way to import the BNC so that NLTK can use it? If so, how? I did find a class called BNCCorpusReader, but I have no idea how to use it. I was also able to download the corpus from the BNC site (http://ota.ox.ac.uk/desc/2554).
Update
I have tried entrophy's suggestion, but I get the following error:
raise IOError('No such file or directory: %r' % _path)
OSError: No such file or directory: 'C:\\Users\\jason\\Documents\\NetBeansProjects\\DemoCollocations\\src\\Corpora\\bnc\\A\\A0\\A00.xml'
My code to read in the corpus:
bnc_reader = BNCCorpusReader(root="Corpora/bnc", fileids=r'[A-K]/\w*/\w*\.xml')
And my corpus is located in: C:\Users\jason\Documents\NetBeansProjects\DemoCollocations\src\Corpora\bnc\
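To narrow this down, a minimal diagnostic sketch like the one below might help. It uses only the standard library, not NLTK, and the helper name matching_fileids is hypothetical. It resolves the relative root against the current working directory and lists which files under it match the fileids pattern. It only approximates how a corpus reader matches file IDs, so treat the result as a sanity check rather than a definitive answer.

```python
import os
import re


def matching_fileids(root, pattern):
    """Return paths relative to root (with forward slashes) that fully match pattern.

    Hypothetical helper for diagnosis only; it is not part of NLTK.
    """
    regex = re.compile(pattern)
    matches = []
    # os.walk yields nothing if root does not exist, so the result is then empty.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            # Normalise Windows separators so the pattern can use "/".
            rel = rel.replace(os.sep, "/")
            if regex.fullmatch(rel):
                matches.append(rel)
    return sorted(matches)


if __name__ == "__main__":
    # Same relative root and pattern as in the question.
    root = os.path.abspath("Corpora/bnc")
    print("Resolved root:", root)
    print("Root exists:", os.path.isdir(root))
    for fileid in matching_fileids(root, r'[A-K]/\w*/\w*\.xml')[:5]:
        print("Matched:", fileid)
```

If the resolved root is not the directory that actually holds the A, B, ... folders, or if no files are matched, the problem is likely in the root path or the directory layout of the download rather than in the pattern itself.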