After reading a lot of posts, I'm still having problems making a custom corpus in NLTK. I have a text file of tagged sentences in which every token has the form word/tag, and I want to train a tagger on this data. I'm trying to use an NLTK package called train-tagger, which trains various types of taggers. Two questions:
1) Can train-tagger take a text file as input, or only an NLTK corpus object?
2) If it only takes a corpus, how do I create one from a text file?
I tried the following code to create a corpus:
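import nltk
from nltk.corpus import PlaintextCorpusReader
corpus_root = './'  # directory that contains my tagged text file
newcorpus = PlaintextCorpusReader(corpus_root, '.*')  # '.*' matches every file in corpus_root
print(newcorpus.raw('IOBHarrisonsTraining.txt'))  # IOBHarrisonsTraining.txt is my tagged text file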
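For reference, what raw() prints is just lines of word/tag tokens. As I understand it, NLTK taggers are trained on a list of sentences, where each sentence is a list of (word, tag) tuples. Here is a stdlib-only sketch of that conversion. parse_tagged_line and read_tagged_sentences are my own hypothetical helpers, not NLTK functions, and I'm assuming one sentence per line and splitting on the last '/' so that a token like 1/2/CD keeps 1/2 as the word.

```python
import io
from typing import Iterable, List, Optional, Tuple

TaggedSent = List[Tuple[str, Optional[str]]]


def parse_tagged_line(line: str, sep: str = "/") -> TaggedSent:
    """Hypothetical helper: turn one line of 'word/tag' tokens into (word, tag) pairs.

    Splits each whitespace-separated token on its LAST separator, so
    '1/2/CD' becomes ('1/2', 'CD'). A token with no separator gets tag None.
    """
    pairs: TaggedSent = []
    for token in line.split():
        word, found, tag = token.rpartition(sep)
        if found:
            pairs.append((word, tag))
        else:
            pairs.append((token, None))
    return pairs


def read_tagged_sentences(lines: Iterable[str], sep: str = "/") -> List[TaggedSent]:
    """Hypothetical helper: assumes one sentence per line and skips blank lines."""
    return [parse_tagged_line(line, sep) for line in lines if line.strip()]


# Made-up sample in the same format as my file.
sample = io.StringIO("The/DT dog/NN barks/VBZ ./.\n\nA/DT cat/NN sleeps/VBZ ./.\n")
sents = read_tagged_sentences(sample)
print(sents[0])  # [('The', 'DT'), ('dog', 'NN'), ('barks', 'VBZ'), ('.', '.')]
```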
This seems to work, but I can't find the output. I expected a corpus to be created either in the folder the code runs from or in nltk_data/corpora, but I can't find anything in either place. Is there some method in the corpus module that saves the newcorpus object I created, so that it could then be used as input to train-tagger? Also, should I be passing a tagged-sentence file to PlaintextCorpusReader, or just a set of untagged sentences?
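From what I've pieced together since (treat this as my assumption, not a confirmed answer): a corpus reader doesn't create or save a new corpus anywhere; it just reads the existing files under corpus_root when asked, which would explain why nothing shows up on disk. For an already-tagged file, NLTK also has TaggedCorpusReader, which by default splits tokens on '/' and treats each line as a sentence, and its tagged_sents() returns lists of (word, tag) tuples that could go straight into one of NLTK's trainable taggers. The sketch below uses UnigramTagger purely as an example (I don't know what train-tagger uses internally), and a temporary directory holding a made-up sample.txt in place of my real file.

```python
import os
import tempfile

from nltk.corpus.reader import TaggedCorpusReader
from nltk.tag import UnigramTagger

# Made-up sample in the same word/tag format, standing in for
# IOBHarrisonsTraining.txt: one sentence per line.
SAMPLE = "The/DT dog/NN barks/VBZ ./.\nA/DT cat/NN sleeps/VBZ ./.\n"

with tempfile.TemporaryDirectory() as corpus_root:
    with open(os.path.join(corpus_root, "sample.txt"), "w", encoding="utf8") as f:
        f.write(SAMPLE)

    # The reader only points at existing files; it writes nothing to disk.
    # r".*\.txt" keeps scripts and other non-corpus files out, unlike '.*'.
    reader = TaggedCorpusReader(corpus_root, r".*\.txt")

    # tagged_sents() reads lazily from disk, so materialise the sentences
    # before the temporary directory is deleted.
    train_sents = list(reader.tagged_sents())

# UnigramTagger learns the most frequent tag seen for each word;
# words that never appeared in training are tagged None.
tagger = UnigramTagger(train_sents)
print(tagger.tag(["The", "cat", "barks", "loudly"]))
# [('The', 'DT'), ('cat', 'NN'), ('barks', 'VBZ'), ('loudly', None)]
```

If this is right, the "corpus" only ever exists as a reader object in memory; to reuse something between runs, I'd presumably keep the original text file and recreate the reader each time.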