Re-train Stanford NLP POS tagger in Eclipse

Question

I'm pretty new to NLP and I'm trying to figure out POS tagging. I'm currently trying out the Stanford NLP POS tagger: http://nlp.stanford.edu/software/tagger.shtml

The page linked above includes this sentence:

The tagger can be retrained on any language, given POS-annotated training text for the language.

However, I'm not able to get it working. All I can do now is give it text to tag. For example, String test = "this is a test"; returns this_DT is_VBZ a_DT test_NN.

How can I go about retraining the tagger? Let's say I want the above string to be returned as this_DT is_VBZ a_DT test_VB?

I'd appreciate any answers.

Well, in the above example, 'test' is a noun, which makes that the correct tagging. Do you mean training it to differentiate between nouns and verbs? — hacket
Hi hacket, thanks for the reply. No, I don't mean differentiating between nouns and verbs. Simply put, how do I re-train the tagger if the output tags are not the ones I want? — user1694345

Lee Becker · Accepted Answer · 2013-10-07T18:42:36

Unless you have a POS-tagged corpus with many examples of the phenomenon you are looking to correct, you will likely not have any success retraining the tagger models. To clarify, based on how I expect the Stanford tools handle training, there is no mechanism for adding single examples to alter the models. You will need a complete corpus and will have to retrain from scratch.

If you do indeed have a corpus, then I would refer to this previously posted question to get details on the file format and proper steps for training the Stanford CoreNLP models.

Otherwise, your best bet is to write some post-processing rules or regex patterns that override the tagger's output. One use for such rules is to ensure that people and places in a wordlist are tagged as proper nouns (NNP).
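As a rough illustration of the post-processing idea, here is a minimal sketch in plain Java. It assumes the tagger output is a whitespace-separated string of word_TAG tokens, as in the question. The class name TagOverride, the method overrideTags, and the wordlist entries are all hypothetical names made up for this example; none of this is part of the Stanford API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the Stanford tagger.
final class TagOverride {

    // Rewrites the tag of every token whose (lower-cased) word appears in
    // the override table. Tokens are expected in word_TAG form; anything
    // without an underscore separator is passed through unchanged.
    static String overrideTags(String tagged, Map<String, String> overrides) {
        StringBuilder out = new StringBuilder();
        for (String token : tagged.trim().split("\\s+")) {
            String result = token;
            int sep = token.lastIndexOf('_'); // last '_' separates word and tag
            if (sep > 0) {
                String word = token.substring(0, sep);
                String forced = overrides.get(word.toLowerCase(Locale.ROOT));
                if (forced != null) {
                    result = word + "_" + forced;
                }
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(result);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed wordlist of people and places (illustrative entries only).
        Map<String, String> properNouns = new HashMap<>();
        properNouns.put("alice", "NNP");
        properNouns.put("london", "NNP");

        String tagged = "alice_NN lives_VBZ in_IN london_NN";
        // prints: alice_NNP lives_VBZ in_IN london_NNP
        System.out.println(overrideTags(tagged, properNouns));
    }
}
```

The same table could map "test" to "VB" to force the output asked for in the question, though a blanket rule like that changes every occurrence of the word regardless of context, which is why a corpus and real retraining are needed for anything context-dependent.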

Good luck!
