2
votes

I know Apache OpenNLP uses a MaxEnt model for its NER tagger. But which features does Apache OpenNLP use by default when running its named entity recognition (NER) models? And how can we incorporate or customize new features in OpenNLP (Java implementation)?


1 Answer

1
votes

Apache OpenNLP NER lets users define features via an XML file. The default feature definition is this one:

https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/resources/opennlp/tools/namefind/ner-default-features.xml

If you want to customize it, pass your own file with the -featuregen option when you train the model:

$ opennlp TokenNameFinderTrainer -featuregen your-features-definition.xml -model my-model.bin ...

You don't need to specify your customized feature XML file when you run TokenNameFinder, because the model file already includes your feature definition.
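To give a rough idea of what a feature definition describes, here is a minimal, self-contained Java sketch of window-style features (the current token, its neighbours, and a coarse token class for each). This is not OpenNLP's code: the class `WindowFeatureSketch`, the feature prefixes (`w=`, `p1w=`, `n1w=`, `wc=`) and the token classes are made up for illustration, and the exact generators in the default XML depend on your OpenNLP version, so check the linked file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT an OpenNLP class.
final class WindowFeatureSketch {

    // Coarse token class (names are assumptions for this sketch):
    // "num" = all digits, "ic" = initial capital, "lc" = all lowercase.
    static String tokenClass(String token) {
        if (token.isEmpty()) return "other";
        if (token.chars().allMatch(Character::isDigit)) return "num";
        if (Character.isUpperCase(token.charAt(0))) return "ic";
        if (token.chars().allMatch(Character::isLowerCase)) return "lc";
        return "other";
    }

    // Emits features for tokens[index] plus up to prevLength tokens before it
    // and up to nextLength tokens after it, stopping at sentence boundaries.
    static List<String> features(String[] tokens, int index, int prevLength, int nextLength) {
        List<String> out = new ArrayList<>();
        out.add("w=" + tokens[index]);
        out.add("wc=" + tokenClass(tokens[index]));
        for (int i = 1; i <= prevLength && index - i >= 0; i++) {
            out.add("p" + i + "w=" + tokens[index - i]);
            out.add("p" + i + "wc=" + tokenClass(tokens[index - i]));
        }
        for (int i = 1; i <= nextLength && index + i < tokens.length; i++) {
            out.add("n" + i + "w=" + tokens[index + i]);
            out.add("n" + i + "wc=" + tokenClass(tokens[index + i]));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] sentence = {"Mr", "Smith", "visited", "Paris", "in", "2019"};
        // Features for "Paris" with a window of 2 tokens on each side.
        System.out.println(features(sentence, 3, 2, 2));
    }
}
```

The strings returned here play the role of the contextual predicates a MaxEnt model is trained on; in OpenNLP you would not write this by hand but describe the generators in the XML file passed to -featuregen.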