Named Entity Recognition Data and Features

Question

I am building a Named Entity Recognizer with a Conditional Random Field and am looking for two things:

A) An open-source English NER dataset for Person, Location, and Organization entities

B) A list of English NER features

I have already looked at the CoNLL-2003 corpus. It is exactly what I want, but it is not readily available. I have also been unable to find a list of NER features, and I would like to avoid designing them by hand.

Thanks

So I take it you're looking for something free, right? :) I think there might be a few on this list that could help: cs.technion.ac.il/~gabr/resources/data/ne_datasets.html — dmn

eldams · Accepted Answer · 2013-12-15T21:55:22

You'll find a concise and very informative study of what NER requires in this paper by Ratinov & Roth. In addition, their system is fully open source and includes lists of named entities gathered from Wikipedia.
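To give a rough idea of what such a feature list tends to look like, here is a minimal sketch of token-level features of the kind commonly fed to a CRF. It is not the feature set from the paper: the function names (`word_shape`, `token_features`, `sentence_features`), the feature names, and the one-word-left/one-word-right context window are all my own assumptions for illustration. The `gazetteer` argument stands in for an entity list such as the Wikipedia-derived ones mentioned above.

```python
import re


def word_shape(token):
    """Collapse a token into a coarse shape, e.g. 'London' -> 'Xx', '2003' -> 'd'."""
    shape = re.sub(r"[A-Z]", "X", token)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "d", shape)
    # Collapse runs of the same symbol so long and short words share a shape.
    return re.sub(r"(.)\1+", r"\1", shape)


def token_features(tokens, i, gazetteer=frozenset()):
    """Build a feature dict for tokens[i]; feature names are illustrative only."""
    token = tokens[i]
    features = {
        "bias": 1.0,
        "lower": token.lower(),
        "prefix3": token[:3],
        "suffix3": token[-3:],
        "shape": word_shape(token),
        "is_title": token.istitle(),
        "is_upper": token.isupper(),
        "has_digit": any(c.isdigit() for c in token),
        "has_hyphen": "-" in token,
        # Hypothetical gazetteer lookup: a set of lowercased entity words.
        "in_gazetteer": token.lower() in gazetteer,
    }
    # Context features from the previous token, or a sentence-start marker.
    if i > 0:
        prev = tokens[i - 1]
        features["prev_lower"] = prev.lower()
        features["prev_is_title"] = prev.istitle()
    else:
        features["BOS"] = True
    # Context features from the next token, or a sentence-end marker.
    if i < len(tokens) - 1:
        nxt = tokens[i + 1]
        features["next_lower"] = nxt.lower()
        features["next_is_title"] = nxt.istitle()
    else:
        features["EOS"] = True
    return features


def sentence_features(tokens, gazetteer=frozenset()):
    """Return one feature dict per token in the sentence."""
    return [token_features(tokens, i, gazetteer) for i in range(len(tokens))]


if __name__ == "__main__":
    sentence = ["John", "lives", "in", "London"]
    for feats in sentence_features(sentence, gazetteer={"london"}):
        print(feats)
```

A list of per-token dicts like this is the input shape that several CRF toolkits accept (sklearn-crfsuite, for example), but check the documentation of whichever one you use. Wider context windows, part-of-speech tags, and word-cluster features are common additions.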

