3
votes

I am working on Named Entity Recognition. I evaluated libraries, such as MITIE, Stanford NER , NLTK NER etc., which are built upon conventional nlp techniques. I also looked at deep learning models such as word2vec and Glove vectors for representing words in vector space, they are interesting since they provide the information about the context of a word, but specifically for the task of NER, I think its not well suited. Since all these vector models create a vocab and corresponding vector representation. If any word failed to be in the vocabulary it will not be recognised. Assuming that it is highly likely that a named entity is not present since they are not bound by the language. It can be anything. So if any deep learning technique have to be useful in such cases are the ones which are more dependent on the structure of the sentence by using standard english vocab i.e. ignoring named fields. Is there any such model or method available? Will CNN or RNN may be the answer for it ?

1

1 Answers

0
votes

I think you mean texts of a certain language, but the named entities in that text may contain different names (e.g. from other languages)?

The first thing that comes to my mind is some semi-supervised learning techniques that the model is being updated periodically to reflect new vocabulary.

For example, you may want to use word2vec model to train the incoming data, and compare the word vector of possible NEs with existing NEs. Their cosine distance should be close.