I'm using NER to essentially scrub text so that each named entity is replaced with its label (PERSON, ORG, etc.). So "John works at Apple" would become "PERSON works at ORG."
clause_text is my list of sentences. I used the ner-d package to build my NER model and scrub text as follows:
from nerd import ner

scrubbed_text = []
for text in clause_text:
    input_text = text
    doc = ner.name(input_text, language='en_core_web_sm')
    text_label = [(X.text, X.label_) for X in doc]
    # replace all named entities with their label (PERSON, ORG, etc)
    for ent_text, label in text_label:
        input_text = input_text.replace(ent_text, label)
    scrubbed_text.append(input_text)
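One caveat with the `str.replace` approach is that it rewrites every occurrence of the entity text, including matches inside longer words or inside labels that were already inserted. A minimal sketch of an offset-based alternative is below. It assumes the entities expose character offsets (spaCy spans have `start_char`, `end_char` and `label_`); the `scrub` helper and the hand-written example spans are hypothetical and only there for illustration.

```python
def scrub(text, spans):
    """Replace each (start_char, end_char, label) span in `text` with its label."""
    # Work from the end of the string so that earlier offsets stay valid
    # after a replacement changes the length of the text.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + label + text[end:]
    return text


# Hypothetical spans, written by hand for illustration; in practice they
# would be built from the entities the NER model returns.
example = "John works at Apple"
example_spans = [(0, 4, "PERSON"), (14, 19, "ORG")]
scrubbed = scrub(example, example_spans)
```

With spaCy entities, the spans could be collected as `[(e.start_char, e.end_char, e.label_) for e in doc.ents]`, assuming the entities do not overlap.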
Now I am trying to add custom training data: I want to supply a sentence with labels and update the NER model so that it is more accurate and specific to my use case. Right now I have this:
nlp = spacy.load('en_core_web_sm')
if 'ner' not in nlp.pipe_names:
    ner = nlp.create_pipe('ner')
    nlp.add_pipe(ner)
else:
    ner = nlp.get_pipe('ner')
from spacy.gold import GoldParse
from spacy.pipeline import EntityRecognizer
doc_list = []
doc = nlp('This EULA stipulates a contract for Hamilton Enterprises.')
doc_list.append(doc)
gold_list = []
gold_list.append(GoldParse(doc, [u'O', u'O', u'O', u'O', u'O', u'O', u'ORG']))
ner = EntityRecognizer(nlp.vocab, entity_types = ['ORG'])
ner.update(doc_list, gold_list)
But when I run this, I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-11-92c53f5c90b1> in <module>
9
10 ner = EntityRecognizer(nlp.vocab, entity_types = ['ORG'])
---> 11 ner.update(doc_list, gold_list)
nn_parser.pyx in spacy.syntax.nn_parser.Parser.update()
nn_parser.pyx in spacy.syntax.nn_parser.Parser.require_model()
ValueError: [E109] Model for component 'ner' not initialized. Did you forget to load a model, or forget to call begin_training()?
Does anyone have any insight into how best to fix this code, or know of a better way to add custom entries to update the NER model? Thanks so much!
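For context on the error: `EntityRecognizer(nlp.vocab, ...)` builds a brand-new component with no model behind it, which is consistent with E109, whereas the `ner` pipe fetched from `en_core_web_sm` is already trained. Below is a rough sketch of updating that existing pipe instead. It assumes a spaCy 2.x release that provides `nlp.resume_training()` and accepts character-offset annotations in `nlp.update()`; the helper names `make_example` and `update_ner`, the iteration count and the dropout value are all hypothetical choices, not anything prescribed by the library.

```python
import random


def make_example(text, entities):
    """Build one (text, annotations) pair in the spaCy 2.x training format.

    `entities` maps each entity substring to its label. Offsets are looked
    up with str.find, so only the first occurrence of a substring is used.
    """
    spans = []
    for substring, label in entities.items():
        start = text.find(substring)
        if start == -1:
            raise ValueError("%r not found in %r" % (substring, text))
        spans.append((start, start + len(substring), label))
    return text, {"entities": sorted(spans)}


TRAIN_DATA = [
    make_example(
        "This EULA stipulates a contract for Hamilton Enterprises.",
        {"Hamilton Enterprises": "ORG"},
    ),
]


def update_ner(nlp, train_data, n_iter=20, drop=0.35):
    """Update the existing 'ner' pipe of a loaded spaCy 2.x pipeline (assumed API)."""
    ner = nlp.get_pipe("ner")
    # Make sure every label in the training data is known to the component.
    for _, annotations in train_data:
        for _, _, label in annotations["entities"]:
            ner.add_label(label)
    # Disable the other components so that only the entity recognizer is updated.
    other_pipes = [name for name in nlp.pipe_names if name != "ner"]
    losses = {}
    with nlp.disable_pipes(*other_pipes):
        optimizer = nlp.resume_training()
        for _ in range(n_iter):
            examples = list(train_data)
            random.shuffle(examples)
            losses = {}
            for text, annotations in examples:
                nlp.update([text], [annotations], sgd=optimizer,
                           drop=drop, losses=losses)
    return losses
```

With spaCy 2.x installed, this would be called roughly as `nlp = spacy.load('en_core_web_sm')` followed by `update_ner(nlp, TRAIN_DATA)`. Character offsets also avoid having to hand-write one tag per token for the `GoldParse`. A single sentence is far too little data to expect a reliable improvement, so this only shows the shape of the update loop.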