Change beam_width in spaCy NER

Question

I would like to change beam_width in nlp.entity.cfg from its default of 1 to 3.

I tried nlp.entity.cfg.update({beam_width : 3}), but nlp seems to be broken after this change: calling nlp(str) returns a dict instead of the usual spacy.tokens.doc.Doc that I get with beam_width : 1.

I want to change it because the NER probabilities will be more accurate in my case (it's my own model that I trained). I computed the probabilities with code I found in the spaCy GitHub issues:

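from collections import defaultdict

# Tokenize and run the other pipes, but skip the greedy NER pass
with nlp.disable_pipes('ner'):
    doc = nlp(txt)

# beam_width and beam_density must be set beforehand (see the notes below)
(beams, somethingelse) = nlp.entity.beam_parse([ doc ], beam_width, beam_density)

# Sum the scores of all parses in the beam that contain a given entity
entity_scores = defaultdict(float)
for beam in beams:
    for score, ents in nlp.entity.moves.get_beam_parses(beam):
        for start, end, label in ents:
            entity_scores[(doc[start:end].text, label, start, end)] += score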
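
To make the aggregation step concrete, here is a minimal sketch that runs without spaCy. The `parses` list is made-up data standing in for what `get_beam_parses` yields, and it assumes the scores of the parses in a beam sum to 1; the labels and numbers are purely illustrative.

```python
from collections import defaultdict

# Hypothetical beam contents: (score, entities) pairs whose scores sum to 1.
# Each entity is (start_token, end_token, label). These values are invented.
parses = [
    (0.6, [(0, 2, "EVENT"), (5, 7, "GPE")]),
    (0.3, [(1, 2, "PERSON"), (5, 7, "GPE")]),
    (0.1, [(5, 7, "GPE")]),
]

entity_scores = defaultdict(float)
for score, ents in parses:
    for start, end, label in ents:
        # An entity's score is the total score of the parses that contain it.
        entity_scores[(start, end, label)] += score

# Print entities from most to least confident.
for key, total in sorted(entity_scores.items(), key=lambda kv: -kv[1]):
    print(key, round(total, 2))
```

Under that assumption, an entity that appears in every parse ends up with a score near 1, and a beam holding a single parse would give every entity the same score, which is why a wider beam is needed to get graded scores.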

beam_width : Number of alternate analyses to consider. More is slower, and not necessarily better -- you need to experiment on your problem. (default: 1)

beam_density : This clips solutions at each step. We multiply the score of the top-ranked action by this value, and use the result as a threshold. This prevents the parser from exploring options that look very unlikely, saving a bit of efficiency. Accuracy may also improve, because we've trained on a greedy objective. (default: 0)
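
The two parameters can be illustrated with a toy beam search. This is only a sketch of the general technique, not spaCy's parser: `beam_search` and `toy_scores` are hypothetical names, and the action probabilities are invented so that the greedy choice at step one turns out to be worse overall.

```python
def beam_search(next_scores, n_steps, beam_width=1, beam_density=0.0):
    """Toy beam search. next_scores(history) returns {action: probability}."""
    beams = [(1.0, ())]  # each entry is (total score, actions taken so far)
    for _ in range(n_steps):
        candidates = []
        for score, history in beams:
            options = next_scores(history)
            # beam_density: drop actions scoring below a fraction of the best one.
            threshold = max(options.values()) * beam_density
            for action, prob in options.items():
                if prob >= threshold:
                    candidates.append((score * prob, history + (action,)))
        # beam_width: keep only the best few partial analyses.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams


def toy_scores(history):
    """Invented scores: 'A' looks better first, but 'B' leads to a better end."""
    if not history:
        return {"A": 0.6, "B": 0.4}
    if history[-1] == "A":
        return {"x": 0.55, "y": 0.45}
    return {"x": 0.9, "y": 0.1}


greedy = beam_search(toy_scores, n_steps=2, beam_width=1)
wide = beam_search(toy_scores, n_steps=2, beam_width=2)
clipped = beam_search(toy_scores, n_steps=2, beam_width=2, beam_density=0.8)
print(greedy[0][1], wide[0][1], clipped[0][1])
```

With a width of 1 the search commits to "A" and never sees the better path through "B"; with a width of 2 it keeps both alternatives and finds it. Setting a high density prunes "B" again at the first step, which shows how the threshold trades exploration for speed.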

I'm sort of a newbie to NLP, so I don't know what beam search with a global objective is or how to use it. If you can explain it like I'm 5, that would be great!

I would like to be able to use displacy (style='ent') to visualize the entities with beam_width = 3.
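
One possible route, sketched under assumptions: displacy can render entities from a plain dict when called with manual=True, so beam-derived entities could be converted to that format. The helper `to_displacy_ents`, the `min_score` cutoff, and the scored spans below are all hypothetical; the spans use character offsets, which could be taken from `doc[start:end].start_char` and `.end_char`.

```python
def to_displacy_ents(text, scored_spans, min_score=0.5):
    """Build displacy's manual 'ent' input from {(start_char, end_char, label): score}."""
    kept = []
    # Visit the highest-scoring spans first so they win when spans overlap.
    for (start, end, label), score in sorted(scored_spans.items(), key=lambda kv: -kv[1]):
        if score < min_score:
            continue
        if any(start < k["end"] and k["start"] < end for k in kept):
            continue  # overlaps a span that was already kept
        kept.append({"start": start, "end": end, "label": label})
    kept.sort(key=lambda e: e["start"])
    return {"text": text, "ents": kept, "title": None}


text = "Hurricane Florence is approaching North Carolina."
# Invented scores, standing in for the entity_scores computed from the beam.
scored = {(0, 18, "EVENT"): 0.6, (10, 18, "PERSON"): 0.3, (34, 48, "GPE"): 1.0}
data = to_displacy_ents(text, scored)
print(data)

# With spaCy installed, this dict could then be rendered with:
#   from spacy import displacy
#   displacy.render(data, style="ent", manual=True)
```

Dropping overlapping spans is a design choice made here because the entity view expects non-overlapping entities; a different cutoff or tie-breaking rule may suit other data better.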

Thanks for your answer, Hervé.

syllogism_ · Accepted Answer · 2018-09-13T16:38:13

(calling nlp(str) returns a dict instead of the usual spacy.tokens.doc.Doc that I get with beam_width : 1)

I'm not sure why that could be. Are you sure? What version are you using?

I just tried the following:

>>> import spacy
>>> nlp = spacy.load('en_core_web_md')
>>> nlp.entity.cfg['beam_width'] = 3
>>> doc = nlp(u'Hurrican Florence is approaching North Carolina.')
>>> doc.ents
(Hurrican Florence, North Carolina)
>>> nlp.entity.cfg['beam_width'] = 300
>>> doc = nlp(u'Hurrican Florence is approaching North Carolina.')
>>> doc.ents
(Hurrican Florence is approaching, North Carolina.)

As you can see, setting a very wide beam results in bad accuracy, because the default model isn't trained to use a wide beam like that.

As for the ELI5... well, it's complicated :(. Sorry, I don't have a simple explanation handy, which is one reason these are undocumented internals.
