I am trying to understand the machine learning behind Google's Smart Linkify. The article states the following about their "generate candidate entities" model:
A given input text is first split into words (based on space separation), then all possible word subsequences of certain maximum length (15 words in our case) are generated, and for each candidate the scoring neural net assigns a value (between 0 and 1) based on whether it represents a valid entity.
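To make my reading of the enumeration step concrete, here is a minimal Python sketch. It assumes plain whitespace splitting and that "subsequence" means contiguous words; `generate_candidates` is my own hypothetical name, not Google's code. If this reading is right, each word is the start of up to 15 spans that extend to the right, rather than the center of a window of 7 words on each side:

```python
def generate_candidates(text, max_len=15):
    """Return the words and every contiguous span of 1..max_len words.

    Assumption: words are split on whitespace only; spans are (start, end)
    word indices with `end` exclusive.
    """
    words = text.split()
    candidates = []
    for start in range(len(words)):
        # Each span begins at `start` and grows to the right, capped at max_len words.
        for end in range(start + 1, min(start + max_len, len(words)) + 1):
            candidates.append((start, end))
    return words, candidates


words, spans = generate_candidates("Meet me at 1600 Amphitheatre Parkway tomorrow")
print(len(words), len(spans))
print([" ".join(words[s:e]) for s, e in spans[:3]])
```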
Next, the generated entities that overlap are removed, favoring the ones with the higher score over the conflicting ones with a lower score.
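The overlap-removal step reads to me like greedy non-maximum suppression over word spans. Here is a minimal sketch under that assumption. `remove_overlaps` is a hypothetical name, and spans are (start, end, score) triples with `end` exclusive:

```python
def remove_overlaps(scored_spans):
    """Greedily keep the highest-scoring spans that do not overlap a kept span.

    This is my guess at the procedure, not code from the article.
    """
    kept = []
    # Visit candidates from highest to lowest score.
    for start, end, score in sorted(scored_spans, key=lambda c: c[2], reverse=True):
        # Two half-open spans are disjoint if one ends before the other starts.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, score))
    # Return the survivors in text order.
    return sorted(kept)


scored = [(3, 6, 0.92), (3, 5, 0.40), (4, 6, 0.55), (0, 1, 0.05), (6, 7, 0.30)]
print(remove_overlaps(scored))
```

Nothing in this sketch drops a low-scoring span that conflicts with no other span, such as (0, 1, 0.05) above. I would guess a score threshold is applied as well, but the quoted text doesn't say so.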
If I understand correctly, the model tries every word in the sentence and every combination of that word with neighboring words, up to 15 words in total?
How can you train such a model? I assume it's supervised learning, but I don't understand how such data could be labeled. Is it similar to NER, where the entity is specified by character position? And are there only two classes in the data, entity and non-entity?
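For the labeling question, this is the scheme I have in mind. It is an assumption, not something the article states. Start from ordinary NER-style character-offset annotations with every entity type collapsed into one "entity" class, and map each annotation to a word span. Then treat each generated candidate that exactly matches an annotated span as a positive and every other candidate as a negative. The helper names are hypothetical:

```python
import re


def word_offsets(text):
    # Character (start, end) offsets of each whitespace-separated word.
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def label_candidates(text, entity_char_spans, max_len=15):
    """Label every candidate word span 1 if it exactly matches an annotated entity, else 0.

    entity_char_spans: NER-style (char_start, char_end) pairs, end exclusive,
    assumed to line up with word boundaries.
    """
    offsets = word_offsets(text)
    positives = set()
    for c_start, c_end in entity_char_spans:
        # Words fully inside the annotation become one (first, last + 1) word span.
        inside = [i for i, (s, e) in enumerate(offsets) if s >= c_start and e <= c_end]
        if inside:
            positives.add((inside[0], inside[-1] + 1))
    examples = []
    for start in range(len(offsets)):
        for end in range(start + 1, min(start + max_len, len(offsets)) + 1):
            examples.append(((start, end), 1 if (start, end) in positives else 0))
    return examples


text = "Call me at 650-253-0000 tonight"
entity = (text.index("650"), text.index("650") + len("650-253-0000"))
examples = label_candidates(text, [entity])
print(sum(label for _, label in examples), "positive of", len(examples))
```

Under this scheme, negatives come for free: they are simply all the candidates that don't match an annotation.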
And for the output of the model, the so-called "candidate score": how can a neural network return a single numerical value (the score)? Or is the output layer just a single node?
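My guess for the score is a single output node followed by a sigmoid. That would make the "score" and the "probability of entity" the same thing. Here is a minimal pure-Python sketch. The feature vector and weights are invented for illustration. Binary cross-entropy is the standard loss for a head like this, but the article does not say which loss Google used:

```python
import math


def candidate_score(features, weights, bias):
    """Hypothetical scoring head: one linear output unit followed by a sigmoid.

    `features` stands in for whatever representation the network builds for a
    candidate span; the weights are made-up numbers, not trained values.
    """
    logit = sum(f * w for f, w in zip(features, weights)) + bias
    # The sigmoid maps any real-valued logit into (0, 1), read as P(entity | span).
    return 1.0 / (1.0 + math.exp(-logit))


def binary_cross_entropy(score, label):
    # Standard loss for a single sigmoid output with a 0/1 target.
    return -(label * math.log(score) + (1 - label) * math.log(1 - score))


p = candidate_score([0.2, 1.5, -0.3], [0.8, 1.1, 0.5], bias=-0.4)
print(p, binary_cross_entropy(p, 1))
```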
I'm looking for a more detailed explanation of the following:
- Does "possible word subsequences of certain maximum length" mean it considers every word together with the 7 words before and the 7 words after it?
- How can the neural net generate a score when it's a binary classification, entity vs. non-entity? Or do they mean the probability score for entity?
- How do you train a binary NER? Is it like any other NER, except that all entity types are replaced with the single type 'entity' and negative samples are then generated for 'non-entity'?
- How can this model be fast, as they claim, when it processes every word in the text plus the 7 words before and after that word?
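On the speed question, counting the candidates at least bounds the work. With a 15-word cap, the number of spans grows linearly with the text length: it is 15n − 105 once n ≥ 15. A 1,000-word text therefore yields 14,895 spans to score. Whether that is fast in practice depends on how cheap scoring each span is, which the article only describes at a high level. Here is a small check, with `num_candidates` as my own hypothetical helper:

```python
def num_candidates(n_words, max_len=15):
    """Count contiguous spans of 1..max_len words in a text of n_words words."""
    # There are (n_words - k + 1) spans of length k.
    return sum(n_words - k + 1 for k in range(1, min(n_words, max_len) + 1))


for n in (10, 100, 1000):
    # For n >= max_len this equals max_len * n - max_len * (max_len - 1) // 2,
    # i.e. 15n - 105: linear in the text length, not quadratic.
    print(n, num_candidates(n))
```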
