The spaCy documentation for the 'Named Entity Recognition' feature (https://spacy.io/usage/linguistic-features#named-entities) states that spaCy can recognize 'various types' of named entities, such as 'PERSON', 'LOC', and 'PRODUCT' (https://spacy.io/api/annotation#named-entities).

My question is: can I also train a model on my own custom entities? For example, I would like to train on invoice data to recognize an IBAN/BIC or an invoice number. Is this possible, or is the feature restricted to a fixed list of entities?

1 Answer

It does support custom entities; see the section of the spaCy documentation titled "Training an additional entity type".

For example, to add a label called MY_ANIMAL, you can use training data like this:

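# Each item is (text, annotations); entities are (start_char, end_char, label).
TRAIN_DATA = [
    (
        "Horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, "MY_ANIMAL")]},  # characters 0-6 cover "Horses"
    ),
    # Texts without any entity are useful as negative examples.
    ("Do they bite?", {"entities": []}),
    (
        "horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, "MY_ANIMAL")]},
    ),
]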
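
Character offsets are easy to get wrong, so it can help to sanity-check them before training. The following is a minimal sketch in plain Python (no spaCy needed) that slices each text by its annotated offsets so you can see what is actually being labelled. The helper name `entity_texts` is made up for this illustration and is not part of spaCy.

```python
TRAIN_DATA = [
    (
        "Horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, "MY_ANIMAL")]},
    ),
    ("Do they bite?", {"entities": []}),
    (
        "horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, "MY_ANIMAL")]},
    ),
]


def entity_texts(train_data):
    """Return (span_text, label) pairs for every annotated entity.

    Hypothetical helper for illustration only.
    """
    found = []
    for text, annotations in train_data:
        for start, end, label in annotations["entities"]:
            # Offsets are character positions; the end offset is exclusive.
            if not 0 <= start < end <= len(text):
                raise ValueError(f"bad offsets ({start}, {end}) for {text!r}")
            found.append((text[start:end], label))
    return found


for span, label in entity_texts(TRAIN_DATA):
    print(label, repr(span))
```

If a printed span is not the exact text you meant to label (for example, it includes a trailing space or cuts a word in half), fix the offsets before training.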

You can then feed that into either an existing NER model as additional training or a newly created NER pipe.

One caveat, however: the ML model is optimized for recognizing named entities, which are usually capitalized nouns like "John", "London" or "The Times". You can also try to train it on more generic things like numbers, but it may not work as well.