
I am trying to train a custom NER model using spaCy. Currently I have more than 2k records for training; each text has more than 100 words and more than 2 entities per record. I am running it for 50 iterations, and it takes more than 2 hours to train completely.

Is there any way to train using multiprocessing? Will it improve the training time?

I'm not sure whether this can be done, but this seems related to your question: spacy.io/usage/training#tips - colidyre

1 Answer


Short answer... probably not

It's very unlikely that you will be able to get this to work for a few reasons:

  • The network being trained is performing iterative optimization
    • Without knowing the results from the batch before, the next batch cannot be optimized
  • There is only a single network
    • Any parallel training would be creating divergent networks...
    • ...which you would then somehow have to merge
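To make these two points concrete, here is a toy illustration in plain Python rather than spaCy; the model, data and learning rate are all hypothetical. The "network" is y = a·b·x with two weights. Each SGD step reads the weights written by the step before it, and two copies trained independently can each fit the data perfectly yet average to a useless model, because one settles on (a, b) and the other on (−a, −b). Real networks have many such equivalent solutions, which is what makes merging parallel runs hard.

```python
def train(a, b, data, lr=0.005, epochs=50):
    """Fit y ≈ a * b * x with plain SGD: a toy two-weight "network".

    Every update reads the (a, b) written by the previous update, so the
    steps form a strict chain: step n + 1 cannot start before step n ends.
    """
    for _ in range(epochs):
        for x, y in data:
            err = a * b * x - y
            grad_a = 2 * err * b * x
            grad_b = 2 * err * a * x
            a, b = a - lr * grad_a, b - lr * grad_b
    return a, b


def mse(a, b, data):
    return sum((a * b * x - y) ** 2 for x, y in data) / len(data)


# Hypothetical toy data: y = 3x, no noise.
xs = [k / 10 for k in range(1, 21)]
data = [(x, 3 * x) for x in xs]

# Two copies trained independently, as two parallel workers would be,
# each from a different starting point.
a1, b1 = train(1.0, 1.0, data)
a2, b2 = train(-1.0, -1.0, data)

# Naive merge: average the weights of the two copies.
am, bm = (a1 + a2) / 2, (b1 + b2) / 2

print(f"copy 1: mse = {mse(a1, b1, data):.2e}")
print(f"copy 2: mse = {mse(a2, b2, data):.2e}")
print(f"merged: mse = {mse(am, bm, data):.2e}")
```

Both copies end with a near-zero error; the averaged weights collapse to zero and the merged model predicts nothing.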

Long answer... there's plenty you can do!

There are a few different things you can try, however:

  • Get GPU training working if you haven't already
    • It's a pain to set up, but it can speed up training a bit
    • It will, however, dramatically lower CPU usage
  • Try using the spaCy command-line tools
    • The JSON format is a pain to produce, but...
    • ...the benefit is that you get a well-optimised algorithm written by the experts
    • It can give dramatically faster and better results than hand-crafted methods
  • If you have different entities, you can train multiple specialised networks
    • Each of these may train faster
    • These networks could be done in parallel to each other (CPU permitting)
  • Optimise your Python and experiment with parameters
    • Speed and quality are very dependent on parameter tweaking (batch size, number of iterations, etc.)
    • Make sure the Python code that provides the batches is top-notch
  • Pre-process your examples
    • spaCy NER extraction requires a surprisingly small amount of context to work
    • You could try pre-processing your examples into snippets of 10 or 15 surrounding words and see how your training time and accuracy fare
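A minimal sketch of that pre-processing step, assuming your data uses the (text, {"entities": [(start, end, label)]}) tuples common in spaCy training scripts. The function name `window_examples` is hypothetical, and it assumes plain whitespace tokenisation with entity offsets on word boundaries. It cuts one snippet per entity and widens the window if it would otherwise slice through a neighbouring entity, since a half-entity left unlabelled would teach the model the wrong thing.

```python
import re


def window_examples(text, entities, n_words=10):
    """Cut one short training snippet around each entity.

    Assumes whitespace tokenisation and entity offsets that fall on word
    boundaries (both simplifications). Each snippet is the entity plus up
    to `n_words` words either side, widened so it never cuts another
    entity in half; offsets are shifted to be relative to the snippet.
    """
    words = [m.span() for m in re.finditer(r"\S+", text)]
    snippets = []
    for ent_start, ent_end, _ in entities:
        # First and last word that overlap this entity.
        first = next(i for i, (_, end) in enumerate(words) if end > ent_start)
        last = max(i for i, (start, _) in enumerate(words) if start < ent_end)
        lo = words[max(0, first - n_words)][0]
        hi = words[min(len(words) - 1, last + n_words)][1]
        # Widen the window to fully include any entity it overlaps, so no
        # partial entity is left in the text without a label.
        for start, end, _ in entities:
            if start < hi and end > lo:
                lo, hi = min(lo, start), max(hi, end)
        kept = [(s - lo, e - lo, label) for s, e, label in entities if s >= lo and e <= hi]
        snippets.append((text[lo:hi], {"entities": kept}))
    return snippets


text = "Acme Corp hired Jane Doe as chief engineer in Berlin last spring"
ents = [(text.index(w), text.index(w) + len(w), label)
        for w, label in [("Acme Corp", "ORG"), ("Jane Doe", "PERSON"), ("Berlin", "GPE")]]

for snippet, ann in window_examples(text, ents, n_words=2):
    print(repr(snippet), [snippet[s:e] for s, e, _ in ann["entities"]])
```

One snippet per entity means text near clustered entities appears more than once; merging overlapping windows would be a natural refinement if that matters for your data.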
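For the "multiple specialised networks" idea, here is a sketch of the bookkeeping only: split the annotations by label, then hand each label to its own worker process. `split_by_label` and `train_one_label` are hypothetical names, and `train_one_label` is just a placeholder where your spaCy training code for a single-label model would go. Keeping every text in each copy (with only that label's entities) is an assumption, chosen so the other texts still act as negative examples.

```python
from concurrent.futures import ProcessPoolExecutor


def split_by_label(examples):
    """Return {label: examples}, where each copy keeps every text but only
    the entities of that one label, so the other texts still serve as
    negative examples for that label's model."""
    labels = sorted({label for _, ann in examples for _, _, label in ann["entities"]})
    return {
        label: [
            (text, {"entities": [ent for ent in ann["entities"] if ent[2] == label]})
            for text, ann in examples
        ]
        for label in labels
    }


def train_one_label(job):
    """Hypothetical placeholder for training one single-label NER model.

    A real version would build and train a spaCy pipeline on `examples`
    and save it to disk; this stub only counts the entities it received.
    """
    label, examples = job
    return label, sum(len(ann["entities"]) for _, ann in examples)


# Hypothetical training data in the (text, {"entities": [...]}) format.
examples = [
    ("Acme Corp hired Jane Doe", {"entities": [(0, 9, "ORG"), (16, 24, "PERSON")]}),
    ("Jane Doe moved to Berlin", {"entities": [(0, 8, "PERSON"), (18, 24, "GPE")]}),
]

if __name__ == "__main__":
    jobs = list(split_by_label(examples).items())
    # The jobs share no state, so each one can run in its own process;
    # the default pool size is derived from the machine's CPU count.
    with ProcessPoolExecutor() as pool:
        for label, n_ents in pool.map(train_one_label, jobs):
            print(label, n_ents)
```

Because the jobs never exchange weights, this sidesteps the merging problem described in the short answer; the trade-off is running several models at prediction time.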

Final thoughts... when is your network "done"?

I have trained networks with many entities on thousands of examples longer than yours, and the long and short of it is that sometimes it takes time.

However, in my experience roughly 90% of the performance gain is captured in the first 10% of training.

  • Do you need to wait for all 50 iterations?
  • ... or are you looking for a specific level of performance?

If you monitor the quality every X batches, you can bail out when you hit a pre-defined level of quality.
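A minimal sketch of that bail-out loop. `train_until`, `train_on_batch` and `evaluate` are hypothetical names: in a real project the two callables would update your spaCy model on one batch and score it (for example entity F-score) on a held-out dev set. The demo plugs in a made-up score curve purely to exercise the loop.

```python
def train_until(batches, train_on_batch, evaluate, target, check_every=10):
    """Train batch by batch, scoring every `check_every` batches, and stop
    as soon as the score reaches `target`.

    Returns (number of batches used, last score).
    """
    n = 0
    for n, batch in enumerate(batches, start=1):
        train_on_batch(batch)
        if n % check_every == 0:
            score = evaluate()
            if score >= target:
                return n, score  # good enough: bail out early
    return n, evaluate()  # ran out of batches without hitting the target


# Demo with a made-up model whose score rises quickly, then flattens out.
state = {"steps": 0}


def fake_train_on_batch(batch):
    state["steps"] += 1


def fake_evaluate():
    return 1 - 0.9 ** state["steps"]  # hypothetical score curve


used, score = train_until(range(500), fake_train_on_batch, fake_evaluate, target=0.95)
print(f"stopped after {used} of 500 batches, score {score:.3f}")
```

Checking only every few batches keeps the cost of evaluation low; how often to check is a trade-off between that cost and how precisely you stop.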

You can also keep old networks you have trained on previous batches and then "top them up" with new training, reaching a level of performance you couldn't reach by starting from scratch in the same time.
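The "top up" idea is essentially checkpointing: persist the trained weights, then start a later run from them instead of from scratch. With spaCy you would save the pipeline with `nlp.to_disk()` and reload it with `spacy.load()` before training further; to stay self-contained, this sketch instead pickles a hypothetical one-weight toy model trained with plain SGD.

```python
import pickle
import tempfile
from pathlib import Path


def train(weights, data, epochs, lr=0.1):
    """Hypothetical stand-in for a training run: SGD on y ≈ w * x,
    continuing from whatever weights it is given."""
    w = weights["w"]
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x
    return {"w": w}


data = [(0.5, 1.0), (1.0, 2.0)]  # toy data, y = 2x

with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "model.pkl"

    # First session: train for a while, then keep the result on disk.
    first = train({"w": 0.0}, data, epochs=5)
    checkpoint.write_bytes(pickle.dumps(first))

    # Later session: reload and "top up" with a couple more epochs...
    resumed = train(pickle.loads(checkpoint.read_bytes()), data, epochs=2)

# ...which ends closer to the target than spending those 2 epochs on a fresh model.
fresh = train({"w": 0.0}, data, epochs=2)
print(f"resumed w = {resumed['w']:.3f}, fresh w = {fresh['w']:.3f} (target 2.0)")
```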

Good luck!