0
votes

I use the Stanford CoreNLP API; my code is below:

public class StanfordCoreNLPTool {


   public static StanfordCoreNLPTool instance;
   private Annotation annotation = new Annotation();
   private Properties props = new Properties();
   private PrintWriter out = new PrintWriter(System.out);

   private StanfordCoreNLPTool(){}

   public void startPipeLine(String question){
      props = new Properties();
      props.put("annotators",
            "tokenize, ssplit, pos, lemma, ner, parse, mention, dcoref, sentiment");
      annotation = new Annotation(question);
      StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
       // run all the selected Annotators on this text
      pipeline.annotate(annotation);
      pipeline.prettyPrint(annotation, out);
   }

   public static StanfordCoreNLPTool getInstance() {
      if(instance == null){
        synchronized (StanfordCoreNLPTool.class){
            if(instance == null){
                instance = new StanfordCoreNLPTool();
            }
        }
      }
      return instance;
   }
}

It works fine, but it takes a long time. Consider that we are using it in a question answering system, so for every new input the pipeline annotation must be run. As you know, each time the models must be loaded, rules fetched, and so on, before a sentence can be tagged with POS, NER, etc.

First of all, I wanted to solve the problem with RMI and EJB, but it failed, because, regardless of the Java architecture, for every new sentence the pipeline is loaded from scratch. Look at this log printed in my IntelliJ console:

Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [6.1 sec].

Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [8.0 sec].

Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [8.7 sec].

Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [5.0 sec].

INFO: Read 25 rules

[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse

[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [4.1 sec].

Please help me find a solution that makes the program fast.

1
Making the program faster doesn't appear to have anything to do with RMI or EJBs. — user207421
I meant making a separate program for the pipeline annotation, with another program getting the NLP tags by remote method invocation. — Elmira Khodaee

1 Answer

1
votes

The big problem here is that your startPipeLine() method combines two things that should be separated:

  1. Constructing an annotation pipeline, including loading large annotator models
  2. Annotating a particular piece of text (a question) and printing the result

To make things fast, these two steps must be separated: loading an annotation pipeline is slow but needs to be done only once, while annotating each piece of text is reasonably fast.
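As a minimal sketch of that separation, keeping your class and method names: the pipeline (and its model loading) moves into the private constructor, which the singleton guarantees runs only once, and the per-question method only builds an Annotation and runs it through the already-loaded pipeline.

```java
import java.io.PrintWriter;
import java.util.Properties;

import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class StanfordCoreNLPTool {

   private static volatile StanfordCoreNLPTool instance;

   // Built exactly once: this is the slow part (loads POS, NER, parser models).
   private final StanfordCoreNLP pipeline;

   private StanfordCoreNLPTool() {
      Properties props = new Properties();
      props.setProperty("annotators",
            "tokenize, ssplit, pos, lemma, ner, parse, mention, dcoref, sentiment");
      pipeline = new StanfordCoreNLP(props);
   }

   public static StanfordCoreNLPTool getInstance() {
      if (instance == null) {
         synchronized (StanfordCoreNLPTool.class) {
            if (instance == null) {
               instance = new StanfordCoreNLPTool();
            }
         }
      }
      return instance;
   }

   // Fast per-question call: reuses the already-loaded pipeline.
   public void startPipeLine(String question) {
      Annotation annotation = new Annotation(question);
      pipeline.annotate(annotation);
      pipeline.prettyPrint(annotation, new PrintWriter(System.out));
   }
}
```

With this structure, the multi-second model loading you see in the log happens on the first getInstance() call only; every subsequent question pays just the annotation cost.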

The rest is all minor details but FWIW:

  • All the fanciness of the double-checked locking constructing a StanfordCoreNLPTool singleton is doing nothing if you create a new one for every piece of text you annotate. You should construct an annotation pipeline only once, and it might be reasonable to do that as a singleton, but it's probably sufficient to do the initialization in the constructor once you distinguish pipeline construction from text annotation.
  • The annotation variable can and should be private to the method that annotates one piece of text.
  • If after these changes you still want model loading to be faster, you could buy an SSD! (My 2012 MBP with an SSD loads models more than 5 times faster than you are reporting.)
  • If you want to speed up annotation further, the main tool is to cut out annotators you don't need or to choose faster versions. E.g., if you're not using coreference, you could delete mention and dcoref.
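For instance, a trimmed configuration along these lines (a sketch, assuming you only need tokens, POS, lemmas, and named entities, and not parsing, coreference, or sentiment) would skip loading the parser and coreference models entirely:

```java
import java.util.Properties;

import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class MinimalPipeline {
   public static void main(String[] args) {
      Properties props = new Properties();
      // Only the annotators this hypothetical use case needs; dropping
      // parse, mention, dcoref, and sentiment avoids loading their models.
      props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
      StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
   }
}
```

Each annotator you remove cuts both startup time (model loading) and per-sentence annotation time.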