Here is my situation.
I have a class TextProcessor that processes a text. I need to find the coreferences in the text and then extract information from it with Stanford's OpenIE tool. I use these two pipelines:
"tokenize,ssplit,pos,lemma,ner,parse,mention,coref" for coreference resolution,
and
"tokenize,ssplit,pos,lemma,depparse,natlog,openie" for information extraction.
Running them separately on a single text takes a lot of time, but for now I have to, because running them together requires so much memory that the pipeline exceeds my memory limits.
    import java.util.Properties;

    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    public class TextProcessor {
        Properties props;
        StanfordCoreNLP pipeline;

        public TextProcessor() {
            props = new Properties();
            props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,mention,coref");
            pipeline = new StanfordCoreNLP(props);
        }

        // Performs NER and COREF
        public void process(String text) {
            Annotation document = new Annotation(text);
            pipeline.annotate(document);
            // Process text (tokenization, pos, lemma, ner, coref)...
        }

        // Builds a second pipeline for OpenIE and runs it from scratch on the raw text
        public void extractInformation(String document) {
            props = new Properties();
            props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie");
            pipeline = new StanfordCoreNLP(props);
            Annotation doc = new Annotation(document);
            pipeline.annotate(doc);
            // Extract information from doc ...
        }
    }
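The cost of keeping the pipelines separate becomes concrete when the two annotator strings are compared. The following plain-Java sketch (no CoreNLP dependency; `SharedAnnotators` and its methods are hypothetical names) lists the annotators that both configurations run, i.e. the work that is repeated for every text when the two pipelines are used one after the other.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of CoreNLP): compares two "annotators" strings.
final class SharedAnnotators {
    // Splits a comma-separated annotator string into trimmed, non-empty names.
    static List<String> parse(String annotators) {
        List<String> names = new ArrayList<>();
        for (String name : annotators.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    // Returns the annotators present in both strings, in the order of the first one.
    static List<String> shared(String first, String second) {
        Set<String> other = new HashSet<>(parse(second));
        List<String> common = new ArrayList<>();
        for (String name : parse(first)) {
            if (other.contains(name) && !common.contains(name)) {
                common.add(name);
            }
        }
        return common;
    }

    public static void main(String[] args) {
        String coref = "tokenize,ssplit,pos,lemma,ner,parse,mention,coref";
        String openie = "tokenize,ssplit,pos,lemma,depparse,natlog,openie";
        System.out.println(shared(coref, openie)); // [tokenize, ssplit, pos, lemma]
    }
}
```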
Is there a way to combine the two pipelines dynamically? I mean something like this:
1) "tokenize,ssplit,pos,lemma,ner,depparse,mention,coref"
2) "tokenize,ssplit,pos,lemma,ner,depparse,mention,coref,natlog,openie".
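Stage 2 is just stage 1 with natlog and openie appended, so the annotators that still have to run on an already-annotated document are the difference between the two lists. A minimal sketch, assuming annotators are identified only by name (`RemainingAnnotators` is a hypothetical helper, not part of CoreNLP):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of CoreNLP): given the annotators already applied
// to a document and the full list wanted, returns only the ones still to run.
final class RemainingAnnotators {
    static String remaining(String alreadyRun, String wanted) {
        Set<String> done = new HashSet<>();
        for (String name : alreadyRun.split(",")) {
            done.add(name.trim());
        }
        List<String> todo = new ArrayList<>();
        for (String name : wanted.split(",")) {
            String trimmed = name.trim();
            // Keep order, skip blanks, skip what already ran, skip duplicates.
            if (!trimmed.isEmpty() && !done.contains(trimmed) && !todo.contains(trimmed)) {
                todo.add(trimmed);
            }
        }
        return String.join(",", todo);
    }

    public static void main(String[] args) {
        String stage1 = "tokenize,ssplit,pos,lemma,ner,depparse,mention,coref";
        String stage2 = "tokenize,ssplit,pos,lemma,ner,depparse,mention,coref,natlog,openie";
        System.out.println(remaining(stage1, stage2)); // natlog,openie
    }
}
```

This only computes names. By default, a pipeline built from just the remaining annotators still checks that their prerequisites (tokenize, ssplit, pos, ...) appear earlier in that same pipeline, which is the check behind the "depparse requires TextAnnotation" error.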
I tried returning an Annotation object from the first method, process(String text), and then adding the other three annotators to it in extractInformation, like this:

    public Annotation process(String text) {
        Annotation document = new Annotation(text);
        pipeline.annotate(document);
        // Process text (tokenization, pos, lemma, ner, coref)...
        return document;
    }

    public void extractInformation(Annotation document) {
        props.setProperty("annotators", "depparse,natlog,openie");
        pipeline = new StanfordCoreNLP(props);
        pipeline.annotate(document);
        // Extract information from doc ...
    }
But I get this error:

    annotator "depparse" requires annotation "TextAnnotation". The usual requirements for this annotator are: tokenize,ssplit,pos.
I thought that adding the three new annotators (depparse, natlog, openie) to a document that had already been annotated (with tokenize, ssplit, pos) would work, but it didn't.
So, is there a way to add those annotators to the earlier pipeline without re-running the whole pipeline (plus the new annotators), while keeping memory usage within bounds?
UPDATE
All I needed to do was this:

    public Annotation process(String text) {
        Annotation document = new Annotation(text);
        pipeline.annotate(document);
        // Process text (tokenization, pos, lemma, ner, coref)...
        StanfordCoreNLP.clearAnnotatorPool(); // <-- Added: to get rid of the models and solve the memory issue
        return document;
    }

    public void extractInformation(Annotation document) {
        props.setProperty("annotators", "natlog,openie");
        props.setProperty("enforceRequirements", "false"); // <-- Added: skip the prerequisite check for earlier annotators
        pipeline = new StanfordCoreNLP(props);
        pipeline.annotate(document);
        // Extract information from doc ...
    }
Alternatively, instead of setting the enforceRequirements property, you can pass false as the second constructor argument in extractInformation(Annotation document):

    pipeline = new StanfordCoreNLP(props, false);
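One possible refinement, sketched here under the assumption that the second stage needs nothing else from the first stage's configuration: build a fresh Properties object for it instead of mutating the shared `props` field, so any other settings added for the first stage do not carry over. `SecondStageConfig` is a hypothetical name; the two property keys are the ones used in the update.

```java
import java.util.Properties;

// Hypothetical helper: builds the configuration for the second stage, which runs
// only natlog and openie on an Annotation the first pipeline already filled in.
final class SecondStageConfig {
    static Properties secondStage() {
        Properties props = new Properties();
        // Only the annotators that have not run on the document yet.
        props.setProperty("annotators", "natlog,openie");
        // Do not require tokenize, ssplit, pos, ... to appear in this same pipeline;
        // their annotations are already present on the document.
        props.setProperty("enforceRequirements", "false");
        return props;
    }

    public static void main(String[] args) {
        Properties props = secondStage();
        System.out.println(props.getProperty("annotators"));          // natlog,openie
        System.out.println(props.getProperty("enforceRequirements")); // false
    }
}
```

The result would then be passed as `new StanfordCoreNLP(SecondStageConfig.secondStage())`; with `enforceRequirements` set in the properties, this should behave like the two-argument constructor.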