3
votes

I want to use the Stanford Parser within CoreNLP. I already got this example working:

http://stanfordnlp.github.io/CoreNLP/simple.html

But I need the German model, so I downloaded "stanford-german-2016-01-19-models.jar".

How do I tell the program to use this jar file? I only found:

LexicalizedParser lp = LexicalizedParser.loadModel("englishPCFG.ser.gz");

but I have a jar with the German models, NOT a ...ser.gz file.

Can anybody help?

3
I would assume the jar contains the data, and you would add the jar to the build path of your project to access it, no? - 2ARSJcdocuyVu7LfjUnB
You're right. Of course, I already added the German .jar file to my build path in Eclipse. But there must be an option where I set this German file. If not, how can the program know which language it should use? - Tobi123
Edit: Of course I can also use a German sentence as input, but the resulting tags are wrong / don't make sense. - Tobi123

3 Answers

4
votes

Here is some sample code for parsing a German sentence:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.StringUtils;

import java.util.*;

public class SimpleGermanExample {

    public static void main(String[] args) {
        String sampleGermanText = "...";
        Annotation germanAnnotation = new Annotation(sampleGermanText);
        // Load the German settings from the properties file found on the CLASSPATH
        Properties germanProperties = StringUtils.argsToProperties(
                new String[]{"-props", "StanfordCoreNLP-german.properties"});
        // Build the pipeline with the German settings and run it on the text
        StanfordCoreNLP pipeline = new StanfordCoreNLP(germanProperties);
        pipeline.annotate(germanAnnotation);
        // Print the parse tree of each sentence
        for (CoreMap sentence : germanAnnotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            Tree sentenceTree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
            System.out.println(sentenceTree);
        }
    }
}

Make sure you download the full toolkit to use this sample code.

http://stanfordnlp.github.io/CoreNLP/

Also make sure you have the German models jar in your CLASSPATH. The code above searches all the jars in your CLASSPATH and will find the StanfordCoreNLP-german.properties file inside the German jar.
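If the pipeline fails to start, one thing worth ruling out is that the properties file is simply not visible on the CLASSPATH. The following is a minimal sketch using only the JDK (no CoreNLP calls); the class name ClasspathCheck and its helper method are made up for illustration.

```java
public class ClasspathCheck {

    // Returns true if a resource with this name can be found on the CLASSPATH.
    // (Hypothetical helper; it only asks the JDK class loader, not CoreNLP.)
    public static boolean isOnClasspath(String resourceName) {
        return ClassLoader.getSystemResource(resourceName) != null;
    }

    public static void main(String[] args) {
        // File name taken from the sample code above
        String name = "StanfordCoreNLP-german.properties";
        System.out.println(name + " on CLASSPATH: " + isOnClasspath(name));
    }
}
```

If this reports false, the German models jar is most likely missing from the build path of the project.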

1
votes

First of all: this works, thank you! But I don't need this complex approach with all these annotators. That's why I wanted to start with the simple CoreNLP API. This is my code:

import edu.stanford.nlp.simple.*;
import java.util.*;

public class Main {

    public static void main(String[] args) {
        Sentence sent = new Sentence("Lucy is in the sky with diamonds.");
        List<String> posTags = sent.posTags();
        List<String> words = sent.words();
        for (int i = 0; i < posTags.size(); i++) {
            System.out.println(words.get(i) + " " + posTags.get(i));
        }
    }
}

How can I get the German properties file to work with this example?

Or, the other way around: how do I get only the words with their POS tags in your example?

1
votes

The German equivalent of the English example is the following:

LexicalizedParser lp = LexicalizedParser.loadModel("germanPCFG.ser.gz");

Extract the latest stanford-german-corenlp-2018-10-05-models.jar file and you will find germanPCFG.ser.gz inside the folder: stanford-german-corenlp-2018-10-05-models\edu\stanford\nlp\models\lexparser
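To see which model files a jar contains without extracting it by hand, something like the following sketch may help. It uses only the JDK's java.util.jar classes; the class name ModelJarLister and the default jar path are assumptions for illustration, so pass the real path to your jar as the first argument.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

public class ModelJarLister {

    // Returns the names of all entries in the jar that end with the given suffix, sorted.
    public static List<String> findEntries(String jarPath, String suffix) {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                      .map(entry -> entry.getName())
                      .filter(name -> name.endsWith(suffix))
                      .sorted()
                      .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed default location; replace with the path to your own models jar
        String jarPath = args.length > 0
                ? args[0]
                : "stanford-german-corenlp-2018-10-05-models.jar";
        // Print every serialized model (.ser.gz) found in the jar
        findEntries(jarPath, ".ser.gz").forEach(System.out::println);
    }
}
```

The printed entry names use forward slashes, which is how paths are stored inside a jar.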