2
votes

I am currently utilizing this JAR file for the Stanford NLP models: stanford-corenlp-3.5.2-models.jar

This file is pretty big: it's about 340 MB.

I am only using four annotators: tokenize, ssplit, parse, and lemma. Is there a smaller models JAR file I can use, or a separate JAR file for each individual model? I need this file to be as small as possible.


2 Answers

3
votes

You should be fine if you include just the parser's model file and the POS tagger's model file in your classpath. "lemma" requires "pos", so you will need to add "pos" to your list of annotators.

For instance: "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz" and "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" should be all you need.
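As a rough sketch of what that configuration could look like, the snippet below builds the pipeline properties with "pos" added and points the tagger and parser at the two files named above. The class name `SlimPipelineConfig` is made up for illustration, and I'm assuming the standard `pos.model` and `parse.model` property keys; the snippet only builds the `Properties` object, so it runs without CoreNLP on the classpath.

```java
import java.util.Properties;

class SlimPipelineConfig {
    // Classpath locations of the two model files named above.
    static final String PARSER_MODEL =
        "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";
    static final String POS_MODEL =
        "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger";

    static Properties buildProperties() {
        Properties props = new Properties();
        // "pos" is listed because "lemma" depends on it.
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
        // Assumed property keys for pointing the annotators at explicit model files.
        props.setProperty("pos.model", POS_MODEL);
        props.setProperty("parse.model", PARSER_MODEL);
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProperties();
        props.forEach((key, value) -> System.out.println(key + " = " + value));
        // With CoreNLP on the classpath, these properties would be passed to
        // new edu.stanford.nlp.pipeline.StanfordCoreNLP(props).
    }
}
```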

You could just create that directory structure and include those files in your classpath, or make a jar with just those files in it. You can definitely cut out most of that jar.
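If you go the "make a jar with just those files" route, something along these lines might do it. This is only a sketch using the JDK's `java.util.jar` classes (Java 9+ for `transferTo` and `Set.of`); the class name `SlimModelsJar` and the command-line arguments are made up for illustration, and the entry list assumes the two model paths above are all your pipeline needs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

class SlimModelsJar {
    /** Copies only the entries named in keep from source to target; returns how many were copied. */
    static int copyEntries(Path source, Path target, Set<String> keep) throws IOException {
        int copied = 0;
        try (JarFile in = new JarFile(source.toFile());
             JarOutputStream out = new JarOutputStream(Files.newOutputStream(target))) {
            Enumeration<JarEntry> entries = in.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!keep.contains(entry.getName())) {
                    continue; // skip every model we did not ask for
                }
                // Same entry name, so the classpath location stays identical.
                out.putNextEntry(new JarEntry(entry.getName()));
                try (InputStream data = in.getInputStream(entry)) {
                    data.transferTo(out);
                }
                out.closeEntry();
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SlimModelsJar <full-models.jar> <slim-models.jar>");
            return;
        }
        Set<String> keep = Set.of(
            "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz",
            "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger");
        int copied = copyEntries(Paths.get(args[0]), Paths.get(args[1]), keep);
        System.out.println("Copied " + copied + " of " + keep.size() + " requested entries");
    }
}
```

If the copied count is lower than the requested count, one of the paths doesn't exist in the source jar under that exact name.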

The bottom line is that if you're missing something, your code will crash with a missing-resource error, so you can simply keep adding files until it stops crashing.
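To shorten that trial-and-error loop a bit, you could check up front whether the files you expect are actually visible on the classpath. A minimal sketch, with a made-up class name `MissingModelCheck`; it only checks the paths you list, so it won't catch a resource CoreNLP needs that you didn't know about.

```java
import java.util.ArrayList;
import java.util.List;

class MissingModelCheck {
    /** Returns the subset of paths that cannot be found on the classpath. */
    static List<String> findMissing(List<String> resourcePaths) {
        List<String> missing = new ArrayList<>();
        for (String path : resourcePaths) {
            // getSystemResource returns null when no classpath entry provides the path.
            if (ClassLoader.getSystemResource(path) == null) {
                missing.add(path);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> expected = List.of(
            "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz",
            "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger");
        List<String> missing = findMissing(expected);
        if (missing.isEmpty()) {
            System.out.println("All listed model files are on the classpath.");
        } else {
            missing.forEach(path -> System.out.println("Missing: " + path));
        }
    }
}
```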

0
votes

Following a similar approach to the one mentioned by @StanfordNLPHelp, I used maven-shade-plugin to reduce the size of my final compiled JAR file. You need to change "Package.MainClass" and adjust the includes tags (or add excludes tags) to match the models you use.

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.1.0</version>
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <transformers>
                        <!-- adding Main-Class to manifest file -->
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                            <mainClass>Package.MainClass</mainClass>
                        </transformer>
                    </transformers>
                    <minimizeJar>true</minimizeJar>
                    <filters>
                        <filter>
                            <artifact>edu.stanford.nlp:stanford-corenlp</artifact>
                            <includes>
                                <include>**</include>
                            </includes>
                        </filter>
                        <filter>
                            <artifact>edu.stanford.nlp:stanford-corenlp:models</artifact>
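                            <includes>
                                <!-- keep only the model directories you use; e.g. the parse annotator also needs its model under edu/stanford/nlp/models/lexparser/ -->
                                <include>edu/stanford/nlp/models/pos-tagger/**</include>
                            </includes>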
                        </filter>
                    </filters>
                </configuration>
            </execution>
        </executions>
    </plugin>