I've downloaded the CMU ARK Twitter Part-of-Speech tagger to use as part of a larger project. In order to make sure it's working before plugging it in, I ran the script that's in the README of the project from the root of the project on my computer. I made no changes. This is the script:

$ ./runTagger.sh -input example_tweets.txt -output tagged_tweets.txt

I got this error:

Exception in thread "main" java.lang.NoClassDefFoundError: edu/cmu/cs/lti/ark/tweetnlp/RunPOSTagger
Caused by: java.lang.ClassNotFoundException: edu.cmu.cs.lti.ark.tweetnlp.RunPOSTagger
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
Could not find the main class: edu.cmu.cs.lti.ark.tweetnlp.RunPOSTagger. Program will exit.

I believe there's something wrong with the way the classpath is being set. The runTagger script calls another script, classwrap.sh, which is supposed to set the classpath relative to the location the script is called from, but somehow it's not working.

Here is runTagger.sh

#!/bin/bash

$(dirname $0)/scripts/classwrap.sh -Xmx1g edu.cmu.cs.lti.ark.tweetnlp.RunPOSTagger "$@"

Here is classwrap.sh

#!/bin/bash

# Set up classpath and invoke 'java' with it

set -eu
root=$(dirname $0)/..

cp=""
# Eclipse and IDEA defaults
cp=$cp:$root/bin
cp=$cp:$root/out/production/ark-tweet-nlp
# our build dir
cp=$cp:$root/mybuild

cp=$cp:$(echo $root/lib/*.jar | tr ' ' :)
# Twitter Commons text library stuff
cp=$cp:$(echo $root/lib_twitter/*.jar | tr ' ' :)

exec java -cp "$cp" "$@"

I'm not sure what the problem is. Obviously I'm a n00b when it comes to this, which is why I've come here. Any suggestions would be appreciated.

::EDIT::
I echoed the cp variable before the exec command and this is what was returned:

:scripts/../bin:scripts/../out/production/ark-tweet-nlp:scripts/../mybuild:scripts/../lib/ark-tweet-nlp.jar:scripts/../lib/commons-codec-1.4.jar:scripts/../lib/commons-math-2.1.jar:scripts/../lib/jargs.jar:scripts/../lib/posBerkeley.jar:scripts/../lib/scala-library-2.9.0.1.jar:scripts/../lib_twitter/guava-r09.jar:scripts/../lib_twitter/lucene-core-3.0.3.jar:scripts/../lib_twitter/text-0.1.0.jar:scripts/../lib_twitter/twitter-text-1.1.8.jar

The scripts/../lib/ark-tweet-nlp.jar contains the compiled version of the code. So I feel like the intention was for that to be how it was included in the classpath. Is that not enough? If so, should I explicitly add lib/edu/cmu...etc to cp?

::EDIT 2::
I emailed Kevin Gimpel, one of the creators of this project, and he sent me a batch file to run instead of the shell scripts that are included with the project.

java -cp lib/ark-tweet-nlp.jar;lib/commons-codec-1.4.jar;lib/commons-math-2.1.jar;lib/jargs.jar;lib/posBerkeley.jar;lib/scala-library-2.9.0.1.jar -Xmx1g edu.cmu.cs.lti.ark.tweetnlp.RunPOSTagger -input example_tweets.txt -output test.txt

As you can see, he set the classpath explicitly and then ran the class by its fully qualified name, i.e. the package path from the top-level folder under src (edu) down to the class. I've asked him to explain what he thinks the problem was, and when he does, I will add that as an answer to this question.

3 Answers

1
votes

Those two commands point to a mismatch between your OS and the OS the script was written for: ";" is the classpath separator on Windows, while ":" is the classpath separator everywhere else (e.g. Linux).
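To illustrate the separator difference, here is a minimal sketch that picks the separator from the output of `uname -s` and joins classpath entries with it. The helper names `classpath_separator` and `join_classpath` are made up for this example and are not part of ark-tweet-nlp; the sketch also assumes that a Cygwin, MinGW, or MSYS shell is launching a Windows JVM.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of ark-tweet-nlp.

# Print the separator the JVM expects, given a `uname -s` string.
classpath_separator() {
  case "$1" in
    CYGWIN*|MINGW*|MSYS*) printf ';' ;;   # assumed: Windows JVM started from a Unix-like shell
    *)                    printf ':' ;;   # Linux, macOS, and other Unix systems
  esac
}

# Join all remaining arguments with the separator given as the first argument.
join_classpath() {
  local sep=$1 out="" entry
  shift
  for entry in "$@"; do
    out="${out:+$out$sep}$entry"          # prepend the separator only after the first entry
  done
  printf '%s\n' "$out"
}

sep=$(classpath_separator "$(uname -s)")
join_classpath "$sep" lib/ark-tweet-nlp.jar lib/jargs.jar
```

If the shell scripts are being run on Windows through such a shell, a ":"-joined classpath like the one classwrap.sh builds would likely be read by the JVM as a single entry, which would be consistent with the ClassNotFoundException in the question.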

1
votes

The problem is in the classpath: it doesn't contain the edu/cmu/cs/lti/ark/tweetnlp/RunPOSTagger class. Try printing the cp variable in this script; maybe you missed something.
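One way to check whether a jar on the classpath really contains the class is to list the jar's entries and look for the class file. A minimal sketch, assuming `unzip` is installed; the helper names `class_to_entry` and `jar_has_class` are made up for this example and are not part of ark-tweet-nlp:

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of ark-tweet-nlp.

# Convert a fully qualified class name to the entry name used inside a jar.
class_to_entry() {
  printf '%s.class\n' "${1//.//}"         # replace every "." with "/"
}

# Succeed if the jar ($1) lists an entry for the class ($2). Assumes `unzip` is available.
jar_has_class() {
  unzip -l "$1" | grep -qF "$(class_to_entry "$2")"
}

class_to_entry edu.cmu.cs.lti.ark.tweetnlp.RunPOSTagger

# Only run the jar check when the jar exists relative to the current directory.
if [ -f lib/ark-tweet-nlp.jar ]; then
  if jar_has_class lib/ark-tweet-nlp.jar edu.cmu.cs.lti.ark.tweetnlp.RunPOSTagger; then
    echo "class found in jar"
  else
    echo "class missing from jar"
  fi
fi
```

If the class is found, listing the jar on the classpath is sufficient for the JVM to load it; the package directories (edu/cmu/...) do not need to be added separately, and the problem is more likely in how the classpath string as a whole is interpreted.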

To include all jars in a directory, use Java's classpath wildcard (Java 6 and later), which is a bare * rather than *.jar, e.g.: cp=$cp:/your/cp/dir/*

Try changing this line as follows: cp=$cp:$root/out/production/ark-tweet-nlp/*
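As an alternative to a JVM-level wildcard, the shell can expand the glob itself. The `echo $root/lib/*.jar | tr ' ' :` idiom in classwrap.sh would break if a path contained a space, so here is a sketch that collects the matches in a bash array and joins them. `jars_in` is a hypothetical helper, not part of ark-tweet-nlp, and the separator defaults to ":".

```shell
#!/bin/bash
# Hypothetical helper for illustration; not part of ark-tweet-nlp.

# Print every *.jar in directory $1, joined by separator $2 (default ":").
# Prints nothing when the directory has no jars.
jars_in() {
  local dir=$1 sep=${2:-:}
  local jars=("$dir"/*.jar)               # the shell expands the glob into an array
  [ -e "${jars[0]}" ] || return 0         # no match: the pattern was left unexpanded
  local IFS=$sep                          # "${jars[*]}" joins on the first character of IFS
  printf '%s\n' "${jars[*]}"
}

jars_in lib
```

The result can be appended to cp in the same way as the other entries, provided the separator matches the platform the JVM runs on.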

0
votes

You can call the ARK Tweet POS tagger from a Java program and capture its output.

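import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// The class name is arbitrary; it only wraps the snippet so that it compiles.
public class RunArkTagger {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Run the tagger as a separate system process
        String inputFile = ".\\input.txt";  // contains the input text
        Process proc = Runtime.getRuntime().exec(
                "java -jar .\\ark-tweet-nlp-0.3.2.jar --output-format conll --no-confidence " + inputFile);
        // Then retrieve the process output, line by line
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(proc.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
                // you can also write each line to a file if you want
            }
        }
        proc.waitFor();  // wait for the tagger process to exit
    }
}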

NOTE: In the java command used within exec(), make sure you give the correct path to ark-tweet-nlp.jar