I've downloaded the CMU ARK Twitter Part-of-Speech tagger to use as part of a larger project. To make sure it works before plugging it in, I ran the command given in the project's README from the root of the project on my computer. I made no changes. This is the command:
$ ./runTagger.sh -input example_tweets.txt -output tagged_tweets.txt
I got this error:
java.lang.NoClassDefFoundError: edu/cmu/cs/lti/ark/tweetnlp/RunPOSTagger
Caused by: java.lang.ClassNotFoundException: edu.cmu.cs.lti.ark.tweetnlp.RunPOSTagger
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
Could not find the main class: edu.cmu.cs.lti.ark.tweetnlp.RunPOSTagger. Program will exit.
Exception in thread "main"
I believe there's something wrong with the way the classpath is being set. The runTagger.sh script calls another script, classwrap.sh, which is supposed to build the classpath relative to its own location (the project root is taken to be the parent of the scripts directory), but somehow it's not working.
Here is runTagger.sh:
#!/bin/bash
$(dirname $0)/scripts/classwrap.sh -Xmx1g edu.cmu.cs.lti.ark.tweetnlp.RunPOSTagger "$@"
Here is classwrap.sh:
#!/bin/bash
# Set up classpath and invoke 'java' with it
set -eu
root=$(dirname $0)/..
cp=""
# Eclipse and IDEA defaults
cp=$cp:$root/bin
cp=$cp:$root/out/production/ark-tweet-nlp
# our build dir
cp=$cp:$root/mybuild
cp=$cp:$(echo $root/lib/*.jar | tr ' ' :)
# Twitter Commons text library stuff
cp=$cp:$(echo $root/lib_twitter/*.jar | tr ' ' :)
exec java -cp "$cp" "$@"
I'm not sure what the problem is. Obviously I'm a n00b when it comes to this, which is why I've come here. Any suggestions would be appreciated.
::EDIT::
I echoed the cp variable before the exec command and this is what was returned:
:scripts/../bin:scripts/../out/production/ark-tweet-nlp:scripts/../mybuild:scripts/../lib/ark-tweet-nlp.jar:scripts/../lib/commons-codec-1.4.jar:scripts/../lib/commons-math-2.1.jar:scripts/../lib/jargs.jar:scripts/../lib/posBerkeley.jar:scripts/../lib/scala-library-2.9.0.1.jar:scripts/../lib_twitter/guava-r09.jar:scripts/../lib_twitter/lucene-core-3.0.3.jar:scripts/../lib_twitter/text-0.1.0.jar:scripts/../lib_twitter/twitter-text-1.1.8.jar
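That string is easier to read when split into entries, with each one checked for existence on disk. A minimal sketch, assuming entries are separated by `:` as in the output above; `check_classpath` is a hypothetical helper name and not part of the tagger's scripts:

```shell
# Hypothetical helper: list each classpath entry and whether it exists.
# Assumes ':' is the separator, as in the echoed value of cp.
check_classpath() {
  local entries entry
  # Split on ':' into an array; read avoids accidental glob expansion.
  IFS=':' read -r -a entries <<< "$1"
  for entry in "${entries[@]}"; do
    # The leading ':' in cp produces one empty entry; skip it.
    [ -n "$entry" ] || continue
    if [ -e "$entry" ]; then
      echo "ok      $entry"
    else
      echo "missing $entry"
    fi
  done
}
```

Calling `check_classpath "$cp"` just before the `exec java` line in classwrap.sh would show which directories and jars actually resolve from the current working directory.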
The scripts/../lib/ark-tweet-nlp.jar entry contains the compiled version of the code, so I assume the intention was for the classes to be picked up from that jar. Is that not enough? If it isn't, should I explicitly add lib/edu/cmu...etc to cp?
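One way to test that assumption is to list the jar and look for the entry the JVM would load. A classpath entry names a jar or a root directory, and the package path (edu/cmu/...) is resolved inside it, so the check is whether the jar lists the class under that path. A sketch, assuming `unzip` is installed (`jar tf` from the JDK lists entries in a similar way); `class_entry` and `jar_has_class` are hypothetical helper names:

```shell
# Hypothetical helper: map a fully qualified class name to the entry
# name it has inside a jar, e.g. a.b.C -> a/b/C.class
class_entry() {
  echo "$(echo "$1" | tr . /).class"
}

# Hypothetical helper: succeed if the jar (a zip archive) lists that entry.
# unzip -Z1 prints one archive member name per line.
jar_has_class() {
  local jar="$1" entry
  entry=$(class_entry "$2")
  unzip -Z1 "$jar" | grep -qxF "$entry"
}
```

For example, `jar_has_class lib/ark-tweet-nlp.jar edu.cmu.cs.lti.ark.tweetnlp.RunPOSTagger && echo found` run from the project root would report whether the main class is in the jar.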
::EDIT 2::
I emailed Kevin Gimpel, one of the creators of this project, and he sent me a batch file to run instead of the shell scripts that are included with the project. It contains this command:
java -cp lib/ark-tweet-nlp.jar;lib/commons-codec-1.4.jar;lib/commons-math-2.1.jar;lib/jargs.jar;lib/posBerkeley.jar;lib/scala-library-2.9.0.1.jar -Xmx1g edu.cmu.cs.lti.ark.tweetnlp.RunPOSTagger -input example_tweets.txt -output test.txt
As you can see, he set the classpath explicitly and then ran the class by its fully qualified name, that is, the package path from the top-level folder in src (edu) down to the class. I've asked him to explain what he thinks the problem was, and when he does, I will add that as an answer to this question.
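One visible difference between the two invocations is the separator: the cp value echoed from classwrap.sh is joined with `:`, while the batch file joins its entries with `;`, which is the path separator Java uses on Windows. Whether that is the actual cause here is an assumption until it is confirmed. A sketch of building the list with a configurable separator; `join_classpath` is a hypothetical helper, not something the project ships:

```shell
# Hypothetical helper: join classpath entries with a given separator.
# Java splits the classpath on ':' on Unix-like systems and ';' on Windows.
join_classpath() {
  local sep="$1" out="" entry
  shift
  for entry in "$@"; do
    # Prefix the separator only when out already holds an entry.
    out="${out:+$out$sep}$entry"
  done
  echo "$out"
}

join_classpath ';' lib/ark-tweet-nlp.jar lib/jargs.jar
# prints lib/ark-tweet-nlp.jar;lib/jargs.jar
```

Note that `;` must stay quoted in bash, since an unquoted semicolon ends the command.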