I'm running Hadoop (Cloudera CDH4.5) on a VM. Following chapters 9-10 of Mahout in Action, I am trying to create a custom Lucene analyzer. The analyzer is defined in its own class, and that class is present in the JAR I build. However, when I run the job from the command line, I keep getting `java.lang.IllegalStateException: java.lang.ClassNotFoundException: my.org.MyAnalyzer`.
Setting HADOOP_CLASSPATH makes missing classes available to the client, but my problem seems to be that the MyAnalyzer class is not shipped to the map/reduce task JVMs. I have not written my own mapper or reducer classes, and I do not configure or submit a Job from my main class; I only call the existing Mahout classes. If I use, for example, WhitespaceAnalyzer instead of my analyzer, everything works.
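The failure mode can be reproduced without Hadoop. Assuming, as the `IllegalStateException` wrapping a `ClassNotFoundException` suggests, that the Mahout tokenizer job receives the analyzer as a class name in the job configuration and resolves it by reflection inside each task JVM, the lookup only succeeds if that JVM's classpath contains the class. HADOOP_CLASSPATH set in your shell affects the client JVM started by the `hadoop` script, not the task JVMs. The class name `AnalyzerLookup` and the helper `tryLoad` below are hypothetical, for illustration only:

```java
import java.util.Optional;

// Minimal sketch (no Hadoop needed): resolve a class from its fully qualified
// name, the way a framework does when the name arrives as a configuration string.
public class AnalyzerLookup {

    // Returns the class if this JVM's classpath can supply it, empty otherwise.
    static Optional<Class<?>> tryLoad(String className) {
        try {
            return Optional.of(Class.forName(className));
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A JDK class is always visible, so this lookup succeeds.
        System.out.println("java.lang.String found: " + tryLoad("java.lang.String").isPresent());
        // Unless the jar containing my.org.MyAnalyzer is on this JVM's classpath,
        // this lookup fails, just as it does in a task JVM that never received the jar.
        System.out.println("my.org.MyAnalyzer found: " + tryLoad("my.org.MyAnalyzer").isPresent());
    }
}
```

Run as-is, the second line reports `false`; adding the jar that contains MyAnalyzer (plus the Lucene jar it extends) to `-cp` flips it to `true`. The task JVMs need the same jar, which is what `-libjars` or the distributed cache is meant to provide.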
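This is my analyzer class:
```java
package my.org;
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public final class MyAnalyzer extends Analyzer {
    private final StandardAnalyzer stdAnalyzer = new StandardAnalyzer(Version.LUCENE_36);
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return stdAnalyzer.tokenStream(fieldName, reader);
    }
}
```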
Any ideas?
I also tried building with Maven and got the same result, although I may not have the right dependencies in my pom.xml.
Passing `-libjars` does not work either, because my code does not go through `GenericOptionsParser`.
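`-libjars` is only honored when the command-line arguments pass through Hadoop's `GenericOptionsParser`, and the usual way to get that is to launch the program via `ToolRunner`. The sketch below assumes Hadoop's `Configured`, `Tool` and `ToolRunner` classes are on the compile classpath and that the Mahout methods you call accept a `Configuration`; the class name `MyDriver` is hypothetical, and the actual Mahout calls are left as a placeholder comment:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: routing main() through ToolRunner lets GenericOptionsParser
// strip generic options such as -libjars before run() sees the remaining args.
public class MyDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already carries the -libjars entries (GenericOptionsParser stores
        // them under "tmpjars"), so hand THIS Configuration, not a fresh
        // new Configuration(), to the Mahout calls that launch the MapReduce jobs.
        Configuration conf = getConf();
        // ... call the Mahout vectorization steps here with conf and args ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new Configuration(), new MyDriver(), args);
        System.exit(exitCode);
    }
}
```

Generic options must come before the application's own arguments, e.g. `hadoop jar my-analyzer.jar my.org.MyDriver -libjars my-analyzer.jar <input> <output>` (jar name hypothetical). Reusing `getConf()` matters: a new `Configuration` created inside `run()` would not contain the `-libjars` setting, and the task JVMs would again miss MyAnalyzer.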