
I want to run the canopy example in Mahout, but I am getting an error:

Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /usr/local/mahout-distribution-0.7/examples/target/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.

Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/io/Closeables
    at org.apache.mahout.driver.MahoutDriver.loadProperties(MahoutDriver.java:214)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:98)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.ClassNotFoundException: com.google.common.io.Closeables
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    ... 7 more

For what it's worth, there is a guava-r09.jar in "/usr/local/mahout-distribution-0.7/examples/target/dependency" that contains the com/google/common/io/Closeables class, and the Guava dependency is declared in pom.xml.
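
A quick way to check whether that class actually made it into the job jar is to list the jar contents. This is just a sketch; the paths are taken from the output above, and it assumes the JDK's jar tool is on your PATH:

    cd /usr/local/mahout-distribution-0.7/examples/target
    # look for the missing Guava class inside the job jar
    jar tf mahout-examples-0.7-job.jar | grep Closeables
    # compare with the standalone Guava jar in the dependency directory
    jar tf dependency/guava-r09.jar | grep Closeables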

What can I do to solve this error?


1 Answer


Unfortunately, Hadoop and Mahout are a mess here, and you are probably in jar hell.

I don't think it is worth spending a lot of effort on this. Unless you have Google-scale data, a non-MapReduce implementation will likely be much faster. That is: as long as your data fits into the memory of a single node, prefer a single-node solution.

In my experiments, a Mahout cluster with 10-15 CPUs was about 5x slower than a good single-CPU implementation, because all of this "framework" overhead and the disk-oriented operation of Hadoop and Mahout come at a substantial cost. So if your data isn't on the order of terabytes, don't use it.

Now for a more precise answer:

You are in a distributed world. It is not sufficient for this .jar file to be available on your computer; due to the design of Hadoop, it must actually be shipped to every host that participates in your computation. The mahout command is supposed to take care of this (identifying and uploading all required .jar files), but sometimes it fails to get the classpath right. Unfortunately, the process is messy and hard to understand or debug.
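
Note that your stack trace fails inside MahoutDriver.loadProperties, i.e. in the local JVM that launches the job, so the local classpath is the first thing to fix. Two workarounds are commonly tried; the paths below come from your output, but whether the 0.7 bin/mahout launcher actually scans its lib/ directory is an assumption on my part:

    # 1) put Guava on the classpath of the JVM that launches the job
    export HADOOP_CLASSPATH=/usr/local/mahout-distribution-0.7/examples/target/dependency/guava-r09.jar:$HADOOP_CLASSPATH

    # 2) or copy the jar next to the other Mahout jars so the launcher script can pick it up
    cp /usr/local/mahout-distribution-0.7/examples/target/dependency/guava-r09.jar \
       /usr/local/mahout-distribution-0.7/lib/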

Warning: $HADOOP_HOME is deprecated.

Take this warning seriously. If $HADOOP_HOME is deprecated, the scripts that assemble your classpath may be running in a legacy mode; find out what the current, supported way of getting the classpath right is for your Hadoop version.
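
As a rough sketch: on Hadoop 1.x installs this deprecation usually refers to the move from $HADOOP_HOME to $HADOOP_PREFIX, so aligning the variables may help; treat the exact values below as assumptions about your setup (the /usr/local/hadoop path is taken from your output):

    # Hadoop 1.x prefers HADOOP_PREFIX over the deprecated HADOOP_HOME
    export HADOOP_PREFIX=/usr/local/hadoop
    export HADOOP_CONF_DIR=$HADOOP_PREFIX/conf
    # optionally silence the warning while HADOOP_HOME is still exported
    export HADOOP_HOME_WARN_SUPPRESS=1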