Resolved
Turns out we need to put validation.jar in hadoop/share/hadoop/common/lib/ (download it from https://mvnrepository.com/artifact/javax.validation/validation-api *).
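For example, something like this (HADOOP_HOME and the exact jar version are just my assumptions; any recent validation-api jar should do):

    # download the javax.validation API jar and drop it into Hadoop's common lib dir
    wget https://repo1.maven.org/maven2/javax/validation/validation-api/1.1.0.Final/validation-api-1.1.0.Final.jar
    cp validation-api-1.1.0.Final.jar $HADOOP_HOME/share/hadoop/common/lib/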
Combine that with what the doc says: set "mapreduce.job.classloader" to "true" in your Druid indexing task JSON.
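In my case that means something like this inside the tuningConfig of the indexing task spec (the rest of the spec is omitted; jobProperties is passed through to the Hadoop job):

    "tuningConfig": {
      "type": "hadoop",
      "jobProperties": {
        "mapreduce.job.classloader": "true"
      }
    }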
And you'll get it working :) -- Druid 0.9.2 with Hadoop 2.7.3
*) Not sure why; I could see that Druid uploads all the jars on its classpath to Hadoop (and validation.jar is among them). Maybe there is a restriction on how the JVM loads javax.* libraries from a custom classloader(?)
What follows below is kept for historical purposes, to help searches.
UPDATE UPDATE
My bad. I forgot to copy core-site.xml etc. to the correct place in the Druid installation in my Dockerfile.
I fixed that, and now it sends the job to Hadoop.
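For the record, the fix in the Dockerfile is roughly this; the destination is whatever directory sits on the Druid processes' classpath (conf/druid/_common in my case, and /opt/druid is just where I happen to unpack Druid):

    # copy the Hadoop client configs onto Druid's classpath (paths assume my layout)
    COPY core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml /opt/druid/conf/druid/_common/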
However, now I'm running into another problem: a failure during execution of the job. java.lang.reflect.InvocationTargetException, at io.druid.indexer.JobHelper.runJobs(JobHelper.java:369) ~[druid-indexing-hadoop-0.9.2.jar:0.9.2].
Similar to the one reported here: https://groups.google.com/forum/#!topic/druid-development/_JXvLbykD0E. But that one at least has more hints in the stack trace (a permission issue); mine is not so clear. Is anyone having the same problem?
!!!UPDATE AGAIN!!!
I think this is the case I'm hitting, the same as this one: https://groups.google.com/forum/#!topic/druid-user/4yDRoQZn8h8
And I confirmed it by checking the MR logs through Hadoop's timeline server.
Let me try fixing it and update this post afterward.
Update: found this: https://groups.google.com/forum/#!topic/druid-user/U6zMkhm3WiU
Update: Nope. Setting "mapreduce.job.classloader": "true" gives me another problem in the map task: java.lang.ClassNotFoundException: javax.validation.Validator at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424)... This whole class-loading thing :(
So, the culprit is the Guice library. Druid 0.9.2 uses Guice 4.1.0, while Hadoop 2.7.3 is stuck on Guice 3.0.0, and mapreduce.job.classloader is not working (it gives yet another Java ClassNotFoundException).
What to do now? Copy Guice 4.1.0 from Druid to Hadoop?
Original Post
Why is Druid (0.9.2) not submitting the job to the resource manager (so that the job runs on the Hadoop cluster)? Can someone point out what detail I am missing, please?
I have a (pseudo-distributed) Hadoop cluster running version 2.7.2, on a machine whose hostname is set to 'hadoop'. That Hadoop and my Druid run in separate Docker containers, and the Druid container has a --link to the Hadoop container.
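Roughly, the two containers are started like this (the image names here are just placeholders):

    # pseudo-distributed Hadoop container, with hostname 'hadoop'
    docker run -d --name hadoop --hostname hadoop my-hadoop-image
    # Druid container, linked so that 'hadoop' resolves from inside it
    docker run -d --name druid --link hadoop:hadoop my-druid-image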
From the log I can tell that it performs the MR locally (using LocalJobRunner).
I can also confirm the indexing succeeded, from the log and by checking HDFS.
Also, from the YARN UI, I'm not seeing any job being submitted.
I've configured everything according to the documentation. In the core-site.xml on the Druid side, I have:
    <property>
      <name>fs.default.name</name>
      <value>hdfs://hadoop:9000</value>
    </property>
(Yes, it's fs.default.name instead of fs.defaultFS, because the Druid extension still uses the 2.3.0 client, and defaultFS is not recognized until 2.4.x.) A small sidestep: I think there's a classpath bug in Druid; it's not adding hadoop-dependencies to the classpath of the running worker (I've already specified the default coordinates in the common runtime properties).
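For reference, the default coordinates I mean are set roughly like this in the common runtime properties (the version should match your cluster; mine is 2.7.x):

    # make indexing tasks load a matching hadoop-client from hadoop-dependencies/
    druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:2.7.2"]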
OK, also, in the overlord's runtime.properties I've set the indexing runner type to remote, and the same in the middleManager's runtime.properties. I can see that Druid picks those configs up.
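Concretely, the relevant line in the overlord's runtime.properties is along these lines:

    # send indexing tasks to the middle managers instead of running them locally
    druid.indexer.runner.type=remote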
Also, the indexing log storage type is set to HDFS, and I can confirm the files get stored in HDFS.
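That log config in the common runtime properties looks roughly like this (the directory is just an example path):

    # store indexing task logs in HDFS
    druid.indexer.logs.type=hdfs
    druid.indexer.logs.directory=/druid/indexing-logs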
So, as far as deep storage is concerned, all is fine. It's just this MapReduce job that is not running on the cluster. Somebody else stumbled upon the same problem, with no resolution in the thread: https://groups.google.com/forum/#!topic/druid-user/vvX3VEGMTcw
I can confirm that deep storage has no issue (the input file is pulled from the HDFS path I specified, and segments are also stored in HDFS).
What am I missing?