Druid + Hadoop (for both uses, deep-store & indexing)

Question

If I have a Hadoop server (pseudo-distributed mode) running on a separate machine, do I still need to have these files under my Druid conf dir? http://druid.io/docs/latest/configuration/hadoop.html

The way I see it:

It looks like those -site.xml files are for the Hadoop server, and Druid only acts as a Hadoop client. So I don't think Druid needs hdfs-site.xml.

Core-site.xml: OK, I get it. Druid needs to know the IP of the name node (Hadoop).

Mapred-site.xml: partially. Druid needs to know the status of MapReduce jobs (I suppose it delegates the indexing to Hadoop as an MR job), so it needs to communicate with the job tracker to see whether the indexing has finished, failed, or is still in progress. For that, it needs the URL of the Hadoop JT.

However, Druid does not need the property "mapreduce.cluster.local.dir", because it does not actively participate in the MR job.

Yarn-site.xml? Maybe it should stay, partially. At least for submitting a job (?).

What about HDFS-site.xml? I think this can be scrapped completely.

Capacity-scheduler.xml? It can go.

Please correct me if I'm wrong.

These questions and doubts arise because I'm quite new to Hadoop. I have my Hadoop setup running in pseudo-distributed mode. I tested it with a JavaScript WebHDFS library to write and read a file, and I also tried the sample MR jobs provided with the Hadoop distribution. So I guess my Hadoop setup is fine. I'm just a bit unsure on the Druid side, partly because the doc is not very clear about it.

By the way, I have Hadoop 2.7.2, while the hadoop-client libs used by Druid are still on 2.3.0.

Should I downgrade my Hadoop server to 2.3.0?

http://druid.io/docs/latest/operations/other-hadoop.html

Thanks, Raka

Slim Bouguerra · Accepted Answer · 2016-12-10T16:21:05

Please add mapred-site.xml, core-site.xml, hdfs-site.xml, and yarn-site.xml to the classpath. Also, you don't need to downgrade: Druid works well with 2.7.X. As you can see in the doc, you can use multiple versions of Hadoop.
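As a quick sanity check before starting Druid, something like the following could confirm that the four files named above are present in the directory you put on the classpath, and show which name node address core-site.xml points at. This is only a rough sketch: the helper names (`missing_site_files`, `read_property`) and the directory passed in are made up for illustration, and it assumes the name node address lives under the standard Hadoop 2.x property `fs.defaultFS`.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# The four Hadoop client config files named in the answer.
REQUIRED_SITE_FILES = (
    "core-site.xml",
    "hdfs-site.xml",
    "mapred-site.xml",
    "yarn-site.xml",
)


def missing_site_files(conf_dir):
    """Return the required *-site.xml files that are absent from conf_dir."""
    return [
        name
        for name in REQUIRED_SITE_FILES
        if not os.path.isfile(os.path.join(conf_dir, name))
    ]


def read_property(site_file, prop_name):
    """Return the value of a <property> in a Hadoop *-site.xml file, or None."""
    root = ET.parse(site_file).getroot()
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name == prop_name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def demo():
    # Hypothetical conf dir, built in a temp folder just for this example.
    conf_dir = tempfile.mkdtemp()
    with open(os.path.join(conf_dir, "core-site.xml"), "w") as f:
        f.write(
            "<configuration><property>"
            "<name>fs.defaultFS</name>"
            "<value>hdfs://example-host:9000</value>"
            "</property></configuration>"
        )
    print("missing:", missing_site_files(conf_dir))
    print(
        "name node:",
        read_property(os.path.join(conf_dir, "core-site.xml"), "fs.defaultFS"),
    )


if __name__ == "__main__":
    demo()
```

In practice you would point `missing_site_files` at whatever directory holds the XML files copied from the Hadoop machine; an empty list means all four are there.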
