
I would like to read data from Hadoop, process it in Spark, and write the result to Hadoop and Elasticsearch. I have a few worker nodes to do this.

Is a Spark standalone cluster sufficient, or do I need to build a Hadoop cluster so I can use YARN or Mesos?

If standalone cluster mode is sufficient, does the jar file need to be placed on every node, unlike in YARN or Mesos mode?


1 Answer


First of all, you cannot write data to Hadoop or read data from Hadoop as such. It is HDFS (a component of the Hadoop ecosystem) that is responsible for reading and writing data. Now, coming to your questions:

  1. Yes, it is possible to read data from HDFS, process it in the Spark engine, and then write the output back to HDFS (see the first sketch after this list).

  2. YARN, Mesos, and Spark standalone are all cluster managers; you can use any one of them to manage the resources of your cluster, and the choice has nothing to do with Hadoop itself. However, since you want to read and write data from/to HDFS, you need HDFS installed on the cluster, so it is best to install Hadoop on all of your nodes, which will put HDFS on all nodes as well. Whether you then use YARN, Mesos, or Spark standalone is your choice; all of them work with HDFS. I myself use Spark standalone for cluster management.

  3. It is not clear which jar files you are talking about, but I assume you mean the Spark jars. If so, then yes, you need to set the path to the Spark jars on each node so that there is no conflict in paths when Spark runs (the second sketch after this list shows where the master URL and application jars are configured).
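
For point 1, here is a minimal sketch of the whole pipeline in Scala. It assumes the elasticsearch-hadoop (elasticsearch-spark) connector is on the classpath; the namenode address, HDFS paths, Elasticsearch host, and index name are all placeholders you would replace with your own.

    import org.apache.spark.sql.SparkSession
    import org.elasticsearch.spark.sql._ // provided by the elasticsearch-hadoop connector

    object HdfsSparkEs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hdfs-spark-es")
          .config("es.nodes", "es-host") // placeholder Elasticsearch host
          .config("es.port", "9200")
          .getOrCreate()
        import spark.implicits._

        // Read raw text from HDFS (placeholder namenode address and path)
        val input = spark.read.textFile("hdfs://namenode:8020/data/input")

        // Example processing step: a simple word count
        val counts = input
          .flatMap(_.split("\\s+"))
          .groupBy("value")
          .count()

        // Write the result back to HDFS ...
        counts.write.mode("overwrite").parquet("hdfs://namenode:8020/data/output")

        // ... and index the same result into Elasticsearch (placeholder index name)
        counts.saveToEs("wordcount")

        spark.stop()
      }
    }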
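
For points 2 and 3, the cluster manager and the jar paths are just configuration from the application's point of view. A hedged sketch, assuming a standalone master on a placeholder host master-host at the default port 7077; with spark-submit you would normally pass --master and the application jar on the command line rather than hard-coding them.

    import org.apache.spark.sql.SparkSession

    // Point the application at the standalone cluster manager
    // (swap in "yarn" or a mesos:// URL if you change managers).
    val spark = SparkSession.builder()
      .appName("standalone-example")
      .master("spark://master-host:7077") // placeholder master host
      // Ship our application jar to the executors; the Spark installation
      // itself must already exist at the same path on every worker node.
      .config("spark.jars", "/path/to/my-app.jar") // placeholder jar path
      .getOrCreate()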