0 votes

I have a Hadoop cluster with 4 nodes (1 master, 3 slaves). I created some Hive tables from files stored in HDFS, then configured MySQL as the Hive metastore and copied the hive-site.xml file into Spark's conf folder.

To install Spark, I just downloaded and extracted it on the master node. After copying hive-site.xml into Spark's conf folder, I start Spark with the spark-shell command. Does Spark also need to be installed on the slave nodes?

I'm asking because I can successfully execute Spark SQL queries like the ones below, but if I try to access the cluster manager's default page at localhost:8080, it shows "Unable to connect". So it seems Spark SQL works fine, but without any cluster manager running. Is this possible?

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val query = hiveContext.sql("select * from customers")
query.show()

master:8080


What is the exact spark-shell command that you ran? Have you set the master with this command? – user1314742
I just downloaded and extracted Spark on the namenode. I copied the hive-site.xml file into Spark's "conf" folder and did not do any other configuration. To start Spark, I just execute the command "spark-shell". – codin
Please see my answer below. I explained how to start Spark with Hadoop. – user1314742

2 Answers

1 vote

First, you have to let Spark know where your Hadoop configuration is by setting the environment variable HADOOP_CONF_DIR in your spark-env.sh file.
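
For example, a minimal sketch (the path below is an assumption; point it at wherever your cluster's Hadoop configuration files actually live):

# in $SPARK_HOME/conf/spark-env.sh
export HADOOP_CONF_DIR=/etc/hadoop/conf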

Then, when starting the spark-shell, you have to tell Spark to use YARN as the master: spark-shell --master yarn-client

For more information, see the Spark on YARN docs.
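
Once the shell comes up, a quick sanity check from inside it shows which master the SparkContext is actually attached to:

sc.master   // e.g. "yarn-client" when started as above, versus the default "local[*]"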

1 vote
  1. You need to start the cluster separately; by default, spark-shell runs locally (see the sketch at the end of this answer).

  2. You'll need to have the Spark binaries on the worker nodes as well.

For documentation on starting your own Spark cluster, see: https://spark.apache.org/docs/latest/spark-standalone.html
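
As a rough sketch of that standalone route (the hostname "master" and the paths are assumptions; 7077 and 8080 are Spark's default standalone ports):

# on the master node, from the Spark installation directory
./sbin/start-master.sh

# on each worker node (or use ./sbin/start-slaves.sh with conf/slaves filled in)
./sbin/start-slave.sh spark://master:7077

# connect the shell to the cluster instead of running locally
spark-shell --master spark://master:7077

With the standalone master running, its web UI should answer on master:8080, which is the page that currently shows "Unable to connect".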