0 votes

I have a Hadoop cluster with 4 nodes (1 master, 3 slaves). I created some Hive tables from files stored in HDFS, then configured MySQL as the Hive metastore and copied the hive-site.xml file into Spark's conf folder.

To install Spark, I just downloaded and extracted it on the master node. After copying hive-site.xml into Spark's conf folder, I start Spark with the spark-shell command. Does Spark also need to be installed on the slave nodes?

I'm asking because I can successfully execute Spark SQL queries like the ones below, but if I try to access the cluster manager's default page at localhost:8080, it shows "Unable to connect". So it seems Spark SQL works fine, but without any cluster manager running. Is this possible?

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val query = hiveContext.sql("select * from customers")
query.show()

master:8080


What is the exact spark-shell command that you ran? Have you set the master with this command? – user1314742
I just downloaded and extracted Spark on the namenode. I copied the hive-site.xml file into Spark's "conf" folder and did not do any other configuration. To start Spark, I just execute the command "spark-shell". – codin
Please see my answer below. I explained how to start Spark with Hadoop. – user1314742

2 Answers

1 vote

First, you have to let Spark know where your Hadoop configuration is by setting the environment variable HADOOP_CONF_DIR in your spark-env.sh file.
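
For example, a minimal sketch (the path below is an assumption; point it at wherever your cluster's Hadoop configuration files actually live):

# in $SPARK_HOME/conf/spark-env.sh
export HADOOP_CONF_DIR=/etc/hadoop/conf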

Then, when starting the spark-shell, you have to tell Spark to use YARN as the master: spark-shell --master yarn-client

For more information, see the Spark on YARN docs.
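
Once the shell comes up, a quick sanity check from inside it shows which master the SparkContext is actually attached to:

sc.master   // e.g. "yarn-client" when started as above, versus the default "local[*]"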

1 vote
  1. You need to start the cluster separately; by default, spark-shell runs locally (see the sketch at the end of this answer).

  2. You'll need to have the Spark binaries on the worker nodes as well.

For documentation on starting your own Spark cluster, see: https://spark.apache.org/docs/latest/spark-standalone.html
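
As a rough sketch of that standalone route (the hostname "master" and the paths are assumptions; 7077 and 8080 are Spark's default standalone ports):

# on the master node, from the Spark installation directory
./sbin/start-master.sh

# on each worker node (or use ./sbin/start-slaves.sh with conf/slaves filled in)
./sbin/start-slave.sh spark://master:7077

# connect the shell to the cluster instead of running locally
spark-shell --master spark://master:7077

With the standalone master running, its web UI should answer on master:8080, which is the page that currently shows "Unable to connect".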