2 votes

Hello people, and happy new year ;)!

I am building a lambda architecture with Apache Spark, HDFS and Elasticsearch. The following picture shows what I am trying to do: [architecture diagram]

So far, I have written the source code in Java for my Spark Streaming and Spark applications. I read in the Spark documentation that Spark can run on a Mesos or YARN cluster. As indicated in the picture, I already have a Hadoop cluster. Is it possible to run my Spark Streaming and Spark applications within the same Hadoop cluster? If yes, is there any particular configuration to do (for instance the number of nodes, RAM...)? Or do I have to add a Hadoop cluster specially for Spark Streaming?

I hope my explanation is clear.

Yassir


2 Answers

1 vote

You need not build a separate cluster for running Spark Streaming.

Change the spark.master property to yarn-client or yarn-cluster in the conf/spark-defaults.conf file. When specified this way, any Spark application you submit will be handled by a YARN ApplicationMaster and executed by the NodeManagers.
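For example, a minimal sketch of the relevant line in conf/spark-defaults.conf, assuming the Hadoop/YARN client configuration of your existing cluster is available on the machine you submit from (the class and jar names below are only placeholders):

# conf/spark-defaults.conf
# run the driver inside the cluster; use yarn-client to keep the driver on the submitting machine
spark.master    yarn-cluster

# alternatively, pass the master on the command line instead of editing the file:
# spark-submit --master yarn-cluster --class com.example.StreamingJob my-streaming-app.jar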

Additionally, modify these memory and core properties so that what Spark requests fits within what YARN can allocate (see the example values after these lists).

In spark-defaults.conf

spark.executor.memory
spark.executor.cores
spark.executor.instances

In yarn-site.xml

yarn.nodemanager.resource.memory-mb
yarn.nodemanager.resource.cpu-vcores

Otherwise it could lead to either deadlock or poor resource utilization of the cluster.
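As a rough sketch, the executor requests should fit inside the NodeManager limits, with some headroom for the executor memory overhead and the OS. The numbers below are only illustrative assumptions for nodes with 16 GB of RAM and 8 cores, not recommendations:

# conf/spark-defaults.conf (illustrative values)
spark.executor.memory     4g
spark.executor.cores      2
spark.executor.instances  6

# yarn-site.xml (illustrative values)
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>14336</value>  <!-- leave ~2 GB per node for the OS and Hadoop daemons -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>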

Refer here for details on cluster resource management when running Spark on YARN.

1 vote

It is possible. You can submit your streaming and batch applications to the same YARN cluster. But sharing cluster resources between these two jobs could be a bit tricky (as per my understanding).
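One common way to keep the long-running streaming job and the batch job from starving each other is to submit them to separate YARN queues. A minimal sketch, assuming you have defined queues named streaming and batch in YARN's scheduler configuration (the class and jar names are placeholders):

# long-running streaming job on its own queue
spark-submit --master yarn-cluster --queue streaming \
  --class com.example.StreamingJob my-streaming-app.jar

# periodic batch job on another queue
spark-submit --master yarn-cluster --queue batch \
  --class com.example.BatchJob my-batch-app.jar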

So I would suggest you look at Spark Jobserver to submit your applications. Spark Jobserver makes your life easier when you want to maintain multiple Spark contexts. All the required configuration for both applications will be in one place.
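For a rough idea of how that looks, Spark Jobserver exposes a REST API for uploading jars and triggering jobs. A minimal sketch, assuming the server is running on its default port 8090 and using my-app.jar / com.example.MyJob as placeholder names:

# upload the application jar under an app name
curl --data-binary @my-app.jar localhost:8090/jars/my-app

# trigger a job from the uploaded jar
curl -d "" "localhost:8090/jobs?appName=my-app&classPath=com.example.MyJob"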