SBT : Running Spark job on remote cluster from sbt

Question

I have a spark-job (lets call it wordcount) written in Scala, which I am able to run in following manners

Run on a local spark instance from within sbt

sbt> runMain WordCount [InputFile] [Otuputdir] local[*]
Run on a remote spark cluster spark-submit the jar

sbt> package

$> spark-submit --master spark://192.168.1.1:7077 --class WordCount target/scala-2.10/wordcount_2.10-1.5.0-SNAPSHOT.jar [InputFile] [Otuputdir]

Code :

// get arguments
val inputFile = args(0)
val outputDir = args(1)
// if 3rd argument defined then use it
val conf = if ( args.length == 3 )  new SparkConf().setAppName("WordCount").setMaster(args(2)) else  new SparkConf().setAppName("WordCount") 
val sc = new SparkContext(conf)

How can I run this job on remote spark cluster from SBT ?

Alex Naspo Alex Naspo · Accepted Answer · 2016-10-16T19:35:04

3

votes

There is a sbt plugin for spark-submit. https://github.com/saurfang/sbt-spark-submit

SBT : Running Spark job on remote cluster from sbt

1 Answers