2
votes

I have configured Eclipse for Scala, created a Maven project, and written a simple word-count Spark job on Windows. My Spark + Hadoop installation is on a Linux server. How can I launch my Spark code from Eclipse onto the Spark cluster (which is on Linux)?

Any suggestions?

4
Suggestion: use IntelliJ IDEA; personally I think it is the best IDE for Scala and Java. – Alberto Bonsanto
Yeah, but my question is how do I run my code on the cluster. Let's say I use IntelliJ IDEA, then how can I do it there? – Shashi
Where is your master? Do you use Mesos, YARN, or something else? – Alberto Bonsanto

4 Answers

2
votes

Actually, the answer is not as simple as you might expect.

I will make a few assumptions: first, that you use sbt; second, that you are working on a Linux-based machine; third, that you have two classes in your project, say RunMe and Globals; and last, that you want to set the configuration inside the program. Thus, somewhere in your runnable code you must have something like this:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object RunMe {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("mesos://master:5050") // if you use Mesos, and your network resolves the hostname "master" to its IP
      .setAppName("my-app")
      .set("spark.executor.memory", "10g")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc) // SQLContext is built on top of the SparkContext

    // your code comes here
  }
}

The steps you must follow are:

  • Compile the project from its root directory by running (a minimal sbt setup is sketched at the end of this answer):

    $ sbt assembly

  • Submit the job to the master node; this is the interesting part (assuming your project contains a target/scala/ directory, and inside it a .jar file that corresponds to the compiled project):

    $ spark-submit --class RunMe target/scala/app.jar

Notice that, because I assumed the project has two or more classes, you have to identify which class you want to run via --class. Furthermore, I expect the approaches for YARN and Mesos to be very similar.
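
For reference, here is a minimal sketch of the sbt setup that provides the sbt assembly task; the plugin and Spark versions below are assumptions, so adjust them to your environment:

// project/plugins.sbt -- the sbt-assembly plugin provides the assembly task
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3") // version is an assumption

// build.sbt -- mark Spark as "provided" so it is not bundled into the fat jar
name := "my-app"
scalaVersion := "2.10.6" // assumption: match the Scala version of your Spark build
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided" // assumption: your Spark version
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.0" % "provided"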

0
votes

If you are developing the project on Windows and want to deploy it to a Linux environment, build an executable JAR file, copy it to the home directory of your Linux machine, and point your spark-submit script (on the terminal) at it. This is all possible thanks to the Java Virtual Machine. Let me know if you need more help.
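
A rough sketch of what that looks like from the terminal; the host name, user, paths, and class name here are placeholders, not values from the question:

$ scp target/scala-2.10/wordcount-assembly.jar user@linux-server:~/
$ ssh user@linux-server
$ spark-submit --class WordCount --master spark://master:7077 ~/wordcount-assembly.jar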

0
votes

To achieve what you want, you need:

First: build the jar (if you use Gradle: fatJar or shadowJar).

Second: in your code, when you create the SparkConf, specify the master address, spark.driver.host, and the corresponding jar location, something like:

SparkConf conf = new SparkConf()
    .setMaster("spark://SPARK-MASTER-ADDRESS:7077")
    .set("spark.driver.host", "IP address of your local machine")
    .setJars(new String[]{"path\\to\\your\\jar file.jar"})
    .setAppName("APP-NAME");

And third: just right-click and run from your IDE. That's it!
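
Since the question is about Scala, a rough Scala equivalent of the same configuration might look like this (the object name RunFromIde and all placeholder values are mine, not from the answer):

import org.apache.spark.{SparkConf, SparkContext}

object RunFromIde {
  def main(args: Array[String]): Unit = {
    // assumption: the same placeholder values as the Java snippet above
    val conf = new SparkConf()
      .setMaster("spark://SPARK-MASTER-ADDRESS:7077")
      .set("spark.driver.host", "IP address of your local machine")
      .setJars(Seq("path/to/your/jar-file.jar")) // the Scala API takes a Seq[String]
      .setAppName("APP-NAME")
    val sc = new SparkContext(conf)
    // your code comes here
  }
}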

-2
votes

What you are looking for is the master URL with which the SparkContext should be created.

You need to set your master to be the cluster you want to use.

I invite you to read the Spark Programming Guide or follow an introductory course to understand these basic concepts. Spark is not a tool you can start working with overnight; it takes some time.

http://spark.apache.org/docs/latest/programming-guide.html#initializing-spark
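
As that guide suggests, in practice you usually avoid hard-coding the master in the application and pass it to spark-submit instead; a minimal sketch of that pattern (the app name, class name, and jar path are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // no setMaster here: the master is supplied on the command line, e.g.
    //   spark-submit --master spark://master:7077 --class WordCount my-app.jar
    val conf = new SparkConf().setAppName("word-count")
    val sc = new SparkContext(conf)
    // your code comes here
  }
}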