
I have been able to run a Spark application successfully in IntelliJ IDEA with the master set to local[*]. However, when I point the master at a standalone Spark instance, an exception occurs.

The SparkPi app I am trying to execute is below.

import scala.math.random

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("spark://tjvrlaptop:7077").setAppName("Spark Pi") //.set("spark.scheduler.mode", "FIFO").set("spark.cores.max", "8")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 20
    val n = math.min(100000000L * slices, Int.MaxValue).toInt // cap at Int.MaxValue to avoid Int overflow

    for (j <- 1 to 1000000) {
      val count = spark.parallelize(1 until n, slices).map { i =>
        val x = random * 2 - 1
        val y = random * 2 - 1
        if (x * x + y * y < 1) 1 else 0
      }.reduce(_ + _)
      println("Pi is roughly " + 4.0 * count / n)
    }
    spark.stop()
  }
}

Here is my build.sbt contents:

name := "SBTScalaSparkPi"

version := "1.0"

scalaVersion := "2.10.6"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"

Here is my plugins.sbt contents:

logLevel := Level.Warn

I started the Spark master and a worker with the following commands in separate command prompts on the same machine.

spark-1.6.1-bin-hadoop2.6\bin>spark-class org.apache.spark.deploy.master.Master --host tjvrlaptop

spark-1.6.1-bin-hadoop2.6\bin>spark-class org.apache.spark.deploy.worker.Worker spark://tjvrlaptop:7077

[The Master and Worker seem to be up and running without any issues.][1]

[1]: http://i.stack.imgur.com/B3BDZ.png

Next I tried to run the program in IntelliJ. It fails after a while with the following errors:

Command Prompt where the Master is running

16/03/27 14:44:33 INFO Master: Registering app Spark Pi
16/03/27 14:44:33 INFO Master: Registered app Spark Pi with ID app-20160327144433-0000
16/03/27 14:44:33 INFO Master: Launching executor app-20160327144433-0000/0 on worker worker-20160327140440-192.168.56.1-52701
16/03/27 14:44:38 INFO Master: Received unregister request from application app-20160327144433-0000
16/03/27 14:44:38 INFO Master: Removing app app-20160327144433-0000
16/03/27 14:44:38 INFO Master: TJVRLAPTOP:55368 got disassociated, removing it.
16/03/27 14:44:38 INFO Master: 192.168.56.1:55350 got disassociated, removing it.
16/03/27 14:44:38 WARN Master: Got status update for unknown executor app-20160327144433-0000/0

Command Prompt where the Worker is running

16/03/27 14:44:34 INFO Worker: Asked to launch executor app-20160327144433-0000/0 for Spark Pi
16/03/27 14:44:34 INFO SecurityManager: Changing view acls to: tjoha
16/03/27 14:44:34 INFO SecurityManager: Changing modify acls to: tjoha
16/03/27 14:44:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tjoha); users with modify permissions: Set(tjoha)
16/03/27 14:44:34 INFO ExecutorRunner: Launch command: "C:\Program Files\Java\jre1.8.0_77\bin\java" "-cp" "C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\conf\;C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\lib\spark-assembly-1.6.1-hadoop2.6.0.jar;C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\lib\datanucleus-api-jdo-3.2.6.jar;C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\lib\datanucleus-core-3.2.10.jar;C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\lib\datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=55350" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@192.168.56.1:55350" "--executor-id" "0" "--hostname" "192.168.56.1" "--cores" "8" "--app-id" "app-20160327144433-0000" "--worker-url" "spark://Worker@192.168.56.1:52701"
16/03/27 14:44:38 INFO Worker: Asked to kill executor app-20160327144433-0000/0
16/03/27 14:44:38 INFO ExecutorRunner: Runner thread for executor app-20160327144433-0000/0 interrupted
16/03/27 14:44:38 INFO ExecutorRunner: Killing process!
16/03/27 14:44:38 INFO Worker: Executor app-20160327144433-0000/0 finished with state KILLED exitStatus 1
16/03/27 14:44:38 INFO Worker: Cleaning up local directories for application app-20160327144433-0000
16/03/27 14:44:38 INFO ExternalShuffleBlockResolver: Application app-20160327144433-0000 removed, cleanupLocalDirs = true

IntelliJ IDEA Output

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/03/27 15:06:04 INFO SparkContext: Running Spark version 1.6.1
16/03/27 15:06:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/27 15:06:05 INFO SecurityManager: Changing view acls to: tjoha
16/03/27 15:06:05 INFO SecurityManager: Changing modify acls to: tjoha
16/03/27 15:06:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tjoha); users with modify permissions: Set(tjoha)
16/03/27 15:06:06 INFO Utils: Successfully started service 'sparkDriver' on port 56183.
16/03/27 15:06:07 INFO Slf4jLogger: Slf4jLogger started
16/03/27 15:06:07 INFO Remoting: Starting remoting
16/03/27 15:06:07 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.56.1:56196]
16/03/27 15:06:07 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 56196.
16/03/27 15:06:07 INFO SparkEnv: Registering MapOutputTracker
16/03/27 15:06:07 INFO SparkEnv: Registering BlockManagerMaster
16/03/27 15:06:07 INFO DiskBlockManager: Created local directory at C:\Users\tjoha\AppData\Local\Temp\blockmgr-9623b0f9-81f5-4a10-bbc7-ba077d53a2e5
16/03/27 15:06:07 INFO MemoryStore: MemoryStore started with capacity 2.4 GB
16/03/27 15:06:07 INFO SparkEnv: Registering OutputCommitCoordinator
16/03/27 15:06:07 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
16/03/27 15:06:07 INFO Utils: Successfully started service 'SparkUI' on port 4041.
16/03/27 15:06:07 INFO SparkUI: Started SparkUI at http://192.168.56.1:4041
16/03/27 15:06:08 INFO AppClient$ClientEndpoint: Connecting to master spark://tjvrlaptop:7077...
16/03/27 15:06:09 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160327150608-0002
16/03/27 15:06:09 INFO AppClient$ClientEndpoint: Executor added: app-20160327150608-0002/0 on worker-20160327150550-192.168.56.1-56057 (192.168.56.1:56057) with 8 cores
16/03/27 15:06:09 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160327150608-0002/0 on hostPort 192.168.56.1:56057 with 8 cores, 1024.0 MB RAM
16/03/27 15:06:09 INFO AppClient$ClientEndpoint: Executor updated: app-20160327150608-0002/0 is now RUNNING
16/03/27 15:06:09 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 56234.
16/03/27 15:06:09 INFO NettyBlockTransferService: Server created on 56234
16/03/27 15:06:09 INFO BlockManagerMaster: Trying to register BlockManager
16/03/27 15:06:09 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.56.1:56234 with 2.4 GB RAM, BlockManagerId(driver, 192.168.56.1, 56234)
16/03/27 15:06:09 INFO BlockManagerMaster: Registered BlockManager
16/03/27 15:06:09 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/03/27 15:06:10 INFO SparkContext: Starting job: reduce at SparkPi.scala:37
16/03/27 15:06:10 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:37) with 20 output partitions
16/03/27 15:06:10 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:37)
16/03/27 15:06:10 INFO DAGScheduler: Parents of final stage: List()
16/03/27 15:06:10 INFO DAGScheduler: Missing parents: List()
16/03/27 15:06:10 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:33), which has no missing parents
16/03/27 15:06:10 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1880.0 B, free 1880.0 B)
16/03/27 15:06:10 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1212.0 B, free 3.0 KB)
16/03/27 15:06:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.56.1:56234 (size: 1212.0 B, free: 2.4 GB)
16/03/27 15:06:10 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
16/03/27 15:06:10 INFO DAGScheduler: Submitting 20 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:33)
16/03/27 15:06:10 INFO TaskSchedulerImpl: Adding task set 0.0 with 20 tasks
16/03/27 15:06:14 INFO SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (TJVRLAPTOP:56281) with ID 0
16/03/27 15:06:14 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, TJVRLAPTOP, partition 0,PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:14

...

TJVRLAPTOP, partition 6,PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:14 INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, TJVRLAPTOP, partition 7,PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:14 INFO BlockManagerMasterEndpoint: Registering block manager TJVRLAPTOP:56319 with 511.1 MB RAM, BlockManagerId(0, TJVRLAPTOP, 56319)
16/03/27 15:06:15 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on TJVRLAPTOP:56319 (size: 1212.0 B, free: 511.1 MB)
16/03/27 15:06:15 INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, TJVRLAPTOP, partition 8,PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:15 INFO TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, TJVRLAPTOP, partition 9,PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:15 INFO TaskSetManager: Starting task 10.0 in stage 0.0 (TID 10, TJVRLAPTOP, partition 10,PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:15 INFO TaskSetManager: Starting task 11.0 in stage 0.0 (TID 11, TJVRLAPTOP, partition 11,PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:15

...

java.lang.ClassNotFoundException: SparkPi$$anonfun$main$1$$anonfun$1
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Unknown Source)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
    at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
    at java.io.ObjectInputStream.readClassDesc(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.readObject(Unknown Source)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
16/03/27 15:06:15 INFO TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5) on executor TJVRLAPTOP: java.lang.ClassNotFoundException (SparkPi$$anonfun$main$1$$anonfun$1) [duplicate 1]
16/03/27 15:06:15 INFO TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3) on executor TJVRLAPTOP: java.lang.ClassNotFoundException

...

INFO TaskSetManager: Starting task 10.1 in stage 0.0 (TID 20, TJVRLAPTOP, partition 10,PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:15

...

TJVRLAPTOP, partition 3,PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:15 INFO TaskSetManager: Lost task 4.0 in stage 0.0 (TID 4) on executor TJVRLAPTOP: java.lang.ClassNotFoundException (SparkPi$$anonfun$main$1$$anonfun$1) [duplicate 8]
16/03/27 15:06:15 INFO TaskSetManager: Lost task 12.0 in stage 0.0 (TID 12) on executor TJVRLAPTOP: java.lang.ClassNotFoundException

...

INFO TaskSetManager: Starting task 2.3 in stage 0.0 (TID 39, TJVRLAPTOP, partition 2,PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:16 MapOutputTrackerMasterEndpoint stopped!
16/03/27 15:06:16 WARN TransportChannelHandler: Exception in connection from TJVRLAPTOP/192.168.56.1:56281
java.io.IOException: An existing connection was forcibly closed by the remote host
16/03/27 15:06:17 INFO MemoryStore: MemoryStore cleared
16/03/27 15:06:17 INFO BlockManager: BlockManager stopped
16/03/27 15:06:17 INFO BlockManagerMaster: BlockManagerMaster stopped
16/03/27 15:06:17 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/03/27 15:06:17 INFO SparkContext: Successfully stopped SparkContext
16/03/27 15:06:17 INFO ShutdownHookManager: Shutdown hook called
16/03/27 15:06:17 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/03/27 15:06:17 INFO ShutdownHookManager: Deleting directory C:\Users\tjoha\AppData\Local\Temp\spark-11f8184f-23fb-43be-91bb-113fb74aa8b9


1 Answer


When you run in embedded mode (local[*]), Spark already has all the required application code on its classpath.

When you run in standalone mode, you have to make that code available to Spark explicitly, for example by copying your application jar to the lib folder.
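Another way to do this in Spark 1.6 is to hand the packaged jar to the SparkConf, so the driver ships it to the executors when the application registers. Below is a minimal sketch of the relevant change to SparkPi; the jar path is an assumption based on the build.sbt above (run sbt package first and check the actual filename under target/scala-2.10):

import org.apache.spark.{SparkConf, SparkContext}

object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setMaster("spark://tjvrlaptop:7077")
      .setAppName("Spark Pi")
      // Ship the application jar to the executors. Without it, the executors
      // cannot load the anonymous closure classes (SparkPi$$anonfun$...) and
      // every task fails with ClassNotFoundException, as in the logs above.
      // The path is illustrative; adjust it to your actual artifact.
      .setJars(Seq("target/scala-2.10/sbtscalasparkpi_2.10-1.0.jar"))
    val spark = new SparkContext(conf)
    // ... rest of SparkPi unchanged ...
    spark.stop()
  }
}

Alternatively, packaging the application and launching it with spark-submit --master spark://tjvrlaptop:7077 achieves the same result, since spark-submit distributes the application jar for you.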