I'm using Spark 1.2.1 on DataStax Enterprise (DSE) 4.7 as a standalone cluster of 3 nodes (AWS VPC servers). When I launch an application on it from the master node, it completes the first stage but fails on the second stage with a "remote Akka client disassociated" error. I also get "Asked to remove non-existent executor 0" errors.
I'm not using YARN.
I tried setting the Akka timeout to 6000, but nothing changed.
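For reference, this is a minimal Scala sketch of how an Akka timeout of 6000 would typically be applied through `SparkConf` in Spark 1.2. It assumes the setting in question is `spark.akka.timeout` (which Spark 1.x documents in seconds, default 100) and that spark-core 1.2.x is on the classpath; the object name and application name are hypothetical.

```scala
import org.apache.spark.SparkConf

// Sketch only. Assumption: spark-core 1.2.x on the classpath (as shipped with DSE 4.7).
object AkkaTimeoutSketch {
  // Builds the configuration that would be passed to `new SparkContext(conf)`.
  // Spark reads these properties when the SparkContext starts,
  // so they have to be set before the context is created.
  def buildConf(): SparkConf =
    new SparkConf()
      .setAppName("akka-timeout-sketch") // hypothetical application name
      // spark.akka.timeout is expressed in seconds in Spark 1.x.
      .set("spark.akka.timeout", "6000")

  def main(args: Array[String]): Unit = {
    val conf = buildConf()
    // Print the effective value to confirm the property is present.
    println(conf.get("spark.akka.timeout"))
  }
}
```

The same property could presumably be passed on the command line instead, e.g. `dse spark-submit --conf spark.akka.timeout=6000 ...`, assuming `dse spark-submit` forwards `--conf` to the standard `spark-submit` launcher.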
All the required ports are configured, and the cluster looks healthy in the Spark web UI.
Could it be a timeout issue?
ERROR 2015-07-09 12:59:24 org.apache.spark.scheduler.TaskSchedulerImpl: Lost executor 1 on 1xx.xx.xx.x1: remote Akka client disassociated
WARN 2015-07-09 12:59:24 org.apache.spark.scheduler.TaskSetManager: Lost task 6.0 in stage 1.0 (TID 19, 1xx.xx.x.x1): ExecutorLostFailure (executor 1 lost)
WARN 2015-07-09 12:59:24 akka.remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:38145] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
ERROR 2015-07-09 12:59:24 org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
ERROR 2015-07-09 12:59:24 org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
[Stage 1:=====================================================> (5 + 0) / 12]
ERROR 2015-07-09 12:59:32 org.apache.spark.scheduler.TaskSchedulerImpl: Lost executor 2 on 1xx.xx.xx.x2: remote Akka client disassociated
WARN 2015-07-09 12:59:32 akka.remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:33914] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
WARN 2015-07-09 12:59:32 org.apache.spark.scheduler.TaskSetManager: Lost task 0.1 in stage 1.0 (TID 20, 1xx.xx.xx.x2): ExecutorLostFailure (executor 2 lost)
ERROR 2015-07-09 12:59:32 org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
ERROR 2015-07-09 12:59:32 org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
[Stage 1:====================================================================================> (8 + -2) / 12]
ERROR 2015-07-09 13:01:03 org.apache.spark.scheduler.TaskSchedulerImpl: Lost executor 3 on 1xx.xx.xx.x3: remote Akka client disassociated
WARN 2015-07-09 13:01:03 akka.remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:58630] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
WARN 2015-07-09 13:01:03 org.apache.spark.scheduler.TaskSetManager: Lost task 1.1 in stage 1.0 (TID 23, 1xx.xx.xx.x3): ExecutorLostFailure (executor 3 lost)
ERROR 2015-07-09 13:01:03 org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 3
ERROR 2015-07-09 13:01:03 org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 3
[Stage 1:====================================================================================> (8 + -3) / 12