4
votes

I'm using Spark 1.2.1 on DataStax Enterprise 4.7 (DSE) as a standalone cluster of 3 nodes (AWS VPC servers). When I launch an application to it from the master node, it passes the first stage but gets a "remote Akka client disassociated" error on the second stage. I also get "Asked to remove non-existent executor 0" errors.

  • I'm not using YARN.

  • I tried setting the Akka timeout to 6000; nothing changed.

  • All ports are open, and the cluster looks healthy in the Spark web UI.
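For reference, this is roughly how Akka-related timeouts are raised in Spark 1.x, via `conf/spark-defaults.conf` (or equivalent `--conf` flags to `spark-submit`). The values here are illustrative, not a verified fix for this problem:

```properties
# conf/spark-defaults.conf -- example values only
# How long Akka waits on remote operations before giving up
spark.akka.timeout              300
# Tolerated heartbeat pause before a peer is considered dead
spark.akka.heartbeat.pauses     6000
# How often heartbeats are sent
spark.akka.heartbeat.interval   1000
# Max message size in MB (large task results can exceed the default)
spark.akka.frameSize            100
```

Note that frequent "remote Akka client disassociated" errors are often a symptom of executors dying (e.g. running out of memory) rather than a network/timeout problem, so the executor stderr logs on the workers are worth checking as well.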

Could it be a timeout issue?

ERROR 2015-07-09 12:59:24 org.apache.spark.scheduler.TaskSchedulerImpl: Lost executor 1 on 1xx.xx.xx.x1: remote Akka client disassociated
WARN 2015-07-09 12:59:24 org.apache.spark.scheduler.TaskSetManager: Lost task 6.0 in stage 1.0 (TID 19, 1xx.xx.x.x1): ExecutorLostFailure (executor 1 lost)
WARN 2015-07-09 12:59:24 akka.remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:38145] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
ERROR 2015-07-09 12:59:24 org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
ERROR 2015-07-09 12:59:24 org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
[Stage 1:=====================================================> (5 + 0) / 12]
ERROR 2015-07-09 12:59:32 org.apache.spark.scheduler.TaskSchedulerImpl: Lost executor 2 on 1xx.xx.xx.x2: remote Akka client disassociated
WARN 2015-07-09 12:59:32 akka.remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:33914] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
WARN 2015-07-09 12:59:32 org.apache.spark.scheduler.TaskSetManager: Lost task 0.1 in stage 1.0 (TID 20, 1xx.xx.xx.x2): ExecutorLostFailure (executor 2 lost)
ERROR 2015-07-09 12:59:32 org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
ERROR 2015-07-09 12:59:32 org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
[Stage 1:====================================================================================> (8 + -2) / 12]
ERROR 2015-07-09 13:01:03 org.apache.spark.scheduler.TaskSchedulerImpl: Lost executor 3 on 1xx.xx.xx.x3: remote Akka client disassociated
WARN 2015-07-09 13:01:03 akka.remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:58630] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
WARN 2015-07-09 13:01:03 org.apache.spark.scheduler.TaskSetManager: Lost task 1.1 in stage 1.0 (TID 23, 1xx.xx.xx.x3): ExecutorLostFailure (executor 3 lost)
ERROR 2015-07-09 13:01:03 org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 3
ERROR 2015-07-09 13:01:03 org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 3
[Stage 1:====================================================================================> (8 + -3) / 12

1
What's in the worker logs? – dpeacock
@dpeacock - you can see it in my main question. It's a standalone cluster, so the logs come out in the console, and there are no error logs on the workers. – Reshef
Does this work from the Spark shell? dse spark – phact
@phact - I run it through /usr/bin/dse spark-submit. – Reshef

1 Answer

1
votes

I tried changing Akka settings, ports, etc., but in the end the solution was to start over in a new, clean AWS environment: 3 new servers with a fresh installation of the DSE system.

:/