Unable to open native connection with spark sometimes

Question

I'm running a Spark job with Spark 1.4 and Cassandra 2.18. I can telnet from the master to the Cassandra machine, and that works. Sometimes the job runs fine and sometimes I get the following exception. Why would this happen only sometimes?

"Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 7, 172.28.0.162): java.io.IOException: Failed to open native connection to Cassandra at {172.28.0.164}:9042 at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:155) "

Sometimes it also gives me this exception along with the one above:

Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /172.28.0.164:9042 (com.datastax.driver.core.TransportException: [/172.28.0.164:9042] Connection has been closed))

Yep, I do. The problem is that I get it only sometimes, and other times my code runs fine. When I restart the master and all the slaves it works, but after running my job 2-3 times it gives me this error again. I closed all the TIME_WAIT ports but still see this issue. — Nipun

Code Herder · Accepted Answer · 2015-08-13T17:56:47

I had the second error, NoHostAvailableException, happen to me quite a few times this week as I was porting Python Spark code to Java Spark.

My driver JVM was nearly out of memory, and the GC was taking up all my cores (98% of all 8 cores), pausing the JVM all the time.

In Python it's much more obvious (to me) when this happens, so in Java it took me a while to realize what was going on, which is why I got this error quite a few times.
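One way to spot this kind of GC pressure sooner is to ask the JVM how much time it has spent collecting. The following is a minimal sketch using the standard java.lang.management API; the class name GcOverhead and its methods are hypothetical helpers for illustration, not part of Spark or the Cassandra connector. Note that getCollectionTime() reports approximate accumulated collection time, which for concurrent collectors is not purely pause time, so treat the result as a rough indicator.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration; not part of Spark or the Cassandra connector.
final class GcOverhead {

    // Approximate total milliseconds spent in GC since JVM start, summed over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Share of an interval spent in GC, given GC-time and wall-clock deltas in milliseconds.
    // Capped at 1.0 because the reported GC time is only approximate.
    static double fraction(long gcDeltaMillis, long wallDeltaMillis) {
        if (wallDeltaMillis <= 0) {
            return 0.0;
        }
        return Math.min(1.0, (double) gcDeltaMillis / wallDeltaMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        long gcBefore = totalGcMillis();
        long wallBefore = System.nanoTime();

        Thread.sleep(1000); // stand-in for the work being observed in a real driver

        long gcDelta = totalGcMillis() - gcBefore;
        long wallDelta = (System.nanoTime() - wallBefore) / 1_000_000;
        System.out.printf("GC share of the last %d ms: %.1f%%%n",
                wallDelta, 100 * fraction(gcDelta, wallDelta));
    }
}
```

If a check like this, called periodically from the driver, reports a GC share anywhere near the 98% I saw, memory pressure is a more likely culprit than the network.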

I had two theories about the root cause, but either way the solution was to stop the GC from going crazy.

First theory: because the JVM was pausing so often, the driver just couldn't connect to Cassandra.
Second theory: Cassandra was running on the same machine as Spark, and the JVM was taking 100% of all CPUs, so Cassandra just couldn't answer in time and it looked to the driver as if there were no Cassandra hosts.
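Both theories predict slow or failed connection attempts rather than a port that is always closed, so it may help to time a few plain TCP connects to the native port. The stack trace in the question shows the failed task ran on 172.28.0.162, so a check like this is more telling when run from that worker than from the master. This is a minimal sketch; the class PortProbe and its method are hypothetical, the default host and port are taken from the question's error message, and a successful TCP connect only shows that the port accepts connections, not that the CQL handshake would succeed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; a timed equivalent of the telnet check.
final class PortProbe {

    // Returns the time in milliseconds taken to open a TCP connection,
    // or -1 if the connection fails or does not complete within timeoutMillis.
    static long connectMillis(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            // Covers refusals, timeouts, and unresolvable hosts.
            return -1;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Defaults come from the error message in the question.
        String host = args.length > 0 ? args[0] : "172.28.0.164";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9042;

        // Probe several times, since the failure is intermittent.
        for (int attempt = 1; attempt <= 5; attempt++) {
            long millis = connectMillis(host, port, 2000);
            System.out.println("attempt " + attempt + ": "
                    + (millis < 0 ? "failed" : millis + " ms"));
            Thread.sleep(1000);
        }
    }
}
```

Connect times that swing from a few milliseconds to timeouts while the job is running would be consistent with either theory.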

Hope this helps!
