
I'm using Cassandra 2.1.5 (DSC) and Spark 1.2.1 with spark-cassandra-connector 1.2.1.

When I run the Spark job (a Scala script), I get the following error:

16/03/08 10:22:03 INFO DAGScheduler: Job 0 failed: reduce at JsonRDD.scala:57, took 15.051150 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 1 times, most recent failure: Lost task 1.0 in stage 1.0 (TID 4, localhost): com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.spark.sql.cassandra.CassandraSQLRow

I tried to do what is described here by using:

/home/ubuntu/spark-1.2.1/bin/spark-submit --driver-class-path /home/ubuntu/.ivy2/cache/com.datastax.spark/spark-cassandra-connector_2.10/jars/spark-cassandra-connector_2.10-1.2.1.jar --conf spark.executor.extraClassPath=/home/ubuntu/.ivy2/cache/com.datastax.spark/spark-cassandra-connector_2.10/jars/spark-cassandra-connector_2.10-1.2.1.jar --class "$class" "$jar"
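For completeness, the same settings can also be baked into the job itself when the SparkContext is created. This is only a sketch (the app name is illustrative; the jar path and host are the ones used elsewhere in this post), and note that the driver's own classpath cannot be set this way, since the driver JVM is already running by the time this code executes:

import org.apache.spark.{SparkConf, SparkContext}

// Path to the connector jar, as in the spark-submit call above.
val connectorJar = "/home/ubuntu/.ivy2/cache/com.datastax.spark/spark-cassandra-connector_2.10/jars/spark-cassandra-connector_2.10-1.2.1.jar"

val conf = new SparkConf()
  .setAppName("SignalIO") // illustrative name, borrowed from the stack trace below
  .set("spark.executor.extraClassPath", connectorJar) // same as the --conf flag above
  .set("spark.cassandra.connection.host", "11.11.11.11") // Cassandra contact point

val sc = new SparkContext(conf)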

But only get the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/driver/core/ConsistencyLevel
    at com.datastax.spark.connector.writer.WriteConf$.<init>(WriteConf.scala:76)
    at com.datastax.spark.connector.writer.WriteConf$.<clinit>(WriteConf.scala)
    at com.datastax.spark.connector.util.ConfigCheck$.<init>(ConfigCheck.scala:23)
    at com.datastax.spark.connector.util.ConfigCheck$.<clinit>(ConfigCheck.scala)
    at com.datastax.spark.connector.cql.CassandraConnectorConf$.apply(CassandraConnectorConf.scala:81)
    at com.datastax.spark.connector.cql.CassandraConnector$.apply(CassandraConnector.scala:204)
    at com.datastax.spark.connector.RDDFunctions.joinWithCassandraTable$default$5(RDDFunctions.scala:127)
    at co.crowdx.aggregation.SignalIO$.main(SignalIO.scala:92)
    at co.crowdx.aggregation.SignalIO.main(SignalIO.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Caused by: java.lang.ClassNotFoundException: com.datastax.driver.core.ConsistencyLevel

What could be the issue? (I do not want to upgrade Spark or Cassandra right now.)

EDIT:

I tried running the Spark shell to reproduce the problem in a simpler way:

spark-1.2.1/bin/spark-shell --jars /home/ubuntu/.ivy2/cache/com.datastax.spark/spark-cassandra-connector_2.10/jars/spark-cassandra-connector_2.10-1.2.1.jar --conf spark.cassandra.connection.host=11.11.11.11

And tried to run some simple commands:

scala> import org.apache.spark.sql.cassandra.CassandraSQLContext
import org.apache.spark.sql.cassandra.CassandraSQLContext

scala> import org.apache.spark.sql.SchemaRDD
import org.apache.spark.sql.SchemaRDD

scala> val cc = new CassandraSQLContext(sc)
cc: org.apache.spark.sql.cassandra.CassandraSQLContext = org.apache.spark.sql.cassandra.CassandraSQLContext@1c41c05e

scala> val rdd = cc.sql("select * from listener.scans_daily_imei_partitioned as a, listener.scans_daily_imei_partitioned as b where a.id=b.id")

I got the following error:

rdd: org.apache.spark.sql.SchemaRDD = SchemaRDD[0] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
com.google.common.util.concurrent.ExecutionError: java.lang.NoClassDefFoundError: com/datastax/driver/core/ConsistencyLevel

Could there be multiple versions of the spark-cassandra-connector when you execute your job? – Yuval Itzchakov

No, and I explicitly define the path to the connector, so why does it matter? – Reshef

It shouldn't, but I'm still wondering if there could be another version compiled into your jar which may be causing the issue. – Yuval Itzchakov

1 Answer


I solved the issue by compiling the spark-cassandra-connector assembly myself. Unlike the plain connector jar from the Ivy cache, the assembly is a fat jar that bundles the connector's dependencies, including the Cassandra Java driver that provides the missing com.datastax.driver.core.ConsistencyLevel:

wget https://github.com/datastax/spark-cassandra-connector/archive/v1.2.1.zip
unzip v1.2.1.zip
cd spark-cassandra-connector-1.2.1
# Builds fat jars that include the connector's transitive dependencies.
sbt assembly
cp /home/ubuntu/spark-cassandra-connector-1.2.1/spark-cassandra-connector-java/target/scala-2.10/spark-cassandra-connector-java-assembly-1.2.1.jar /home/ubuntu/spark-cassandra-connector-java-assembly-1.2.1-FAT.jar

And then ran spark-submit as:

/home/ubuntu/spark-1.2.1/bin/spark-submit --driver-class-path /home/ubuntu/spark-cassandra-connector-java-assembly-1.2.1-FAT.jar --conf spark.executor.extraClassPath=/home/ubuntu/spark-cassandra-connector-java-assembly-1.2.1-FAT.jar --class "$class" "$jar"

It worked perfectly.
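For reference, another way to get the Cassandra Java driver onto the classpath, instead of shipping the connector assembly separately, is to declare the connector as a dependency of the job and build one fat jar with the application. A minimal build.sbt sketch, assuming the project is set up with sbt and the sbt-assembly plugin:

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // Spark itself is provided by the cluster, so keep it out of the fat jar.
  "org.apache.spark" %% "spark-core" % "1.2.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.2.1" % "provided",
  // Pulls in its transitive dependencies, including cassandra-driver-core,
  // which is where com.datastax.driver.core.ConsistencyLevel lives.
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.2.1"
)

With that, spark-submit should need only the application jar, without any extraClassPath flags.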