
Is Spark 2.0 compatible with (DataStax) Cassandra 2.1.13? I have installed Spark 2.1.0 on my local Mac and also installed Scala 2.11.x. I am trying to read a Cassandra table from a server which has DataStax 4.8.6 installed (Spark 1.4 and Cassandra 2.1.13).

I am running the following code in the Spark shell:

spark-shell

import org.apache.spark.sql.SparkSession

import org.apache.spark.sql.implicits._
import org.apache.spark.sql.cassandra._
import com.datastax.spark.connector.cql._
import org.apache.spark.sql
import org.apache.spark.SparkContext._
import com.datastax.spark.connector.cql.CassandraConnector._

spark.stop

val sparkSession = SparkSession.builder.appName("Spark app").config("spark.cassandra.connection.host",CassandraNodeList).config("spark.cassandra.auth.username", CassandraUser).config("spark.cassandra.auth.password", CassandraPassword).config("spark.cassandra.connection.port", "9042").getOrCreate()

sparkSession.sql("""CREATE TEMPORARY view hdfsfile
     |USING org.apache.spark.sql.cassandra
     |OPTIONS (
     |  table "hdfs_file",
     |  keyspace "keyspaceName")""".stripMargin)

I am getting the following error:

17/02/28 10:33:02 ERROR Executor: Exception in task 8.0 in stage 3.0 (TID 20)
java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
    at com.datastax.spark.connector.util.CountingIterator.<init>(CountingIterator.scala:4)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.compute(CassandraTableScanRDD.scala:336)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)


1 Answer


This is a Scala version mismatch error: you are using a Scala 2.10 library with Scala 2.11 (or vice versa). It is explained in the Spark Cassandra Connector (SCC) FAQ:

https://github.com/datastax/spark-cassandra-connector/blob/master/doc/FAQ.md#what-does-this-mean-noclassdeffounderror-scalacollectiongentraversableonceclass
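A quick way to confirm which side of the mismatch you are on (a minimal sketch, not connector-specific) is to print the Scala version your shell runs on and compare it with the _2.10/_2.11 suffix of the connector artifact you load:

// In spark-shell: print the Scala version the shell (and the Spark build) is running on.
// The prebuilt Spark 2.1.0 binaries are compiled against Scala 2.11.
scala.util.Properties.versionString
// e.g. res0: String = version 2.11.8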

Quoting the FAQ:

This means that there is a mix of Scala versions in the libraries used in your code. The collection API is different between Scala 2.10 and 2.11, and this is the most common error which occurs if a Scala 2.10 library is loaded in a Scala 2.11 runtime. To fix this, make sure that the library name has the correct Scala version suffix to match your Scala version.
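For example, a sketch of pulling in a matching connector build (assuming the spark-cassandra-connector 2.0.0 artifact, which targets Spark 2.0/2.1; adjust the version to your setup). Launch the shell with the _2.11 artifact rather than a _2.10 one:

# Load the connector built for Scala 2.11, matching the Scala version of the Spark 2.1.0 binaries.
spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.0.0

Or, in an sbt build, let %% append the suffix that matches your scalaVersion:

// build.sbt (sketch): %% resolves to spark-cassandra-connector_2.11 when scalaVersion is 2.11.x
scalaVersion := "2.11.8"
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.0"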