0 votes

I am trying to build a simple project with Spark + Cassandra for a SQL analytics demo. I need to use Cassandra v2.0.14 (I can't upgrade it for now), and I am unable to find the correct versions of Spark and the Spark-Cassandra connector. I referred to Datastax's git project at https://github.com/datastax/spark-cassandra-connector, and I know that the Spark and spark-cassandra-connector versions need to match and be compatible with Cassandra. Could anyone point out the exact versions of Spark and the Spark-Cassandra connector to use?

I tried v1.1.0 and v1.2.1 for both Spark and the connector, but I am unable to build the spark-cassandra-connector fat jar with either the supplied sbt (it fails because the downloaded sbt-launch jar contains only a 404 Not Found HTML page) or my local sbt v0.13.8 (it fails with compilation errors on "import sbtassembly.Plugin." and "import AssemblyKeys.").


2 Answers

0 votes

If you can upgrade your version of Spark, then you can connect Spark with Cassandra.

Put the following Maven dependencies in your pom file:

cassandra-all, cassandra-core, cassandra-mapping, cassandra-thrift, cassandra-client, spark-cassandra-connector, spark-cassandra-connector-java

This will work.
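Since the question builds with sbt, here is a rough build.sbt equivalent of the connector dependencies from the list above (the Cassandra driver artifacts are normally pulled in transitively by the connector). The group IDs, Scala version, and the 1.2.1 version numbers are assumptions; check the connector's compatibility table for the exact coordinates matching your Spark and Cassandra versions.

```scala
// build.sbt -- a minimal sketch, assuming Spark 1.2.1 with spark-cassandra-connector 1.2.1 on Scala 2.10.
// All versions and coordinates below are assumptions; adjust them to the connector's version table.
scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  "org.apache.spark"   %% "spark-core"                     % "1.2.1" % "provided",
  "org.apache.spark"   %% "spark-sql"                      % "1.2.1" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector"      % "1.2.1",
  "com.datastax.spark" %% "spark-cassandra-connector-java" % "1.2.1"
)
```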

0 votes

The connector works with Cassandra 2.0 and 2.1, but some features may also work fine with 2.2 and 3.0 (not officially supported yet) using the older Java driver 2.1. This is because the C* Java driver supports a wide range of Cassandra versions: the newer driver works with older C* versions, and older driver versions also work with newer C* versions (excluding new C* features).

However, there is one minor caveat with using C* 2.0: since version 1.3.0, we dropped the Thrift client from the connector. This move was made to simplify the connectivity code and make it easier to debug - debugging one type of connection should be easier than two. It either connects or it doesn't; no more surprises of the kind "it writes fine, but can't connect for reading". Unfortunately, not all of the Thrift functionality was exposed by the native protocol in C* 2.0, nor by the system tables. Therefore, if you use C* prior to version 2.1.5, automatic split sizing won't work properly and you have to tell the connector the preferred number of splits. This is set in the ReadConf object passed at the creation of the RDD.
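For illustration, a minimal sketch of passing a ReadConf with an explicit split count when creating the RDD; the keyspace, table, host, and the split count of 32 are made-up example values, and this assumes a connector version (1.3+/1.4) where ReadConf exposes splitCount:

```scala
import com.datastax.spark.connector._            // adds sc.cassandraTable(...)
import com.datastax.spark.connector.rdd.ReadConf
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("cassandra-demo")
  .set("spark.cassandra.connection.host", "127.0.0.1")   // assumed C* host

val sc = new SparkContext(conf)

// With C* older than 2.1.5 the connector cannot size splits automatically,
// so tell it the preferred number of Spark partitions for the table scan.
val rdd = sc.cassandraTable("my_keyspace", "my_table")   // hypothetical keyspace/table
  .withReadConf(ReadConf(splitCount = Some(32)))          // 32 is an arbitrary example value
```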

As for the interface between the Connector and Spark, there is much less freedom. Spark APIs change quite often and you typically need a connector dedicated to the Spark version you use. See the version table in the README.

"(fails because the downloaded sbt-launch jar just contains a 404 not found html)"

This looks like an SBT problem, not a connector problem. I just tried sbt clean assembly on v1.2.5, v1.3.0, and b1.4, and it worked fine for all of them.