I would like to use Spark on Cassandra. I currently have Spark 2.0 and Cassandra 3.7 installed. Which version of the spark-cassandra-connector should I use, and which other Maven dependencies do I have to include? Or do I have to fall back to an older version of Spark and/or Cassandra?
I'm trying to run the following example:
// Uses static imports of javaFunctions and mapToRow from
// com.datastax.spark.connector.japi.CassandraJavaUtil; sc is a JavaSparkContext.

// Generate products hierarchy
final List<Product> products = Arrays.asList(
    new Product(0, "All Products", Collections.<Integer>emptyList()),
    new Product(1, "Product A", Arrays.asList(0)),
    new Product(4, "Product A1", Arrays.asList(0, 1)),
    new Product(5, "Product A2", Arrays.asList(0, 1)),
    new Product(2, "Product B", Arrays.asList(0)),
    new Product(6, "Product B1", Arrays.asList(0, 2)),
    new Product(7, "Product B2", Arrays.asList(0, 2)),
    new Product(3, "Product C", Arrays.asList(0)),
    new Product(8, "Product C1", Arrays.asList(0, 3)),
    new Product(9, "Product C2", Arrays.asList(0, 3))
);

// Store product hierarchy in Cassandra
JavaRDD<Product> productsRdd = sc.parallelize(products);
javaFunctions(productsRdd)
    .writerBuilder("sales_planning", "products", mapToRow(Product.class))
    .saveToCassandra();
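For completeness, the snippet relies on a Product class that is not shown. A minimal sketch of what it might look like follows, assuming mapToRow expects a serializable JavaBean with a no-argument constructor and getters/setters. The field names id, name and parents are assumptions inferred from the constructor calls above, not taken from the actual class.

```java
import java.io.Serializable;
import java.util.List;

// Hypothetical bean for the example; field names are assumptions
// inferred from the constructor calls (id, name, parent ids).
public class Product implements Serializable {
    private Integer id;
    private String name;
    private List<Integer> parents;

    // No-argument constructor, as bean-style mappers usually require one.
    public Product() {
    }

    public Product(Integer id, String name, List<Integer> parents) {
        this.id = id;
        this.name = name;
        this.parents = parents;
    }

    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public List<Integer> getParents() { return parents; }
    public void setParents(List<Integer> parents) { this.parents = parents; }
}
```

With a bean of this shape, the property names would need to match the column names of the sales_planning.products table, which is also an assumption here.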
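and my POM looks like:
...
<dependencies>
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-mapping</artifactId>
<version>3.1.0</version>
</dependency>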
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>1.6.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>1.6.2</version>
</dependency>
<dependency> <!-- Spark Cassandra Connector -->
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.11</artifactId>
<version>2.0.0-M2</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector-java_2.11</artifactId>
<version>1.6.0-M1</version>
</dependency>
</dependencies>
...
Running the example code gives me the following exception:
org.apache.spark.executor.TaskMetrics.outputMetrics()Lorg/apache/spark/executor/OutputMetrics;
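The message is a JVM method descriptor: a no-argument method outputMetrics on TaskMetrics that returns OutputMetrics. One way to see what the Spark jar on the runtime classpath actually declares is a small reflection check. This is only a sketch: the class name ClasspathCheck and the helper returnTypes are made up for illustration, and the check simply reports an empty list if the class cannot be loaded.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of Spark or the connector.
public class ClasspathCheck {

    // Returns the return-type names of all public no-argument methods
    // called methodName on className, or an empty list if the class
    // cannot be loaded from the current classpath.
    static List<String> returnTypes(String className, String methodName) {
        List<String> result = new ArrayList<>();
        try {
            Class<?> cls = Class.forName(className);
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName) && m.getParameterCount() == 0) {
                    result.add(m.getReturnType().getName());
                }
            }
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is missing or could not be linked; report nothing.
        }
        return result;
    }

    public static void main(String[] args) {
        // Class and method names are taken from the exception message above.
        System.out.println(
            returnTypes("org.apache.spark.executor.TaskMetrics", "outputMetrics"));
    }
}
```

If the printed return type is something other than org.apache.spark.executor.OutputMetrics, the connector was presumably compiled against a different Spark version than the one being run.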
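After updating my POM to:
<dependencies>
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-mapping</artifactId>
<version>3.1.0</version>
</dependency>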
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.0.0</version>
</dependency>
<dependency> <!-- Spark Cassandra Connector -->
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.11</artifactId>
<version>2.0.0-M2</version>
</dependency>
</dependencies>
Eclipse now marks my POM file with errors, including (among others):
Failed to read artifact descriptor for org.mortbay.jetty:jetty-util:jar:6.1.26
Missing artifact com.datastax.spark:spark-cassandra-connector_2.11:jar:2.0.0-M2 (resource: pom.xml)