0 votes

I would like to use Spark on Cassandra. I currently have installed Spark 2.0 and Cassandra 3.7. Which version of the spark-cassandra-connector should I use and what other maven dependencies do I have to include? Or do I have to fallback to an older version of Spark and/or Cassandra?

I'm trying to run the following example:

  // Generate products hierarchy
  final List<Product> products = Arrays.asList(
    new Product(0, "All Products", Collections.<Integer>emptyList()),
    new Product(1, "Product A", Arrays.asList(0)),
    new Product(4, "Product A1", Arrays.asList(0,1)),
    new Product(5, "Product A2", Arrays.asList(0,1)),
    new Product(2, "Product B", Arrays.asList(0)),
    new Product(6, "Product B1", Arrays.asList(0,2)),
    new Product(7, "Product B2", Arrays.asList(0,2)),
    new Product(3, "Product C", Arrays.asList(0)),
    new Product(8, "Product C1", Arrays.asList(0,3)),
    new Product(9, "Product C2", Arrays.asList(0,3))
  );

  // Store product hierarchy in Cassandra
  JavaRDD<Product> productsRdd = sc.parallelize(products);
  javaFunctions(productsRdd).writerBuilder("sales_planning", "products", mapToRow(Product.class)).saveToCassandra();

and my POM looks like:

...

<dependency>
   <groupId>com.datastax.cassandra</groupId>
   <artifactId>cassandra-driver-mapping</artifactId>
   <version>3.1.0</version>
</dependency>

<dependency>
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-core_2.11</artifactId>
   <version>1.6.2</version>
</dependency>

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>1.6.2</version>
</dependency>

<dependency> <!-- Spark Cassandra Connector -->
   <groupId>com.datastax.spark</groupId>
   <artifactId>spark-cassandra-connector_2.11</artifactId>
   <version>2.0.0-M2</version>
</dependency>  

<dependency>
   <groupId>com.datastax.spark</groupId>
   <artifactId>spark-cassandra-connector-java_2.11</artifactId>
   <version>1.6.0-M1</version>
</dependency>

</dependencies>
...

Running the example code gives me the following exception:

  org.apache.spark.executor.TaskMetrics.outputMetrics()Lorg/apache/spark/executor/OutputMetrics;

After updating my POM to:

...

<dependency>
   <groupId>com.datastax.cassandra</groupId>
   <artifactId>cassandra-driver-mapping</artifactId>
   <version>3.1.0</version>
</dependency>

   <dependency>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-core_2.11</artifactId>
     <version>2.0.0</version>
   </dependency>

   <dependency>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-sql_2.11</artifactId>
     <version>2.0.0</version>
   </dependency>

   <dependency> <!-- Spark Cassandra Connector -->
     <groupId>com.datastax.spark</groupId>
     <artifactId>spark-cassandra-connector_2.11</artifactId>
     <version>2.0.0-M2</version>
   </dependency>  
 </dependencies>

Eclipse now marks my POM file and gives me (among others):

  Failed to read artifact descriptor for org.mortbay.jetty:jetty-util:jar:6.1.26

  Missing artifact com.datastax.spark:spark-cassandra-connector_2.11:jar:2.0.0-M2  pom.xml

1
Are you using Scala? There are two versions for Scala. – Sreekar

Sorry, no, I'm using Java. – Chris

1 Answer

1 vote

At this moment, use the 2.0.0-M2 release from the Spark Packages repository. There is no need to add any other dependencies, as they are resolved and retrieved automatically. The default Scala version for Spark 2.0.0 is 2.11, so be sure to choose a 2.11 package.

In general you will want the latest version which matches the Spark Version you are using.

-- In response to the question edit

Change the Spark versions to 2.0.0 if that is what you are running against. Remove the reference to the -java module, because those classes are now part of the main connector artifact.
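Putting that advice together, the dependency section might look like the following. This is a sketch assembled from the versions mentioned in the question (Spark 2.0.0, connector 2.0.0-M2, driver 3.1.0), not a verified build:

```xml
<dependencies>
  <dependency>
    <groupId>com.datastax.cassandra</groupId>
    <artifactId>cassandra-driver-mapping</artifactId>
    <version>3.1.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.0.0</version>
  </dependency>
  <!-- Single connector artifact; no separate -java module needed -->
  <dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.11</artifactId>
    <version>2.0.0-M2</version>
  </dependency>
</dependencies>
```

Note that all artifacts share the `_2.11` Scala suffix, matching Spark 2.0.0's default Scala version.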