
I am trying to read data from a Cassandra keyspace in PySpark.

Here is my code:

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext

conf = SparkConf()
conf.setMaster("local[4]")
conf.setAppName("Spark Cassandra")
conf.set("spark.cassandra.connection.host", "127.0.0.1")

# needed when run as a standalone script; the pyspark shell already provides sc
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

sqlContext.read\
    .format("org.apache.spark.sql.cassandra")\
    .options(table="kv", keyspace="tutorialspoint")\
    .load().show()

I am running it on a CentOS 6.7 VM with Spark 1.5, Hadoop 2.6.0, and Cassandra 2.1.13.

I launch the pyspark console with the command:

pyspark --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-M2

I tried launching the pyspark console with different versions of the cassandra-connector package, but that did not help.

Here is the error message I get in the console when performing the read:

Py4JJavaError: An error occurred while calling o29.load.
: java.lang.NoSuchMethodError: com.google.common.reflect.TypeToken.isPrimitive()Z
    at com.datastax.driver.core.TypeCodec.<init>(TypeCodec.java:142)
    at com.datastax.driver.core.TypeCodec.<init>(TypeCodec.java:136)
    at com.datastax.driver.core.TypeCodec$BlobCodec.<init>(TypeCodec.java:609)
    at com.datastax.driver.core.TypeCodec$BlobCodec.<clinit>(TypeCodec.java:606)
    at com.datastax.driver.core.CodecRegistry.<init>(CodecRegistry.java:147)
    at com.datastax.driver.core.Configuration$Builder.build(Configuration.java:259)
    at com.datastax.driver.core.Cluster$Builder.getConfiguration(Cluster.java:1135)
    at com.datastax.driver.core.Cluster.<init>(Cluster.java:111)
    at com.datastax.driver.core.Cluster.buildFrom(Cluster.java:178)
    at com.datastax.driver.core.Cluster$Builder.build(Cluster.java:1152)
    at com.datastax.spark.connector.cql.DefaultConnectionFactory$.createCluster(CassandraConnectionFactory.scala:85)
    at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:155)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
    at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
    at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
    at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
    at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
    at com.datastax.spark.connector.cql.CassandraConnector.withClusterDo(CassandraConnector.scala:120)
    at com.datastax.spark.connector.cql.Schema$.fromCassandra(Schema.scala:241)
    at org.apache.spark.sql.cassandra.CassandraSourceRelation.<init>(CassandraSourceRelation.scala:47)
    at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:184)
    at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:57)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
    ...


1 Answer


It's caused by a Guava version conflict: the Cassandra Java driver used by the Spark Cassandra Connector needs a newer Guava than the one Hadoop (and Spark itself) put on the classpath, so the driver hits the old Guava at runtime and fails with NoSuchMethodError on TypeToken.isPrimitive(). See https://datastax-oss.atlassian.net/browse/SPARKC-365 and a pending PR to fix it: https://github.com/datastax/spark-cassandra-connector/pull/968
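
Until that fix is released, a common workaround (a sketch, not an official fix; the guava-16.0.1.jar version and its path below are assumptions you will need to adapt) is to force a newer Guava onto both the driver and executor classpaths when launching pyspark:

# /path/to/guava-16.0.1.jar is a placeholder: download the jar and adjust the path
pyspark --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-M2 \
        --conf spark.driver.extraClassPath=/path/to/guava-16.0.1.jar \
        --conf spark.executor.extraClassPath=/path/to/guava-16.0.1.jar

The extraClassPath entries are prepended to the JVM classpath, so the newer Guava wins over the copies bundled with Hadoop and Spark. Once the linked PR is merged and released, upgrading the connector should make this workaround unnecessary.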