2 votes

I am using spark-cassandra-connector_2.11 (version 2.0.5) to load data from Cassandra into a Spark cluster. I am using the read API to load the data as follows:

SparkUtil.initSpark()
         .read
         .format("org.apache.spark.sql.cassandra")
         .options(Map("table" -> "<table_name>", "keyspace" -> "<keyspace>"))
         .load()

It's working fine; however, in one use case I want to read only a specific column from Cassandra. How can I use the read API to do that?


3 Answers

4 votes
SparkUtil.initSpark()
         .read
         .format("org.apache.spark.sql.cassandra")
         .options(Map("table" -> "<table_name>", "keyspace" -> "<keyspace>"))
         .load()
         .select("column_name")

Use `select`. You can also use case classes to map the selected columns to a typed Dataset.
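As a runnable sketch of the case-class approach: `Row1` is a hypothetical case class whose field name matches the selected column, and a small in-memory DataFrame stands in for the Cassandra source so the projection itself can be demonstrated (`column_name` is a placeholder, not a real column):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class; the field name must match the selected column.
case class Row1(column_name: String)

object SelectColumnExample {
  // Returns the values of the projected column. An in-memory DataFrame
  // replaces the Cassandra source so the sketch is self-contained.
  def selectedValues(): Seq[String] = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("select-column-example")
      .getOrCreate()
    try {
      import spark.implicits._
      val df = Seq(("a", 1), ("b", 2)).toDF("column_name", "other")
      // select() prunes to the one column; as[T] yields a typed Dataset.
      val ds = df.select("column_name").as[Row1]
      ds.collect().map(_.column_name).toSeq.sorted
    } finally spark.stop()
  }
}
```

With the connector, the same `.select(...).as[Row1]` chain would follow the `.load()` call shown above.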

2 votes

Another way is to use the RDD API via `cassandraTable`, which doesn't go through the options API at all:

import com.datastax.spark.connector._

SparkUtil.initSpark()
         .sparkContext
         .cassandraTable(<keyspace>, <table_name>)
         .select(<column_name>)
-1 votes

One-line solution for fetching a few columns from a Cassandra table:

import com.datastax.spark.connector._
import org.apache.spark.storage.StorageLevel

val rdd = sc.cassandraTable("keyspace", "table_name")
  .select("service_date", "mobile")
  .persist(StorageLevel.MEMORY_AND_DISK)
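The persist step itself can be sketched without a Cassandra cluster: below, `sc.parallelize` over made-up (date, mobile) pairs stands in for `sc.cassandraTable(...)`, so only the caching behavior is illustrated:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistExample {
  // Counts rows of a persisted RDD. The parallelized pairs are a
  // stand-in for rows read via sc.cassandraTable(...).select(...).
  def persistedCount(): Long = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("persist-example"))
    try {
      val rdd = sc
        .parallelize(Seq(("2020-01-01", "555-0100"), ("2020-01-02", "555-0101")))
        .persist(StorageLevel.MEMORY_AND_DISK) // cache across repeated actions
      rdd.count()
    } finally sc.stop()
  }
}
```

MEMORY_AND_DISK keeps partitions in memory and spills to disk when they don't fit, which avoids re-reading from Cassandra on each action.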