I am writing code to cache RDBMS data using a Spark SQLContext JDBC connection. Once a DataFrame is created, I want to cache that result set in Apache Ignite so that other applications can make use of it. Here is the code snippet:
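import org.apache.hadoop.conf.Configuration // assumed: Hadoop's Configuration (unused below)
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.{SparkConf, SparkContext}

object test
{
  def main(args: Array[String]): Unit =
  {
    val configuration = new Configuration()
    // Path to the Ignite Spring XML configuration
    val config = "src/main/scala/config.xml"
    val sparkConf = new SparkConf().setAppName("test").setMaster("local[*]")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    // Load the MySQL table (mysql_table_statement is defined elsewhere) as a DataFrame over JDBC
    val sql_dump1 = sqlContext.read.format("jdbc")
      .option("url", "jdbc URL")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", mysql_table_statement)
      .option("user", "username")
      .option("password", "pass")
      .load()
    val ic = new IgniteContext[Integer, Integer](sc, config)
    val sharedrdd = ic.fromCache("hbase_metadata")
    // How to cache the sql_dump1 DataFrame?
  }
}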
Now the question is how to cache a DataFrame. IgniteRDD has a savePairs method, but it takes the keys and values as an RDD of Integer pairs, whereas I have a DataFrame, and even if I convert it to an RDD I would only get an RDD[Row]. A savePairs method that works on an RDD of Integers seems too specific: what if my values are an RDD of strings? Is it a good idea to cache the DataFrame, or is there a better approach to caching the result set?
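To make the question concrete, here is a minimal sketch of one direction I am considering. It is not a confirmed solution. It assumes the generic IgniteContext[K, V] API used in the snippet above, where the type parameters decide what savePairs accepts, so the cache could be declared with Long keys and String values instead of Integer. The object name DataFrameToIgnite and the helpers rowToValue, toPairs and cache are hypothetical names introduced only for illustration, and the "|" delimiter is an arbitrary choice.

```scala
import org.apache.ignite.spark.{IgniteContext, IgniteRDD}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helper object, not part of Spark or Ignite.
object DataFrameToIgnite {

  // Flatten one Row into a single delimited String value.
  def rowToValue(row: Row): String = row.mkString("|")

  // Pair every row with a generated Long key (its index in the RDD).
  def toPairs(df: DataFrame): RDD[(Long, String)] =
    df.rdd.zipWithIndex().map { case (row, idx) => (idx, rowToValue(row)) }

  // Write the DataFrame's rows into the named Ignite cache.
  def cache(sc: SparkContext, df: DataFrame,
            igniteConfigPath: String, cacheName: String): Unit = {
    // Assumption: the type parameters declare the cache's key and value
    // types, so savePairs then expects an RDD[(Long, String)].
    val ic = new IgniteContext[Long, String](sc, igniteConfigPath)
    val shared: IgniteRDD[Long, String] = ic.fromCache(cacheName)
    shared.savePairs(toPairs(df))
  }
}
```

From the main method above this would be called as DataFrameToIgnite.cache(sc, sql_dump1, config, "hbase_metadata"). The zipWithIndex keys are only positional, so if the table has a primary key column, using that column as the cache key would probably be a better choice for other applications that need to look rows up.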