I want to insert a huge volume of data from Spark into Cassandra. The data has a timestamp column that determines the TTL, but this differs for each row. My question is: how can I handle the TTL while inserting data in bulk from Spark?
My current implementation:
raw_data_final.write
  .format("org.apache.spark.sql.cassandra")
  .mode(SaveMode.Overwrite)
  .options(Map(
    "table" -> offerTable,
    "keyspace" -> keySpace,
    // applies one fixed TTL (in seconds) to every row, not a per-row value
    "spark.cassandra.output.ttl" -> ttl_seconds))
  .save()
Here, raw_data_final has around a million records, and each record yields a different TTL. So, is there a way to do a bulk insert and somehow specify the TTL from a column within raw_data_final?
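For reference, the RDD-based saveToCassandra API appears to support per-row TTLs through WriteConf and TTLOption.perRow, where the TTL is read from a field of the RDD elements. Below is a rough sketch of what I mean (OfferRow and its fields are placeholders for my actual schema; the ttl field holds each row's TTL in seconds):

import com.datastax.spark.connector._
import com.datastax.spark.connector.writer.{TTLOption, WriteConf}

// Placeholder case class mirroring the table's columns, plus a ttl field
// that the connector consumes rather than writing to the table.
case class OfferRow(id: String, value: String, ttl: Int)

// Illustrative mapping from my DataFrame's rows to the case class.
val rowsWithTtl = raw_data_final.rdd.map(r =>
  OfferRow(r.getString(0), r.getString(1), r.getInt(2)))

// TTLOption.perRow("ttl") tells the connector to take each row's TTL
// from the ttl field instead of applying one global value.
rowsWithTtl.saveToCassandra(keySpace, offerTable,
  writeConf = WriteConf(ttl = TTLOption.perRow("ttl")))

If that is the right direction, I'd still prefer to stay with the DataFrame writer if there is an equivalent option there.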
Thanks.