Apparently there is no built-in Cassandra sink for Spark Structured Streaming. I found this example online, which implements a custom Cassandra sink based on ForeachWriter:
https://dzone.com/articles/cassandra-sink-for-spark-structured-streaming
I understand that we need to create a ForeachWriter implementation that takes care of opening a connection to the sink (Cassandra), writing the data and closing the connection. So the CassandraSinkForeach and the CassandraDriver classes make sense.
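For reference, my understanding of the ForeachWriter part is roughly the following sketch. The Event record type and the keyspace/table names are hypothetical, and the connector is assumed to be built on the driver and passed in (CassandraConnector is serializable):

```scala
import org.apache.spark.sql.ForeachWriter
import com.datastax.spark.connector.cql.CassandraConnector

// Hypothetical record type and table names, just for illustration
case class Event(id: String, value: Double)

class CassandraSinkForeach(connector: CassandraConnector) extends ForeachWriter[Event] {

  // Called once per partition/epoch; return true to go ahead and process the partition
  override def open(partitionId: Long, epochId: Long): Boolean = true

  // withSessionDo borrows a session from the connector's internal pool
  override def process(event: Event): Unit = {
    connector.withSessionDo { session =>
      session.execute(
        s"INSERT INTO my_keyspace.events (id, value) VALUES ('${event.id}', ${event.value})")
    }
  }

  // Nothing to tear down explicitly; the connector manages its own connections
  override def close(errorOrNull: Throwable): Unit = ()
}
```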
However, I don't understand the need to make SparkSessionBuilder serializable, or the need to initialize a SparkSession instance inside the CassandraDriver class (i.e. on the executors). It seems like the only reason for doing this is to initialize the CassandraConnector from the session's SparkConf.
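As far as I can tell, the pattern in the article boils down to something like this (simplified, and the details may differ from the actual code):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import com.datastax.spark.connector.cql.CassandraConnector

// Serializable wrapper so the SparkConf can be shipped to executors and a
// SparkSession rebuilt there -- this is the part I don't understand
class SparkSessionBuilder(conf: SparkConf) extends Serializable {
  def build(): SparkSession = SparkSession.builder.config(conf).getOrCreate()
}

// On the executor: rebuild a SparkSession only to pull its SparkConf back out
// and derive the CassandraConnector from it
class CassandraDriver(builder: SparkSessionBuilder) extends Serializable {
  lazy val spark = builder.build()
  lazy val connector = CassandraConnector(spark.sparkContext.getConf)
}
```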
According to the CassandraConnector docs, a CassandraConnector object can be initialized directly from a CassandraConnectorConf (or a SparkConf) passed in: http://datastax.github.io/spark-cassandra-connector/ApiDocs/2.4.0/spark-cassandra-connector/#com.datastax.spark.connector.cql.CassandraConnector
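In other words, it seems like the connector could be built once on the driver and captured by the ForeachWriter directly, without ever touching a SparkSession on the executors. A minimal sketch of what I have in mind (the host, app name, and rate source are placeholders; CassandraSinkForeach and Event are from the sketch above):

```scala
import org.apache.spark.sql.SparkSession
import com.datastax.spark.connector.cql.{CassandraConnector, CassandraConnectorConf}

val spark = SparkSession.builder
  .appName("cassandra-sink-test")                          // placeholder app name
  .config("spark.cassandra.connection.host", "127.0.0.1")  // placeholder host
  .getOrCreate()
import spark.implicits._

// Built once on the driver; CassandraConnector is serializable, so it can be
// captured by the ForeachWriter and shipped to the executors as-is
val connector = CassandraConnector(spark.sparkContext.getConf)
// equivalently: new CassandraConnector(CassandraConnectorConf(spark.sparkContext.getConf))

// Placeholder streaming source, just to make the example self-contained
val events = spark.readStream.format("rate").load()
  .select($"value".cast("string").as("id"), $"value".cast("double").as("value"))
  .as[Event]

val query = events.writeStream
  .foreach(new CassandraSinkForeach(connector))
  .start()
```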
Can someone explain whether there is actually a need to initialize a SparkSession on the workers? Is this a general pattern, and if so, why is it required?