I am running a real-time anomaly detection system on Spark Streaming. In each streaming interval, if a data point is an anomaly, AWS SNS should send an email to the subscribed accounts. However, the AWS SNS Java SDK does not seem to work inside Spark Streaming. Below is the error message:


ERROR StreamingContext: Error starting the context, marking it as stopped
java.io.NotSerializableException: DStream checkpointing has been enabled but the DStreams with their functions are not serializable
com.amazonaws.services.sns.AmazonSNSClient
Serialization stack:
    - object not serializable (class: com.amazonaws.services.sns.AmazonSNSClient, value: com.amazonaws.services.sns.AmazonSNSClient@a99e813)
    - field (class: wordCount$$anonfun$main$2, name: snsClient$1, type: class com.amazonaws.services.sns.AmazonSNSClient)
    - object (class wordCount$$anonfun$main$2, )
    - field (class: org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3, name: cleanedF$1, type: interface scala.Function1)
    - object (class org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3, )
    - writeObject data (class: org.apache.spark.streaming.dstream.DStream)
    - object (class org.apache.spark.streaming.dstream.ForEachDStream, org.apache.spark.streaming.dstream.ForEachDStream@5b56679b)
    - writeObject data (class: org.apache.spark.streaming.dstream.DStreamCheckpointData)
    - object (class org.apache.spark.streaming.dstream.DStreamCheckpointData, [ 0 checkpoint files


Does anyone have an idea how to solve this, or know another way to send email from Spark Streaming?

Thanks a lot.

1 Answer

The error says that the AmazonSNSClient instance is not serializable. This probably means you've instantiated it outside a transformation and are using it inside one, which forces Spark to serialise it along with the function that references it.

With non-streaming Spark, you could try instantiating your AmazonSNSClient inside a mapPartitions function on the RDD instead, or the equivalent for Spark Streaming. The Spark Streaming docs have a section that might be useful to you (the one on design patterns for using foreachRDD); it covers similar ground around efficiently creating connections to databases and other external systems.

The main point is that you need to instantiate your client on the worker, rather than instantiating it on the driver and then sending it to the worker (which requires the instance to be serialisable).
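
As a rough sketch of that pattern in Scala (not your actual job): the object name AnomalyAlerts, the method sendAlerts, the topicArn parameter and the DStream[String] record type are all placeholders I made up, and I'm assuming the same AWS SDK v1 AmazonSNSClient class that appears in your stack trace, with credentials and region available on the executors.

```scala
import com.amazonaws.services.sns.AmazonSNSClient
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper: names and the String record type are illustrative only.
object AnomalyAlerts {

  def sendAlerts(anomalies: DStream[String], topicArn: String): Unit = {
    anomalies.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Skip empty partitions so no client is created needlessly.
        if (records.nonEmpty) {
          // The client is created here, on the executor, so it is never
          // captured by a closure and never has to be serialized.
          // Assumes the default credential provider chain works on the workers.
          val snsClient = new AmazonSNSClient()
          try {
            records.foreach { record =>
              snsClient.publish(topicArn, s"Anomaly detected: $record")
            }
          } finally {
            // Release the client's HTTP connections once the partition is done.
            snsClient.shutdown()
          }
        }
      }
    }
  }
}
```

The only thing the closures capture is topicArn, which is a plain String, so checkpointing should have nothing non-serializable to trip over. One client per partition is a compromise: cheaper than one per record, and it avoids shipping the client from the driver.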