I am getting the following exception when processing input streams with Spark Structured Streaming:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 22 in stage 5.0 failed 1 times, most recent failure: Lost task 22.0 in stage 5.0 (TID 403, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
I have set a watermark as shown below:
.withWatermark("timestamp", "5 seconds")
.groupBy(window($"timestamp", "1 second"), $"column")
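To reason about whether aggregation state should stay bounded with this setup, here is a simplified, hypothetical model in plain Scala of how a 5-second watermark limits the state kept for 1-second tumbling windows grouped by a key. It is not Spark's implementation. It assumes the watermark is the maximum event time seen so far minus the delay, and that a (window, key) entry is evicted once the watermark reaches the window's end. Spark itself advances the watermark once per micro-batch rather than per event, and it only drops aggregation state in append or update output mode, not in complete mode. The names `WatermarkStateModel` and `stateSizes` are made up for illustration.

```scala
// Simplified, hypothetical model of how an event-time watermark bounds the
// state of a windowed aggregation. NOT Spark's implementation: here the
// watermark advances per event, while Spark advances it per micro-batch.
object WatermarkStateModel {
  val delayMs: Long = 5000L  // mirrors withWatermark("timestamp", "5 seconds")
  val windowMs: Long = 1000L // mirrors window($"timestamp", "1 second")

  // Start of the tumbling window containing a (non-negative) event time in ms.
  def windowStart(eventTimeMs: Long): Long = eventTimeMs - (eventTimeMs % windowMs)

  // Replays (eventTimeMs, key) events in arrival order and returns how many
  // (windowStart, key) state entries are retained after each event.
  def stateSizes(events: Seq[(Long, String)]): Seq[Int] = {
    var maxEventTime = Long.MinValue
    var state = Map.empty[(Long, String), Long] // (windowStart, key) -> count
    events.map { case (ts, key) =>
      maxEventTime = math.max(maxEventTime, ts)
      val watermark = maxEventTime - delayMs
      val w = windowStart(ts)
      // Update the running count for this event's window and key.
      state = state.updated((w, key), state.getOrElse((w, key), 0L) + 1L)
      // Evict every window whose end is at or before the watermark.
      state = state.filter { case ((start, _), _) => start + windowMs > watermark }
      state.size
    }
  }
}
```

In this model a steady stream for a single key never holds more than six windows at once, so retained state grows with the number of distinct values of the grouping column rather than with elapsed time.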
What could be the issue? I have tried changing the trigger from the default to a fixed interval, but I am still facing the problem.