I am trying to run a Structured Streaming application that writes its output as Parquet files to a Google Cloud Storage location. I don't see any errors, but no files are written to the GCS path; all I can see is the _spark_metadata folder. Any idea how I can debug this?
WindowDuration = "60 minutes";
SlideDuration = "10 minutes";
Data_2 = complete_data;
Data_2 = data_2.withColumn("creationDt", functions.to_timestamp( functions.from_unixtime(col(topics+"."+event_timestamp).divide(1000.0))));
Data_2 = data_2
.withWatermark("creationDt","1 minute")
.groupBy(col(topics+"."+keyField),functions.window(col("creationDt"), windowDuration, slideDuration),col(topics+"."+aggregateByField))
.count();
Query_2 = data_2
.withColumn("startwindow", col("window.start"))
.withColumn("endwindow", col("window.end"))
.withColumn("endwindow_date", col("window.end").cast(DataTypes.DateType))
.writeStream()
.format("parquet")
.partitionBy("endwindow_date")
.option("path",dataFile_2)
.option("truncate", "false")
.outputMode("append")
.option("checkpointLocation", checkpointFile_2).start();
Query_2.awaitTermination()
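To see what each micro-batch is actually doing, one option is to register a StreamingQueryListener before the awaitTermination() call and log the progress of every batch. A minimal sketch follows; spark here is assumed to be the SparkSession the query was started from:

import org.apache.spark.sql.streaming.StreamingQueryListener;

// Prints each micro-batch's progress (numInputRows, batch durations, sink
// description), which shows whether rows are arriving and being committed.
spark.streams().addListener(new StreamingQueryListener() {
    @Override
    public void onQueryStarted(QueryStartedEvent event) { }

    @Override
    public void onQueryProgress(QueryProgressEvent event) {
        // numInputRows == 0 means the source produced no new rows for this batch.
        System.out.println(event.progress().prettyJson());
    }

    @Override
    public void onQueryTerminated(QueryTerminatedEvent event) { }
});

Alternatively, query_2.lastProgress() can be polled from another thread for the same per-batch information.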
_spark_metadata folder? What's the source(s)? Any aggregations? More, more, more... - Jacek Laskowski