3
votes

I am using Spark Structured Streaming. I have a DataFrame and am adding a new column, "current_ts".

inpuDF.withColumn("current_ts", lit(System.currentTimeMillis()))

This does not stamp each row with the current epoch time. Instead, every row in the DataFrame gets the same epoch time, captured once when the job was triggered. The same code works fine in normal (batch) Spark jobs. Is this an issue with Spark Structured Streaming?

4
Hi @Nats, were you able to achieve this? I have a similar requirement. - Vasu

4 Answers

1
votes

Spark records your transformations as a lineage graph and only executes that graph when some action is called. Note, however, that

System.currentTimeMillis()

inside lit(...) is an ordinary Scala expression: it is evaluated eagerly on the driver when the plan is built, and that constant is what gets baked into the plan, regardless of when actions run. What I don't understand is what you find confusing here, or what exactly you are trying to achieve. Thanks.
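A minimal pure-Scala sketch of why this happens (no Spark dependency; this lit and the counter clock are hypothetical stand-ins for Spark's lit() and System.currentTimeMillis()): an argument passed by value, as to lit(...), is evaluated exactly once, and the resulting constant is reused for every row.

```scala
object EagerEvalDemo {
  // Hypothetical stand-in for Spark's lit(): the argument is call-by-value,
  // so it is evaluated once, up front, and the constant is reused per row.
  def lit[A](constant: A): Any => A = _ => constant

  // Deterministic stand-in for System.currentTimeMillis(): each call
  // returns a fresh, larger value.
  private var ticks = 0L
  def now(): Long = { ticks += 1; ticks }

  def run(): Seq[Long] = {
    val col = lit(now())   // now() is evaluated exactly once, right here
    (1 to 3).map(col)      // applied per "row": same constant everywhere
  }
}
```

Running EagerEvalDemo.run() yields Seq(1, 1, 1): the clock is read once when the column is defined, which matches the single repeated timestamp seen in the streaming job.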

1
votes

Spark has a built-in function for creating a column with the current timestamp. Your code should look like this:

import org.apache.spark.sql.functions

// ...

// current_timestamp() is evaluated at execution time rather than
// captured as a constant when the plan is built
inpuDF.withColumn("current_ts", functions.current_timestamp())

0
votes

The problem with your approach is the use of lit, which creates a literal, i.e. a constant. Spark treats it as a constant passed in from the driver: the expression is evaluated once, at the time you submit the job, so every record gets the same timestamp. You need to use a function instead; current_timestamp() should work.
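To sketch the contrast in plain Scala (again no Spark; the thunk-based Column type, runBatch, and the counter clock are hypothetical models, not Spark API): a column defined as a deferred function, in the spirit of current_timestamp(), is re-evaluated when each micro-batch actually runs, so rows share a value per batch rather than one value for the whole query.

```scala
object PerBatchEvalDemo {
  // Hypothetical model: a column is a thunk, evaluated at execution time,
  // unlike a lit(...) constant captured when the plan is built.
  type Column[A] = () => A

  // Deterministic stand-in clock so the demo does not depend on wall time.
  private var ticks = 0L
  def clock(): Long = { ticks += 1; ticks }

  // Model of current_timestamp(): reads the clock when invoked.
  val currentTs: Column[Long] = () => clock()

  // One micro-batch: the column is evaluated once per trigger, so every
  // row within a batch shares that batch's timestamp.
  def runBatch(rows: Seq[String], col: Column[Long]): Seq[(String, Long)] = {
    val ts = col()
    rows.map(r => (r, ts))
  }
}
```

Two successive batches get timestamps 1 and 2 respectively, whereas a lit-style constant would stamp both batches with the same value.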

0
votes

Try partitioning the streaming output by the timestamp column. Note that partitionBy takes the column name as a String in Scala, and the current_ts column must already exist on the DataFrame:

inpuDF.writeStream.partitionBy("current_ts")