
First of all, thank you for taking the time to read my question :)

My question is the following: in Spark with Scala, I have a DataFrame that contains a string column with a date in the format dd/MM/yyyy HH:mm, for example df:

+----------------+
|date            |
+----------------+
|8/11/2017 15:00 |
|9/11/2017 10:00 |
+----------------+

I want to get the difference between the current date and the DataFrame's date column in seconds, for example:

df.withColumn("difference", currentDate - unix_timestamp(col("date")))

+----------------+------------+
|date            | difference |
+----------------+------------+
|8/11/2017 15:00 | xxxxxxxxxx |
|9/11/2017 10:00 | xxxxxxxxxx |
+----------------+------------+

I tried

val current = current_timestamp()
df.withColumn("difference", current - unix_timestamp(col("date")))

but got this error:

org.apache.spark.sql.AnalysisException: cannot resolve '(current_timestamp() - unix_timestamp(date, 'yyyy-MM-dd HH:mm:ss'))' due to data type mismatch: differing types in '(current_timestamp() - unix_timestamp(date, 'yyyy-MM-dd HH:mm:ss'))' (timestamp and bigint).;;

I also tried

val current = BigInt(System.currentTimeMillis / 1000)
df.withColumn("difference", current - unix_timestamp(col("date")))

and

val current = unix_timestamp(current_timestamp())

but then the "difference" column is null.

Thanks


1 Answer


You have to use the correct format for unix_timestamp: the default yyyy-MM-dd HH:mm:ss pattern (visible in your error message) cannot parse dd/MM/yyyy HH:mm strings, which is also why your last attempt returned null:

df.withColumn("difference", current_timestamp().cast("long") - unix_timestamp(col("date"), "dd/MM/yyyy HH:mm"))
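
For context, here is a minimal self-contained sketch of that approach. The sample rows and the "date" column name come from the question; the SparkSession setup and the names spark/result are boilerplate of my own:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, current_timestamp, unix_timestamp}

val spark = SparkSession.builder().appName("date-diff").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq("8/11/2017 15:00", "9/11/2017 10:00").toDF("date")

// unix_timestamp(col, pattern) parses the string into epoch seconds
// (note MM = month, mm = minutes), and casting current_timestamp() to
// long also yields epoch seconds, so the subtraction is a count of seconds.
val result = df.withColumn(
  "difference",
  current_timestamp().cast("long") - unix_timestamp(col("date"), "dd/MM/yyyy HH:mm")
)
result.show(false)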

or, with a recent version:

to_timestamp(col("date"), "dd/MM/yyyy HH:mm") - current_timestamp()

to get an interval column.
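
A quick sketch of that interval variant, reusing the df from the sketch above and assuming a Spark version whose analyzer supports subtracting two timestamp columns (as this answer states for recent versions):

import org.apache.spark.sql.functions.{col, current_timestamp, to_timestamp}

// Subtracting two timestamps yields an interval column rather than a
// plain number of seconds; printSchema shows the interval type.
val withInterval = df.withColumn(
  "difference",
  to_timestamp(col("date"), "dd/MM/yyyy HH:mm") - current_timestamp()
)
withInterval.printSchema()
withInterval.show(false)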