I am new to PySpark and Spark SQL. I have a DataFrame with one column that holds date-time values as strings, which I need to convert/cast to timestamps.

Dataframe Format:

+--------------------+------------------------------+
|               value|                time_from_text|
+--------------------+------------------------------+
|dummy               |2020-04-19T23:49:52.020000453Z|
|dummy               |2020-04-22T23:52:52.020000453Z|
+--------------------+------------------------------+

Now, I looked at this post and tried the following code snippet:

result.withColumn("Timestamp",unix_timestamp("time_from_text", "yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS'Z'").cast(TimestampType()))

This worked in my earlier setup, where my Spark version was 3.1.1. However, I needed to switch back to 2.4.6, and there the same code gives me null as the output for the timestamp!

I tried many different ways but am not able to cast the timestamp.

Any pointers would be appreciated. Thanks for the help!

1 Answer

This is not a perfect answer, but I found a quick workaround to get the conversion done. The conversion does work on Spark 2.4.6 for the format "yyyy-MM-dd'T'HH:mm:ss". So I truncated the time_from_text column to its first 19 characters, which drops the sub-second precision [fine for the use case here], and then converted the result to a timestamp.

Code snippet:

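from pyspark.sql.functions import col, substring, unix_timestamp
from pyspark.sql.types import TimestampType
result = result.withColumn("time_from_text", substring(col("time_from_text"), 1, 19))  # positions are 1-based; keeps "yyyy-MM-ddTHH:mm:ss"
final_result = result.withColumn("Timestamp", unix_timestamp("time_from_text", "yyyy-MM-dd'T'HH:mm:ss").cast(TimestampType())).orderBy("Timestamp")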
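
To make the truncation step easier to follow outside Spark, here is a minimal plain-Python sketch of the same idea. It is only an illustration, not the Spark implementation: the helper names parse_truncated and parse_micros are made up for this example, and treating the trailing "Z" as UTC is an assumption (in the Spark pattern above, 'Z' is just a quoted literal). The second helper shows one possible way to keep microseconds instead of dropping the whole fraction.

```python
from datetime import datetime, timezone


def parse_truncated(ts: str) -> datetime:
    """Keep only the first 19 characters ("yyyy-MM-ddTHH:mm:ss") and parse them."""
    head = ts[:19]  # everything after whole seconds is discarded
    parsed = datetime.strptime(head, "%Y-%m-%dT%H:%M:%S")
    # Assumption: the trailing "Z" in the input means UTC.
    return parsed.replace(tzinfo=timezone.utc)


def parse_micros(ts: str) -> datetime:
    """Hypothetical variant: cut the fraction to six digits to keep microseconds."""
    head, _, frac = ts.rstrip("Z").partition(".")
    frac = (frac + "000000")[:6]  # pad or cut to exactly six digits
    parsed = datetime.strptime(f"{head}.{frac}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


sample = "2020-04-19T23:49:52.020000453Z"
print(parse_truncated(sample))
print(parse_micros(sample))
```

The first helper mirrors what substring(..., 1, 19) plus the shorter pattern does in the snippet above; the second is only worth the extra step if the sub-second part matters for your use case.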

Reason:

I did some research, and my best guess is that the datetime parsing behind unix_timestamp changed between versions. According to the migration guide, Spark 2.4 and earlier use java.text.SimpleDateFormat for these conversions, while Spark 3.0 and later define their own datetime patterns, implemented via DateTimeFormatter under the hood. Hence a value like "2020-04-19T23:49:52.020000453Z" with the nine-digit fraction pattern parses in the newer versions but not in the older ones. Also, per the "Upgrading from Spark SQL 3.0 to 3.1" section, Spark 3.0 and earlier return null for invalid datetime patterns, while Spark 3.1 and later fail directly.

Source: https://spark.apache.org/docs/latest/sql-migration-guide.html