1 vote

I am using PySpark 3.0. I have a DataFrame with a 'time' column of StringType that I am trying to convert to a timestamp. The DataFrame looks like this:

+---------------+
|           time|
+---------------+
|10:59:46.000 AM|
| 6:26:36.000 PM|
|11:13:38.000 PM|
+---------------+
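For reference, here is a minimal sketch that reproduces this DataFrame (the SparkSession setup and the imports are assumptions for a standalone run):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import TimestampType

spark = SparkSession.builder.getOrCreate()

# three sample rows matching the strings shown above
df = spark.createDataFrame(
    [("10:59:46.000 AM",), ("6:26:36.000 PM",), ("11:13:38.000 PM",)],
    ["time"],
)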

I tried both to_timestamp() and unix_timestamp():

df.withColumn("new_time", F.to_timestamp(col("time"),"hh:mm:ss.SSS a")).show()

and

df.withColumn('new_time', F.unix_timestamp(df['time'], 'hh:mm:ss.SSS a').cast(TimestampType())).show()

The error I'm getting is this:

org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '6:26:36.000 PM' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.

I want to know how it's done in Spark 3.0 without setting

spark.conf.set("spark.sql.legacy.timeParserPolicy","LEGACY")

Any help would be much appreciated. Thanks.


2 Answers

3 votes

Try this:

df.withColumn("new_time", F.to_timestamp(F.lpad(col("time"), 15, "0"),"hh:mm:ss.SSS a")).show()

Some explanation:

lpad(column, length, pad_string)

This function checks the length of the given string and, if it is shorter than the specified length, left-pads it with pad_string until it reaches that length.

Example

The input row 6:26:36.000 PM has only 14 characters; since the specified length is 15, lpad left-pads it with 0 (the third argument) to make it 15 characters long. The output of lpad is 06:26:36.000 PM, which matches the hh:mm:ss.SSS a format we pass to to_timestamp.
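To see just the padding step in isolation, you can run something like this (assuming the df from the question); the first and third rows are already 15 characters long, so only the second row actually gets padded:

df.select(F.lpad(F.col("time"), 15, "0").alias("padded")).show()

which should print

+---------------+
|         padded|
+---------------+
|10:59:46.000 AM|
|06:26:36.000 PM|
|11:13:38.000 PM|
+---------------+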


3 votes

No need for padding.

You just need to change the conversion format string: delete one "h" from it and it works.

df.withColumn('new_time', F.unix_timestamp(df['time'], 'h:mm:ss.SSS a'))

Explanation of the format:

'hh:mm:ss.SSS a' expects a two-digit hour:
01:00:00.000 PM
11:00:00.000 AM

'h:mm:ss.SSS a' also accepts a single-digit hour:
1:00:00.000 PM
11:00:00.000 AM
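Note that unix_timestamp returns the number of seconds since the epoch as a long, not a timestamp column. If you want a TimestampType column as in the question, you can add the cast back; a sketch, assuming the same df and imports as above:

df.withColumn('new_time', F.unix_timestamp(df['time'], 'h:mm:ss.SSS a').cast(TimestampType())).show()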