1
votes

I have a pyspark dataframe that has a field, time, that has timestamps in two formats,

"11-04-2019,00:32:13" and "2019-12-05T07:57:16.000Z" 

How can I convert all the timestamp to the second format, which is the iso time format that matches this format?

%Y-%m-%dT%H:%M:%S.%fZ

So ideally values like "11-04-2019,00:32:13" should be "2019-11-04T00:32:13.000Z"

1

1 Answers

4
votes

You can cast the timestamps to timestamp type first, then use date_format to convert them back to the desired format:

df.show(truncate=False)
+------------------------+
|time                    |
+------------------------+
|11-04-2019 00:32:13     |
|2019-12-05T07:57:16.000Z|
+------------------------+

import pyspark.sql.functions as F

df2 = df.withColumn(
    'time', 
    F.date_format(
        F.coalesce(
            F.to_timestamp('time'), 
            F.to_timestamp('time', 'MM-dd-yyyy HH:mm:ss')
        ), 
        "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
    )
)

df2.show(truncate=False)
+------------------------+
|time                    |
+------------------------+
|2019-11-04T00:32:13.000Z|
|2019-12-05T07:57:16.000Z|
+------------------------+