
I am getting date values from a PySpark dataframe in "mm.dd.yy" format. I would like to convert them into "mm.dd.yyyy" format.

I tried writing a UDF, but the datetime parsing throws an error.

from pyspark.sql.types import StringType
from pyspark.sql.functions import udf, col
import datetime

def change_date(date_string):
    # Parse the two-digit-year string, then re-format it with a four-digit year.
    dateConv = datetime.datetime.strptime(date_string, '%d.%m.%y')
    dt_str = datetime.datetime.strftime(dateConv, '%d.%m.%Y')
    return dt_str

date_udf = udf(lambda date: change_date(date), StringType())
display(filterEmplyValues.withColumn("date", date_udf(col("date"))))

The error that I am receiving is:

SparkException: Job aborted due to stage failure: Task 23 in stage 302.0 failed 4 times, most recent failure: Lost task 23.3 in stage 302.0 (TID 18078, 10.139.64.15, executor 71): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/worker.py", line 480, in main
    process()

ValueError: time data '00.00.00' does not match format '%d.%m.%y'
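Judging by the traceback, the failure seems to come from placeholder rows whose value is 00.00.00 rather than a real date; the same ValueError reproduces in plain Python, independent of Spark:

import datetime

# '00' is not a valid day or month, so strptime cannot match the pattern
# and raises the same ValueError the Spark executor reports.
datetime.datetime.strptime('00.00.00', '%d.%m.%y')
# ValueError: time data '00.00.00' does not match format '%d.%m.%y'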

Thank you for your help.


1 Answer


You can do this without a UDF by using Spark's built-in functions to_date and date_format: to_date parses the string into a DateType column using the input pattern, and date_format renders it back to a string with a four-digit-year pattern.

df.show()

+--------+
|    date|
+--------+
|08.27.18|
+--------+

from pyspark.sql import functions as F
df.withColumn("date", F.date_format(F.to_date("date", "MM.dd.yy"), "MM.dd.yyyy")).show()

+----------+
|      date|
+----------+
|08.27.2018|
+----------+