
I'm trying to convert the following to 24-hour time using sparklyr:

2021-05-18 9:00:00 PM

My expected outcome: 2021-05-18 21:00:00

I have tried:

data %>% 
  mutate(datetime_24 = to_timestamp("datetime_12", "yyyy-MM-dd hh:mm:ss"))

data %>% 
  mutate(datetime_24 = to_date("datetime_12", "yyyy-MM-dd hh:mm:ss"))

Both result in NULLs.

I tried the following as a starting point and got this error:

data %>%
  mutate(datetime_24 = unix_timestamp(datetime_12, "yyyy-MM-dd hh:mm:ss"))

You may get a different result due to the upgrading of Spark 3.0: Fail to parse '2021-05-18 9:00:00 PM' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.

I also tried the following in PySpark but got a similar error:

from pyspark.sql.functions import from_unixtime, unix_timestamp, col

df_time = spark.table("data")

df_time_new = df_time.withColumn('datetime_24', \
             from_unixtime(unix_timestamp(col(('datetime_12')), "yyyy-mm-dd hh:mm:ss"), "yyyy-mm-dd HH:mm:ss"))

Error:

You may get a different result due to the upgrading of Spark 3.0: Fail to parse '2021-05-18 9:00:00 PM' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string. Caused by: DateTimeParseException: Text '2021-05-18 9:00:00 PM' could not be parsed at index 11


1 Answer


You can set spark.sql.legacy.timeParserPolicy to LEGACY as shown below:

spark.conf.set("spark.sql.legacy.timeParserPolicy","LEGACY")
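The line above is PySpark; from sparklyr you can issue the same setting through the connection's DBI interface. A minimal sketch, assuming 'sc' is your existing sparklyr connection:

library(DBI)

# Assumes `sc` is an existing sparklyr connection; a SET statement
# changes the runtime SQL config for the current Spark session
dbGetQuery(sc, "SET spark.sql.legacy.timeParserPolicy = LEGACY")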

After this, you should no longer get the error when parsing the datetime.

You are getting that error because of changes to the datetime parser in Spark 3.0 (described in the Spark 3.0 migration guide).

Per the Spark datetime patterns documentation, the pattern letter 'a' matches the AM/PM marker, so include it when parsing 12-hour times. Note also that the column name must be unquoted inside mutate(); quoting it makes Spark parse the literal string 'datetime_12', which is why your first two attempts returned NULLs:

data %>%
  mutate(datetime_24 = to_timestamp(datetime_12, "yyyy-MM-dd hh:mm:ss a"))
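Putting it together, here is a minimal end-to-end sketch in sparklyr. The local connection, the one-row stand-in table, and the 'datetime_ts' helper column are illustrative assumptions, not part of the original question; to_timestamp with 'a' does the 12-hour parse, and date_format renders the result back as a 24-hour string:

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")  # hypothetical local connection
DBI::dbGetQuery(sc, "SET spark.sql.legacy.timeParserPolicy = LEGACY")

# Toy stand-in for the question's `data` table
data <- sdf_copy_to(sc, data.frame(datetime_12 = "2021-05-18 9:00:00 PM"),
                    overwrite = TRUE)

data %>%
  mutate(
    # bare column name so Spark sees a column, not the literal "datetime_12";
    # 'a' consumes the AM/PM marker
    datetime_ts = to_timestamp(datetime_12, "yyyy-MM-dd hh:mm:ss a"),
    # render back as a 24-hour string: "2021-05-18 21:00:00"
    datetime_24 = date_format(datetime_ts, "yyyy-MM-dd HH:mm:ss")
  )

If you prefer not to enable the legacy parser, the Spark 3 parser accepts a single 'h' for one- or two-digit hours, so the pattern "yyyy-MM-dd h:mm:ss a" should parse '9:00:00 PM' without touching spark.sql.legacy.timeParserPolicy.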