Thanks for responses, excellent advice, for the moment I'll be going with the LEGACY setting. I have a workaround with Spark 3.0 by substringing out the EEE element but I've noticed a bug with how BST timezone converts incorrectly offseting by 10 hours while under LEGACY it correctly remains the same as I'm currently in BST zone. I can do something with this but will wait till the clocks change in the autumn to confirm.
spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
df = spark.createDataFrame([('Fri May 24 00:00:00 BST 2019',)], ['mydate'])
df = df.select('mydate',
to_timestamp(df.mydate.substr(5, 28), 'MMM dd HH:mm:ss zzz yyyy').alias('datetime'),
to_timestamp(df.mydate, 'EEE MMM dd HH:mm:ss zzz yyyy').alias('LEGACYdatetime')
).show(1, False)
df = spark.createDataFrame([('Fri May 24 00:00:00 GMT 2019',)], ['mydate'])
df = df.select('mydate',
to_timestamp(df.mydate.substr(5, 28), 'MMM dd HH:mm:ss zzz yyyy').alias('datetime'),
to_timestamp(df.mydate, 'EEE MMM dd HH:mm:ss zzz yyyy').alias('LEGACYdatetime')
).show(1, False)
spark.sql("set spark.sql.legacy.timeParserPolicy=CORRECTED")
df = spark.createDataFrame([('Fri May 24 00:00:00 BST 2019',)], ['mydate'])
df = df.select('mydate',
to_timestamp(df.mydate.substr(5, 28), 'MMM dd HH:mm:ss zzz yyyy').alias('datetime')
).show(1, False)
df = spark.createDataFrame([('Fri May 24 00:00:00 GMT 2019',)], ['mydate'])
df = df.select('mydate',
to_timestamp(df.mydate.substr(5, 28), 'MMM dd HH:mm:ss zzz yyyy').alias('datetime')
).show(1, False)
+----------------------------+-------------------+-------------------+
|mydate |datetime |LEGACYdatetime |
+----------------------------+-------------------+-------------------+
|Fri May 24 00:00:00 BST 2019|2019-05-24 00:00:00|2019-05-24 00:00:00|
+----------------------------+-------------------+-------------------+
+----------------------------+-------------------+-------------------+
|mydate |datetime |LEGACYdatetime |
+----------------------------+-------------------+-------------------+
|Fri May 24 00:00:00 GMT 2019|2019-05-24 01:00:00|2019-05-24 01:00:00|
+----------------------------+-------------------+-------------------+
+----------------------------+-------------------+
|mydate |datetime |
+----------------------------+-------------------+
|Fri May 24 00:00:00 BST 2019|2019-05-23 14:00:00|
+----------------------------+-------------------+
+----------------------------+-------------------+
|mydate |datetime |
+----------------------------+-------------------+
|Fri May 24 00:00:00 GMT 2019|2019-05-24 01:00:00|
+----------------------------+-------------------+