I'm parsing a date column containing irregular date formats that pandas cannot interpret. The entries use different languages for days, months, and years, and the formats vary. Many entries also include timestamps. (Bonus: would separating them by string/regex with lambda/loops be the fastest method?) What's the best option and workflow for tackling several tens of thousands of these date entries?
The entries are not recognized by pandas or dateutil.parser.
Examples include:
19.8.2017, 21:23:32
31/05/2015 19:41:56
Saturday, 18. May
11 - 15 July 2001
2019/4/28 下午6:29:28
1 JuneMay 2000
19 aprile 2008 21:16:37 GMT+02:00
Samstag, 15. Mai 2010 20:55:10
So 23 Jun 2007 23:45 CEST
28 August 1998
30 June 2001
1 Ноябрь 2008 г. 18:46:59
Sat Jun 18 2011 19:46:46 GMT+0200 (Romance Daylight Time)
May-28-11 6:56:08 PM
Sat Jun 26 2010 21:55:54 GMT+0200 (West-Europa (zomertijd))
lunedì 5 maggio 2008 9.30.33
"ValueError: ('Unknown string format:', '1 JuneMay 2000')"
I realize this may be a cumbersome and undesirable task. Luckily the dates are currently nonessential to my project so they may be left alone, but a solution would be favorable. Any and all replies are appreciated, thank you.
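For reference, here is a minimal sketch of the loop-based approach I have in mind from the bonus question. It uses only the standard library and simply tries a list of explicit formats in turn. The names `KNOWN_FORMATS` and `parse_with_known_formats` are my own placeholders, the format list covers only a few of the samples above, and the month-name format assumes the default (English/C) locale. I don't know whether this scales well to tens of thousands of rows.

```python
from datetime import datetime

# Hypothetical format list; each pattern targets one of the sample entries.
KNOWN_FORMATS = [
    "%d.%m.%Y, %H:%M:%S",    # 19.8.2017, 21:23:32
    "%d/%m/%Y %H:%M:%S",     # 31/05/2015 19:41:56
    "%d %B %Y",              # 28 August 1998 (English month names assumed)
    "%b-%d-%y %I:%M:%S %p",  # May-28-11 6:56:08 PM
]


def parse_with_known_formats(text):
    """Return a datetime for the first format that matches, else None."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    return None


samples = ["19.8.2017, 21:23:32", "31/05/2015 19:41:56", "1 JuneMay 2000"]
parsed = [parse_with_known_formats(s) for s in samples]

# Collect the entries that no format could handle, for manual inspection.
unparsed = [s for s, p in zip(samples, parsed) if p is None]
print(unparsed)  # ['1 JuneMay 2000']
```

On the actual column this would be applied with something like `df["date"].map(parse_with_known_formats)` (the column name is a placeholder), leaving unmatched entries as missing values.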