How do I convert a string to a date object in python?
The string would be: "24052010"
(corresponding to the format: "%d%m%Y"
)
I don't want a datetime.datetime object, but rather a datetime.date.
Directly related question:
What if you have
datetime.datetime.strptime("2015-02-24T13:00:00-08:00", "%Y-%B-%dT%H:%M:%S-%H:%M").date()
and you get:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/_strptime.py", line 308, in _strptime
format_regex = _TimeRE_cache.compile(format)
File "/usr/local/lib/python2.7/_strptime.py", line 265, in compile
return re_compile(self.pattern(format), IGNORECASE)
File "/usr/local/lib/python2.7/re.py", line 194, in compile
return _compile(pattern, flags)
File "/usr/local/lib/python2.7/re.py", line 251, in _compile
raise error, v # invalid expression
sre_constants.error: redefinition of group name 'H' as group 7; was group 4
and you tried:
<-24T13:00:00-08:00", "%Y-%B-%dT%HH:%MM:%SS-%HH:%MM").date()
but you still get the traceback above.
Answer:
>>> from dateutil.parser import parse
>>> from datetime import datetime
>>> parse("2015-02-24T13:00:00-08:00")
datetime.datetime(2015, 2, 24, 13, 0, tzinfo=tzoffset(None, -28800))
If you are lazy and don't want to fight with string literals, you can just go with the parser
module.
from dateutil import parser
dt = parser.parse("Jun 1 2005 1:33PM")
print(dt.year, dt.month, dt.day,dt.hour, dt.minute, dt.second)
>2005 6 1 13 33 0
Just a side note, as we are trying to match any
string representation, it is 10x slower than strptime
For single value the datetime.strptime
method is the fastest
import arrow
from datetime import datetime
import pandas as pd
l = ['24052010']
%timeit _ = list(map(lambda x: datetime.strptime(x, '%d%m%Y').date(), l))
6.86 µs ± 56.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit _ = list(map(lambda x: x.date(), pd.to_datetime(l, format='%d%m%Y')))
305 µs ± 6.32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit _ = list(map(lambda x: arrow.get(x, 'DMYYYY').date(), l))
46 µs ± 978 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
For a list of values the pandas pd.to_datetime
is the fastest
l = ['24052010'] * 1000
%timeit _ = list(map(lambda x: datetime.strptime(x, '%d%m%Y').date(), l))
6.32 ms ± 89.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit _ = list(map(lambda x: x.date(), pd.to_datetime(l, format='%d%m%Y')))
1.76 ms ± 27.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit _ = list(map(lambda x: arrow.get(x, 'DMYYYY').date(), l))
44.5 ms ± 522 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
For ISO8601 datetime format the ciso8601
is a rocket
import ciso8601
l = ['2010-05-24'] * 1000
%timeit _ = list(map(lambda x: ciso8601.parse_datetime(x).date(), l))
241 µs ± 3.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
There is another library called arrow
really great to make manipulation on python date.
import arrow
import datetime
a = arrow.get('24052010', 'DMYYYY').date()
print(isinstance(a, datetime.date)) # True