I need to parse RFC 3339 strings like "2008-09-03T20:56:35.450686Z"
into Python's datetime
type.
I have found strptime
in the Python standard library, but it is not very convenient.
What is the best way to do this?
I need to parse RFC 3339 strings like "2008-09-03T20:56:35.450686Z"
into Python's datetime
type.
I have found strptime
in the Python standard library, but it is not very convenient.
What is the best way to do this?
The python-dateutil package has dateutil.parser.isoparse
to parse not only RFC 3339 datetime strings like the one in the question, but also other ISO 8601 date and time strings that don't comply with RFC 3339 (such as ones with no UTC offset, or ones that represent only a date).
>>> import dateutil.parser
>>> dateutil.parser.isoparse('2008-09-03T20:56:35.450686Z') # RFC 3339 format
datetime.datetime(2008, 9, 3, 20, 56, 35, 450686, tzinfo=tzutc())
>>> dateutil.parser.isoparse('2008-09-03T20:56:35.450686') # ISO 8601 extended format
datetime.datetime(2008, 9, 3, 20, 56, 35, 450686)
>>> dateutil.parser.isoparse('20080903T205635.450686') # ISO 8601 basic format
datetime.datetime(2008, 9, 3, 20, 56, 35, 450686)
>>> dateutil.parser.isoparse('20080903') # ISO 8601 basic format, date only
datetime.datetime(2008, 9, 3, 0, 0)
Comparison with dateutil.parser.parse
: isoparse
is presumably stricter than the more hacky , but both of them are quite forgiving and will attempt to interpret the string that you pass in. If you want to eliminate the possibility of any misreads, you need to use something stricter than either of these functions.
Comparison with Python 3.7+’s built-in datetime.datetime.fromisoformat
(see this answer): isoparse
is a full ISO-8601 format parser, but fromisoformat
is deliberately not. Please see the function’s docs for this cautionary caveat.
The datetime
standard library has, since Python 3.7, a function for inverting datetime.isoformat()
.
classmethod
datetime.fromisoformat(date_string)
:Return a
datetime
corresponding to adate_string
in one of the formats emitted bydate.isoformat()
anddatetime.isoformat()
.Specifically, this function supports strings in the format(s):
YYYY-MM-DD[*HH[:MM[:SS[.mmm[mmm]]]][+HH:MM[:SS[.ffffff]]]]
where
*
can match any single character.Caution: This does not support parsing arbitrary ISO 8601 strings - it is only intended as the inverse operation of
datetime.isoformat()
.Examples:
>>> from datetime import datetime >>> datetime.fromisoformat('2011-11-04') datetime.datetime(2011, 11, 4, 0, 0)
…
Be sure to head the caution from the docs!
Note in Python 2.6+ and Py3K, the %f character catches microseconds.
>>> datetime.datetime.strptime("2008-09-03T20:56:35.450686Z", "%Y-%m-%dT%H:%M:%S.%fZ")
See issue here
Several answers here suggest using datetime.datetime.strptime
to parse RFC 3339 or ISO 8601 datetimes with timezones, like the one exhibited in the question:
2008-09-03T20:56:35.450686Z
This is a bad idea.
Assuming that you want to support the full RFC 3339 format, including support for UTC offsets other than zero, then the code these answers suggest does not work. Indeed, it cannot work, because parsing RFC 3339 syntax using strptime
is impossible. The format strings used by Python's datetime module are incapable of describing RFC 3339 syntax.
The problem is UTC offsets. The RFC 3339 Internet Date/Time Format requires that every date-time includes a UTC offset, and that those offsets can either be Z
(short for "Zulu time") or in +HH:MM
or -HH:MM
format, like +05:00
or -10:30
.
Consequently, these are all valid RFC 3339 datetimes:
2008-09-03T20:56:35.450686Z
2008-09-03T20:56:35.450686+05:00
2008-09-03T20:56:35.450686-10:30
Alas, the format strings used by strptime
and strftime
have no directive that corresponds to UTC offsets in RFC 3339 format. A complete list of the directives they support can be found at https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior, and the only UTC offset directive included in the list is %z
:
%z
UTC offset in the form +HHMM or -HHMM (empty string if the the object is naive).
Example: (empty), +0000, -0400, +1030
This doesn't match the format of an RFC 3339 offset, and indeed if we try to use %z
in the format string and parse an RFC 3339 date, we'll fail:
>>> from datetime import datetime
>>> datetime.strptime("2008-09-03T20:56:35.450686Z", "%Y-%m-%dT%H:%M:%S.%f%z")
Traceback (most recent call last):
File "", line 1, in
File "/usr/lib/python3.4/_strptime.py", line 500, in _strptime_datetime
tt, fraction = _strptime(data_string, format)
File "/usr/lib/python3.4/_strptime.py", line 337, in _strptime
(data_string, format))
ValueError: time data '2008-09-03T20:56:35.450686Z' does not match format '%Y-%m-%dT%H:%M:%S.%f%z'
>>> datetime.strptime("2008-09-03T20:56:35.450686+05:00", "%Y-%m-%dT%H:%M:%S.%f%z")
Traceback (most recent call last):
File "", line 1, in
File "/usr/lib/python3.4/_strptime.py", line 500, in _strptime_datetime
tt, fraction = _strptime(data_string, format)
File "/usr/lib/python3.4/_strptime.py", line 337, in _strptime
(data_string, format))
ValueError: time data '2008-09-03T20:56:35.450686+05:00' does not match format '%Y-%m-%dT%H:%M:%S.%f%z'
(Actually, the above is just what you'll see in Python 3. In Python 2 we'll fail for an even simpler reason, which is that strptime
does not implement the %z
directive at all in Python 2.)
The multiple answers here that recommend strptime
all work around this by including a literal Z
in their format string, which matches the Z
from the question asker's example datetime string (and discards it, producing a datetime
object without a timezone):
>>> datetime.strptime("2008-09-03T20:56:35.450686Z", "%Y-%m-%dT%H:%M:%S.%fZ")
datetime.datetime(2008, 9, 3, 20, 56, 35, 450686)
Since this discards timezone information that was included in the original datetime string, it's questionable whether we should regard even this result as correct. But more importantly, because this approach involves hard-coding a particular UTC offset into the format string, it will choke the moment it tries to parse any RFC 3339 datetime with a different UTC offset:
>>> datetime.strptime("2008-09-03T20:56:35.450686+05:00", "%Y-%m-%dT%H:%M:%S.%fZ")
Traceback (most recent call last):
File "", line 1, in
File "/usr/lib/python3.4/_strptime.py", line 500, in _strptime_datetime
tt, fraction = _strptime(data_string, format)
File "/usr/lib/python3.4/_strptime.py", line 337, in _strptime
(data_string, format))
ValueError: time data '2008-09-03T20:56:35.450686+05:00' does not match format '%Y-%m-%dT%H:%M:%S.%fZ'
Unless you're certain that you only need to support RFC 3339 datetimes in Zulu time, and not ones with other timezone offsets, don't use strptime
. Use one of the many other approaches described in answers here instead.
Try the iso8601 module; it does exactly this.
There are several other options mentioned on the WorkingWithTime page on the python.org wiki.
Starting from Python 3.7, strptime supports colon delimiters in UTC offsets (source). So you can then use:
import datetime
datetime.datetime.strptime('2018-01-31T09:24:31.488670+00:00', '%Y-%m-%dT%H:%M:%S.%f%z')
EDIT:
As pointed out by Martijn, if you created the datetime object using isoformat(), you can simply use datetime.fromisoformat()
What is the exact error you get? Is it like the following?
>>> datetime.datetime.strptime("2008-08-12T12:20:30.656234Z", "%Y-%m-%dT%H:%M:%S.Z")
ValueError: time data did not match format: data=2008-08-12T12:20:30.656234Z fmt=%Y-%m-%dT%H:%M:%S.Z
If yes, you can split your input string on ".", and then add the microseconds to the datetime you got.
Try this:
>>> def gt(dt_str):
dt, _, us= dt_str.partition(".")
dt= datetime.datetime.strptime(dt, "%Y-%m-%dT%H:%M:%S")
us= int(us.rstrip("Z"), 10)
return dt + datetime.timedelta(microseconds=us)
>>> gt("2008-08-12T12:20:30.656234Z")
datetime.datetime(2008, 8, 12, 12, 20, 30, 656234)
A simple option from one of the comments: replace 'Z'
with '+00:00'
- and use Python 3.7+'s fromisoformat
:
from datetime import datetime
s = "2008-09-03T20:56:35.450686Z"
datetime.fromisoformat(s.replace('Z', '+00:00'))
# datetime.datetime(2008, 9, 3, 20, 56, 35, 450686, tzinfo=datetime.timezone.utc)
Although strptime
can parse the 'Z'
character to UTC, fromisoformat
is faster by ~ x40 (see also: A faster strptime):
%timeit datetime.fromisoformat(s.replace('Z', '+00:00'))
346 ns ± 22.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f%z')
14.2 µs ± 452 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit dateutil.parser.parse(s)
80.1 µs ± 3.32 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
(Python 3.8.7 x64 on Windows 10)
In these days, Arrow also can be used as a third-party solution:
>>> import arrow
>>> date = arrow.get("2008-09-03T20:56:35.450686Z")
>>> date.datetime
datetime.datetime(2008, 9, 3, 20, 56, 35, 450686, tzinfo=tzutc())
I have found ciso8601 to be the fastest way to parse ISO 8601 timestamps. As the name suggests, it is implemented in C.
import ciso8601
ciso8601.parse_datetime('2014-01-09T21:48:00.921000+05:30')
The GitHub Repo README shows their >10x speedup versus all of the other libraries listed in the other answers.
My personal project involved a lot of ISO 8601 parsing. It was nice to be able to just switch the call and go 10x faster. :)
Edit: I have since become a maintainer of ciso8601. It's now faster than ever!
If you don't want to use dateutil, you can try this function:
def from_utc(utcTime,fmt="%Y-%m-%dT%H:%M:%S.%fZ"):
"""
Convert UTC time string to time.struct_time
"""
# change datetime.datetime to time, return time.struct_time type
return datetime.datetime.strptime(utcTime, fmt)
Test:
from_utc("2007-03-04T21:08:12.123Z")
Result:
datetime.datetime(2007, 3, 4, 21, 8, 12, 123000)
If you are working with Django, it provides the dateparse module that accepts a bunch of formats similar to ISO format, including the time zone.
If you are not using Django and you don't want to use one of the other libraries mentioned here, you could probably adapt the Django source code for dateparse to your project.
This works for stdlib on Python 3.2 onwards (assuming all the timestamps are UTC):
from datetime import datetime, timezone, timedelta
datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ").replace(
tzinfo=timezone(timedelta(0)))
For example,
>>> datetime.utcnow().replace(tzinfo=timezone(timedelta(0)))
... datetime.datetime(2015, 3, 11, 6, 2, 47, 879129, tzinfo=datetime.timezone.utc)
I've coded up a parser for the ISO 8601 standard and put it on GitHub: https://github.com/boxed/iso8601. This implementation supports everything in the specification except for durations, intervals, periodic intervals, and dates outside the supported date range of Python's datetime module.
Tests are included! :P
One straightforward way to convert an ISO 8601-like date string to a UNIX timestamp or datetime.datetime
object in all supported Python versions without installing third-party modules is to use the date parser of SQLite.
#!/usr/bin/env python
from __future__ import with_statement, division, print_function
import sqlite3
import datetime
testtimes = [
"2016-08-25T16:01:26.123456Z",
"2016-08-25T16:01:29",
]
db = sqlite3.connect(":memory:")
c = db.cursor()
for timestring in testtimes:
c.execute("SELECT strftime('%s', ?)", (timestring,))
converted = c.fetchone()[0]
print("%s is %s after epoch" % (timestring, converted))
dt = datetime.datetime.fromtimestamp(int(converted))
print("datetime is %s" % dt)
Output:
2016-08-25T16:01:26.123456Z is 1472140886 after epoch
datetime is 2016-08-25 12:01:26
2016-08-25T16:01:29 is 1472140889 after epoch
datetime is 2016-08-25 12:01:29
Django's parse_datetime() function supports dates with UTC offsets:
parse_datetime('2016-08-09T15:12:03.65478Z') =
datetime.datetime(2016, 8, 9, 15, 12, 3, 654780, tzinfo=<UTC>)
So it could be used for parsing ISO 8601 dates in fields within entire project:
from django.utils import formats
from django.forms.fields import DateTimeField
from django.utils.dateparse import parse_datetime
class DateTimeFieldFixed(DateTimeField):
def strptime(self, value, format):
if format == 'iso-8601':
return parse_datetime(value)
return super().strptime(value, format)
DateTimeField.strptime = DateTimeFieldFixed.strptime
formats.ISO_INPUT_FORMATS['DATETIME_INPUT_FORMATS'].insert(0, 'iso-8601')
Because ISO 8601 allows many variations of optional colons and dashes being present, basically CCYY-MM-DDThh:mm:ss[Z|(+|-)hh:mm]
. If you want to use strptime, you need to strip out those variations first.
The goal is to generate a utc datetime object.
2016-06-29T19:36:29.3453Z
datetime.datetime.strptime(timestamp.translate(None, ':-'), "%Y%m%dT%H%M%S.%fZ")
2016-06-29T19:36:29.3453-0400
2008-09-03T20:56:35.450686+05:00
20080903T205635.450686+0500
import re
# this regex removes all colons and all
# dashes EXCEPT for the dash indicating + or - utc offset for the timezone
conformed_timestamp = re.sub(r"[:]|([-](?!((\d{2}[:]\d{2})|(\d{4}))$))", '', timestamp)
datetime.datetime.strptime(conformed_timestamp, "%Y%m%dT%H%M%S.%f%z" )
%z
ValueError: 'z' is a bad directive in format '%Y%m%dT%H%M%S.%f%z'
Z
%z
import re
import datetime
# this regex removes all colons and all
# dashes EXCEPT for the dash indicating + or - utc offset for the timezone
conformed_timestamp = re.sub(r"[:]|([-](?!((\d{2}[:]\d{2})|(\d{4}))$))", '', timestamp)
# split on the offset to remove it. use a capture group to keep the delimiter
split_timestamp = re.split(r"[+|-]",conformed_timestamp)
main_timestamp = split_timestamp[0]
if len(split_timestamp) == 3:
sign = split_timestamp[1]
offset = split_timestamp[2]
else:
sign = None
offset = None
# generate the datetime object without the offset at UTC time
output_datetime = datetime.datetime.strptime(main_timestamp +"Z", "%Y%m%dT%H%M%S.%fZ" )
if offset:
# create timedelta based on offset
offset_delta = datetime.timedelta(hours=int(sign+offset[:-2]), minutes=int(sign+offset[-2:]))
# offset datetime with timedelta
output_datetime = output_datetime + offset_delta
An another way is to use specialized parser for ISO-8601 is to use isoparse function of dateutil parser:
from dateutil import parser
date = parser.isoparse("2008-09-03T20:56:35.450686+01:00")
print(date)
Output:
2008-09-03 20:56:35.450686+01:00
This function is also mentioned in the documentation for the standard Python function datetime.fromisoformat:
A more full-featured ISO 8601 parser, dateutil.parser.isoparse is available in the third-party package dateutil.
Nowadays there's Maya: Datetimes for Humans™, from the author of the popular Requests: HTTP for Humans™ package:
>>> import maya
>>> str = '2008-09-03T20:56:35.450686Z'
>>> maya.MayaDT.from_rfc3339(str).datetime()
datetime.datetime(2008, 9, 3, 20, 56, 35, 450686, tzinfo=<UTC>)
Thanks to great Mark Amery's answer I devised function to account for all possible ISO formats of datetime:
class FixedOffset(tzinfo):
"""Fixed offset in minutes: `time = utc_time + utc_offset`."""
def __init__(self, offset):
self.__offset = timedelta(minutes=offset)
hours, minutes = divmod(offset, 60)
#NOTE: the last part is to remind about deprecated POSIX GMT+h timezones
# that have the opposite sign in the name;
# the corresponding numeric value is not used e.g., no minutes
self.__name = '<%+03d%02d>%+d' % (hours, minutes, -hours)
def utcoffset(self, dt=None):
return self.__offset
def tzname(self, dt=None):
return self.__name
def dst(self, dt=None):
return timedelta(0)
def __repr__(self):
return 'FixedOffset(%d)' % (self.utcoffset().total_seconds() / 60)
def __getinitargs__(self):
return (self.__offset.total_seconds()/60,)
def parse_isoformat_datetime(isodatetime):
try:
return datetime.strptime(isodatetime, '%Y-%m-%dT%H:%M:%S.%f')
except ValueError:
pass
try:
return datetime.strptime(isodatetime, '%Y-%m-%dT%H:%M:%S')
except ValueError:
pass
pat = r'(.*?[+-]\d{2}):(\d{2})'
temp = re.sub(pat, r'\1\2', isodatetime)
naive_date_str = temp[:-5]
offset_str = temp[-5:]
naive_dt = datetime.strptime(naive_date_str, '%Y-%m-%dT%H:%M:%S.%f')
offset = int(offset_str[-4:-2])*60 + int(offset_str[-2:])
if offset_str[0] == "-":
offset = -offset
return naive_dt.replace(tzinfo=FixedOffset(offset))
def parseISO8601DateTime(datetimeStr):
import time
from datetime import datetime, timedelta
def log_date_string(when):
gmt = time.gmtime(when)
if time.daylight and gmt[8]:
tz = time.altzone
else:
tz = time.timezone
if tz > 0:
neg = 1
else:
neg = 0
tz = -tz
h, rem = divmod(tz, 3600)
m, rem = divmod(rem, 60)
if neg:
offset = '-%02d%02d' % (h, m)
else:
offset = '+%02d%02d' % (h, m)
return time.strftime('%d/%b/%Y:%H:%M:%S ', gmt) + offset
dt = datetime.strptime(datetimeStr, '%Y-%m-%dT%H:%M:%S.%fZ')
timestamp = dt.timestamp()
return dt + timedelta(hours=dt.hour-time.gmtime(timestamp).tm_hour)
Note that we should look if the string doesn't ends with Z
, we could parse using %z
.
Initially I tried with:
from operator import neg, pos
from time import strptime, mktime
from datetime import datetime, tzinfo, timedelta
class MyUTCOffsetTimezone(tzinfo):
@staticmethod
def with_offset(offset_no_signal, signal): # type: (str, str) -> MyUTCOffsetTimezone
return MyUTCOffsetTimezone((pos if signal == '+' else neg)(
(datetime.strptime(offset_no_signal, '%H:%M') - datetime(1900, 1, 1))
.total_seconds()))
def __init__(self, offset, name=None):
self.offset = timedelta(seconds=offset)
self.name = name or self.__class__.__name__
def utcoffset(self, dt):
return self.offset
def tzname(self, dt):
return self.name
def dst(self, dt):
return timedelta(0)
def to_datetime_tz(dt): # type: (str) -> datetime
fmt = '%Y-%m-%dT%H:%M:%S.%f'
if dt[-6] in frozenset(('+', '-')):
dt, sign, offset = strptime(dt[:-6], fmt), dt[-6], dt[-5:]
return datetime.fromtimestamp(mktime(dt),
tz=MyUTCOffsetTimezone.with_offset(offset, sign))
elif dt[-1] == 'Z':
return datetime.strptime(dt, fmt + 'Z')
return datetime.strptime(dt, fmt)
But that didn't work on negative timezones. This however I got working fine, in Python 3.7.3:
from datetime import datetime
def to_datetime_tz(dt): # type: (str) -> datetime
fmt = '%Y-%m-%dT%H:%M:%S.%f'
if dt[-6] in frozenset(('+', '-')):
return datetime.strptime(dt, fmt + '%z')
elif dt[-1] == 'Z':
return datetime.strptime(dt, fmt + 'Z')
return datetime.strptime(dt, fmt)
Some tests, note that the out only differs by precision of microseconds. Got to 6 digits of precision on my machine, but YMMV:
for dt_in, dt_out in (
('2019-03-11T08:00:00.000Z', '2019-03-11T08:00:00'),
('2019-03-11T08:00:00.000+11:00', '2019-03-11T08:00:00+11:00'),
('2019-03-11T08:00:00.000-11:00', '2019-03-11T08:00:00-11:00')
):
isoformat = to_datetime_tz(dt_in).isoformat()
assert isoformat == dt_out, '{} != {}'.format(isoformat, dt_out)