1
votes

I have a pandas dataframe that contains the following columns:

col1 col2
20040929 NaN
NaN 20040925 

That is, both col1 and col2 are float64 (or int64) numbers. I am trying to convert these using datetime.strptime(), but I get the error

"cannot convert the series to type 'float'"

and if I convert them to float, I get something like 20040929.0 which strptime does not understand.

How can I transform these columns into dates? Many thanks

1
You will need to use str() to convert the float to a string before passing it to datetime.strptime() - gtlambert
yes I tried but when I convert them to str it converts the date to "20040929.0"... - ℕʘʘḆḽḘ

1 Answer

3
votes

You can convert the df to str using astype and then apply to_datetime with a format string:

In [190]:
df.astype(str).apply(lambda x: pd.to_datetime(x, format='%Y%m%d'))

Out[190]:
        col1       col2
0 2004-09-29        NaT
1        NaT 2004-09-25
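
If the strings produced by astype(str) keep a trailing ".0" (as the comment under the question reports), the '%Y%m%d' format may not match on some pandas versions. A minimal sketch that sidesteps this, assuming every non-missing value is a whole number in YYYYMMDD form: drop the missing values, cast to int64 so that no decimal part is rendered, parse, then reindex so the missing rows come back as NaT. The helper name parse_yyyymmdd is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the question: float64 columns holding YYYYMMDD values
df = pd.DataFrame({
    "col1": [20040929, np.nan],
    "col2": [np.nan, 20040925],
})

def parse_yyyymmdd(col):
    """Hypothetical helper: parse one numeric YYYYMMDD column."""
    # Assumes every non-missing value is a whole number such as 20040929.0
    valid = col.dropna().astype("int64").astype(str)  # '20040929', no '.0'
    parsed = pd.to_datetime(valid, format="%Y%m%d")
    # Reindexing restores the dropped rows, which become NaT
    return parsed.reindex(col.index)

result = df.apply(parse_yyyymmdd)
print(result)
```

Because the cast to int64 happens only after dropna, NaN never reaches the integer conversion, and any malformed value still raises instead of being silently turned into NaT.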

EDIT

Using strptime will be slower and less friendly. Firstly, converting to str introduces a trailing .0 because the dtype is float, so we have to split on the decimal point. Additionally, strptime doesn't understand Series, so we have to call applymap. On top of this, NaN will cause strptime to raise, so we have to do the following:

In [203]:
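import datetime as dt
import pandas as pd

def func(x):
    try:
        # drop the '.0' left by the float -> str conversion
        return dt.datetime.strptime(x.split('.')[0], '%Y%m%d')
    except (ValueError, AttributeError):  # unparseable text such as 'nan', or a non-string missing value
        return pd.NaT
# note: applymap is deprecated since pandas 2.1 in favour of DataFrame.map
df.astype(str).applymap(func)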

Out[203]:
        col1       col2
0 2004-09-29        NaT
1        NaT 2004-09-25

Timings

If we compare the 2 methods on a 2K row df:

In [212]:
%timeit df.astype(str).apply(lambda x: pd.to_datetime(x, format='%Y%m%d'))
100 loops, best of 3: 8.11 ms per loop

In [213]:    
%%timeit 
def func(x):
    try:
        return dt.datetime.strptime(x.split('.')[0], '%Y%m%d')
    except (ValueError, AttributeError):
        return pd.NaT
df.astype(str).applymap(func)

10 loops, best of 3: 86.3 ms per loop

We observe that the pandas method is over 10X faster (8.11 ms vs 86.3 ms per loop), and it is likely to scale much better because it is vectorised.
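
To repeat the comparison outside IPython, something like the following sketch could be used. It is an approximation under stated assumptions, not the exact session above: the 2,000-row frame, the function names vectorised and elementwise, and the repeat count are all made up for illustration, the vectorised variant casts to int64 before parsing so it does not depend on how "20040929.0" is handled, and Series.map stands in for applymap. Absolute timings will differ by machine and pandas version.

```python
import datetime as dt
import timeit

import numpy as np
import pandas as pd

# Hypothetical 2,000-row frame shaped like the one in the question
n = 2000
df = pd.DataFrame({
    "col1": np.full(n, 20040929.0),
    "col2": np.full(n, 20040925.0),
})
df.loc[::2, "col1"] = np.nan   # even rows missing in col1
df.loc[1::2, "col2"] = np.nan  # odd rows missing in col2

def vectorised(frame):
    """Parse each column with a single pd.to_datetime call."""
    def parse(col):
        valid = col.dropna().astype("int64").astype(str)
        return pd.to_datetime(valid, format="%Y%m%d").reindex(col.index)
    return frame.apply(parse)

def elementwise(frame):
    """Parse every cell separately with datetime.strptime."""
    def func(x):
        try:
            return dt.datetime.strptime(x.split(".")[0], "%Y%m%d")
        except (ValueError, AttributeError):
            return pd.NaT
    return frame.astype(str).apply(lambda col: col.map(func))

vec = vectorised(df)
elem = elementwise(df)

# Total seconds for 5 runs of each approach
t_vec = timeit.timeit(lambda: vectorised(df), number=5)
t_elem = timeit.timeit(lambda: elementwise(df), number=5)
print(f"vectorised:  {t_vec:.4f} s for 5 runs")
print(f"elementwise: {t_elem:.4f} s for 5 runs")
```

Checking that both functions agree on which cells are missing before timing them helps ensure the comparison is between equivalent results.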