61
votes

A pandas DataFrame column duration contains timedelta64[ns] as shown. How can you convert them to seconds?

0   00:20:32
1   00:23:10
2   00:24:55
3   00:13:17
4   00:18:52
Name: duration, dtype: timedelta64[ns]

I tried the following

print df[:5]['duration'] / np.timedelta64(1, 's')

but got the error

Traceback (most recent call last):
  File "test.py", line 16, in <module>
    print df[0:5]['duration'] / np.timedelta64(1, 's')
  File "C:\Python27\lib\site-packages\pandas\core\series.py", line 130, in wrapper
    "addition and subtraction, but the operator [%s] was passed" % name)
TypeError: can only operate on a timedeltas for addition and subtraction, but the operator [__div__] was passed

Also tried

print df[:5]['duration'].astype('timedelta64[s]')

but received the error

Traceback (most recent call last):
  File "test.py", line 17, in <module>
    print df[:5]['duration'].astype('timedelta64[s]')
  File "C:\Python27\lib\site-packages\pandas\core\series.py", line 934, in astype
    values = com._astype_nansafe(self.values, dtype)
  File "C:\Python27\lib\site-packages\pandas\core\common.py", line 1653, in _astype_nansafe
    raise TypeError("cannot astype a timedelta from [%s] to [%s]" % (arr.dtype,dtype))
TypeError: cannot astype a timedelta from [timedelta64[ns]] to [timedelta64[s]]
5
This is fixed in recent releases; as in the source code, if both sides are timedeltas, then addition, subtraction and division are supported. That section of the code was changed back in 2013. - behzad.nouri
@behzad.nouri is right, I believe this was fixed in > 0.13.1 (maybe needs 0.14), but current is 0.15.0: pandas.pydata.org/pandas-docs/stable/… - Jeff
@behzad.nouri Thanks, updating to Pandas 0.15.0 and numpy 1.9.0 got it to work. - Nyxynyx

5 Answers

72
votes

This works properly in the current version of Pandas (version 0.14):

In [132]: df[:5]['duration'] / np.timedelta64(1, 's')
Out[132]: 
0    1232
1    1390
2    1495
3     797
4    1132
Name: duration, dtype: float64

Here is a workaround for older versions of Pandas/NumPy:

In [131]: df[:5]['duration'].values.view('<i8')/10**9
Out[131]: array([1232, 1390, 1495,  797, 1132], dtype=int64)

timedelta64 and datetime64 data are stored internally as 8-byte ints (dtype '<i8'). So the above views the timedelta64s as 8-byte ints and then does integer division to convert nanoseconds to seconds. Under Python 3, use // instead of / to keep the result integral.

Note that you need NumPy version 1.7 or newer to work with datetime64/timedelta64s.
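As a minimal sketch of both approaches on a current pandas/NumPy install, assuming a made-up DataFrame that mirrors the question's data (the explicit cast to nanoseconds is an extra precaution, not part of the original workaround):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the question's column
df = pd.DataFrame({
    "duration": pd.to_timedelta(
        ["00:20:32", "00:23:10", "00:24:55", "00:13:17", "00:18:52"]
    )
})

# Dividing by a one-second timedelta yields float seconds
seconds = df["duration"] / np.timedelta64(1, "s")

# Workaround: force nanosecond resolution, reinterpret the values as
# 8-byte ints, then floor-divide nanoseconds down to whole seconds
nanos = df["duration"].to_numpy().astype("timedelta64[ns]").view("i8")
seconds_int = nanos // 10**9

print(seconds.tolist())
print(seconds_int.tolist())
```

The division form keeps the pandas index and any fractional seconds; the integer view drops both, so it is mainly useful where the division is not supported.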

54
votes

Use the Series dt accessor to get access to the methods and attributes of a datetime (timedelta) series.

>>> s
0   -1 days +23:45:14.304000
1   -1 days +23:46:57.132000
2   -1 days +23:49:25.913000
3   -1 days +23:59:48.913000
4            00:00:00.820000
dtype: timedelta64[ns]
>>>
>>> s.dt.total_seconds()
0   -885.696
1   -782.868
2   -634.087
3    -11.087
4      0.820
dtype: float64

There are other Pandas Series Accessors for String, Categorical, and Sparse data types.

13
votes

I just realized this is an old thread, but I am leaving this here for wanderers like me who only click the top 5 search results and end up here.

Make sure that your types are correct.

  • If you want to convert a datetime to seconds, sum the seconds contributed by the hour, minute and second fields of the datetime object. This works when the duration falls within a single date.

      • hours - hours x 3600 = seconds
      • minutes - minutes x 60 = seconds
      • seconds - seconds

linear_df['duration'].dt.hour*3600 + linear_df['duration'].dt.minute*60 + linear_df['duration'].dt.second

  • If you want to convert a timedelta to seconds, use the one below.

linear_df[:5]['duration'].astype('timedelta64[s]')
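A minimal sketch of the two cases above. The values are invented for illustration, and `dt.total_seconds()` is swapped in for the timedelta case as an alternative that returns float seconds:

```python
import pandas as pd

# Hypothetical data: one datetime series and one timedelta series
stamps = pd.Series(pd.to_datetime(["2016-12-31 09:33:27", "2016-12-31 10:29:44"]))
deltas = pd.Series(pd.to_timedelta(["0 days 00:00:14", "2 days 17:44:50"]))

# datetime: seconds elapsed since midnight of the same day
secs_since_midnight = (
    stamps.dt.hour * 3600 + stamps.dt.minute * 60 + stamps.dt.second
)

# timedelta: total length of each duration in seconds, days included
total_secs = deltas.dt.total_seconds()

print(secs_since_midnight.tolist())
print(total_secs.tolist())
```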

I got it to work like this:

start_dt and end_dt columns are in this format:

import datetime

linear_df[:5]['start_dt']

0   1970-02-22 21:32:48.000
1   2016-12-30 17:47:33.216
2   2016-12-31 09:33:27.931
3   2016-12-31 09:52:53.486
4   2016-12-31 10:29:44.611
Name: start_dt, dtype: datetime64[ns]

My duration was in timedelta64[ns] format, obtained by subtracting the start datetime values from the end datetime values.

linear_df['duration'] = linear_df['end_dt'] - linear_df['start_dt']

The resulting duration column looks like this:

linear_df[:5]['duration']

0          0 days 00:00:14
1   2 days 17:44:50.558000
2   0 days 15:37:28.418000
3   0 days 18:45:45.727000
4   0 days 19:21:27.159000
Name: duration, dtype: timedelta64[ns]

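Using pandas, I got the duration between the two dates as float seconds, which makes it easier to compare or filter durations afterwards. (On pandas 2.0 and later, this astype call returns timedelta64[s] values rather than floats; .dt.total_seconds() gives float seconds there.)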

linear_df[:5]['duration'].astype('timedelta64[s]')

0        14.0
1    236690.0
2     56248.0
3     67545.0
4     69687.0
Name: duration, dtype: float64

In my case, I wanted to get all durations longer than 1 second.
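A filter for durations longer than one second could be sketched as follows. The frame is a made-up stand-in for `linear_df`, the `duration_s` column name is hypothetical, and `dt.total_seconds()` is used to obtain float seconds:

```python
import pandas as pd

# Hypothetical stand-in for linear_df with start/end timestamps
linear_df = pd.DataFrame({
    "start_dt": pd.to_datetime([
        "2016-12-31 09:00:00.000",
        "2016-12-31 09:00:00.000",
        "2016-12-31 09:00:00.000",
    ]),
    "end_dt": pd.to_datetime([
        "2016-12-31 09:00:00.500",
        "2016-12-31 09:00:14.000",
        "2017-01-01 09:00:00.000",
    ]),
})
linear_df["duration"] = linear_df["end_dt"] - linear_df["start_dt"]
linear_df["duration_s"] = linear_df["duration"].dt.total_seconds()

# Keep only rows whose duration exceeds one second
longer_than_1s = linear_df[linear_df["duration_s"] > 1]
print(longer_than_1s["duration_s"].tolist())
```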

Hope it helps.

7
votes

Use the total_seconds() function:

df['durationSeconds'] = df['duration'].dt.total_seconds()

2
votes

We can simply use the pandas apply() function

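def get_seconds(time_delta):
    # total_seconds() covers the whole duration; .seconds alone drops the days
    return time_delta.total_seconds()

def get_microseconds(time_delta):
    # the attribute is .microseconds (sub-second component only)
    return time_delta.microseconds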

time_delta_series = df['duration']

converted_series = time_delta_series.apply(get_seconds)
print(converted_series)
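Note that apply() calls a Python function once per row, while the dt accessor gives the same result vectorized. The sketch below, on made-up data, also shows how the `.seconds` component differs from `total_seconds()` once a duration exceeds one day:

```python
import pandas as pd

# Hypothetical durations, one of them longer than a day
durations = pd.Series(pd.to_timedelta(["0 days 00:00:14", "2 days 17:44:50"]))

# .seconds is only the seconds component (0 to 86399); days are dropped
component = durations.apply(lambda td: td.seconds)

# total_seconds() covers the whole duration, days included
total = durations.apply(lambda td: td.total_seconds())

# Vectorized equivalent without apply()
vectorized = durations.dt.total_seconds()

print(component.tolist())
print(total.tolist())
```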