0
votes

I have data files containing year, day of the year (DOY), hour and minutes, as follows:

            BuoyID  Year  Hour  Min       DOY   POS_DOY     Lat     Lon     Ts
0  300234065718160  2019     7    0  216.2920  216.2920  58.559 -23.914  14.61
1  300234065718160  2019     9    0  216.3750  216.3750  58.563 -23.905  14.60
2  300234065718160  2019    10    0  216.4170  216.4170  58.564 -23.903  14.60
3  300234065718160  2019    11    0  216.4580  216.4580  58.563 -23.906  14.60
4  300234065718160  2019    12    0  216.5000  216.5000  58.561 -23.910  14.60

In order to make my datetime, I used:

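import datetime
import pandas as pd

# Build the date from the year and the day of the year
dt_raw = pd.to_datetime(df_buoy['Year'] * 1000 + df_buoy['DOY'], format='%Y%j')

# Convert to datetime by combining the date with the hour and minute of the first row
dt_buoy = [d.date() for d in dt_raw]
date = datetime.datetime.combine(dt_buoy[0], datetime.time(df_buoy.Hour[0], df_buoy.Min[0]))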

My problem arises when the hours are not int, but float instead. For example:

                   BuoyID  Year   Hour  Min      DOY  POS_DOY       Lat       Lon      BP    Ts
          0    300234061876910  2014  23.33    0  226.972  226.972  71.93081 -141.0792  1016.9 -0.01
          1    300234061876910  2014  23.50    0  226.979  226.979  71.93020 -141.0826  1016.8  3.36
          2    300234061876910  2014  23.67    0  226.986  226.986  71.92968 -141.0856  1016.8  3.28
          3    300234061876910  2014  23.83    0  226.993  226.993  71.92934 -141.0876  1016.8  3.22
          4    300234061876910  2014   0.00    0  227.000  227.000  71.92904 -141.0894  1016.8  3.18   

What I tried was to convert each hour to a str, take its first two characters as the integer hour, then subtract that from 'Hour' and multiply by 60 to get the minutes.

int_hour = [(int(str(i)[0:2])) for i in df_buoy.Hour]  
minutes = map(lambda x, y: (x - y)*60, df_buoy.Hour, int_hour)

But, of course, if you have '0.' as your hour, Python will complain:

ValueError: invalid literal for int() with base 10: '0.'  
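
To see why the slice fails, here is a minimal sketch using hour values taken from the tables above. `try_int` is a hypothetical helper added only so the example runs to completion instead of stopping at the first error. It suggests that any float hour below 10 breaks the slice, not just 0.

```python
# Hour values as floats, taken from the sample tables
hours = [23.33, 7.0, 0.0]

# The first two characters of the string form of each hour
prefixes = [str(h)[0:2] for h in hours]
print(prefixes)  # ['23', '7.', '0.']


def try_int(text):
    """Return int(text), or None when int() rejects the string (hypothetical helper)."""
    try:
        return int(text)
    except ValueError:
        return None


# int() only accepts the prefix when the hour has two digits
parsed = [try_int(p) for p in prefixes]
print(parsed)  # [23, None, None]
```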

My question is: does anyone know a simple way to convert year, DOY, hour (either int or float) and minutes to a datetime?

Comment (1 vote) by greentec: use float first, then use int.
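
A minimal sketch of the float-then-int idea from the comment, using only the standard library. `split_hour` is a hypothetical helper name, and rounding to the nearest whole minute is an assumption about the precision wanted, not something stated in the question.

```python
def split_hour(hour):
    """Split a possibly fractional hour into (whole_hours, minutes).

    Works for ints, floats, and numeric strings because the value
    goes through float() before any integer arithmetic.
    """
    total_minutes = round(float(hour) * 60)  # nearest whole minute (assumption)
    return divmod(total_minutes, 60)


print(split_hour(23.5))   # (23, 30)
print(split_hour(7))      # (7, 0)
print(split_hour("0.0"))  # (0, 0)
```

Note that a value very close to the end of the day, such as 23.999, rounds up to (24, 0), which `datetime.time` would reject, so this sketch would need extra handling for that case.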

1 Answer

1
votes

Use to_timedelta to convert the Hour column to timedeltas and add them to the datetimes. This works with both integers and floats:

df['d'] = (pd.to_datetime(df['Year'] * 1000 + df['DOY'], format='%Y%j') +
           pd.to_timedelta(df['Hour'], unit='h'))

print (df)
            BuoyID  Year  Hour  Min      DOY  POS_DOY     Lat     Lon     Ts  \
0  300234065718160  2019     7    0  216.292  216.292  58.559 -23.914  14.61   
1  300234065718160  2019     9    0  216.375  216.375  58.563 -23.905  14.60   
2  300234065718160  2019    10    0  216.417  216.417  58.564 -23.903  14.60   
3  300234065718160  2019    11    0  216.458  216.458  58.563 -23.906  14.60   
4  300234065718160  2019    12    0  216.500  216.500  58.561 -23.910  14.60   

                    d  
0 2019-08-04 07:00:00  
1 2019-08-04 09:00:00  
2 2019-08-04 10:00:00  
3 2019-08-04 11:00:00  
4 2019-08-04 12:00:00  

The same code applied to the second sample, where Hour is a float:

df['d'] = (pd.to_datetime(df['Year'] * 1000 + df['DOY'], format='%Y%j') +
           pd.to_timedelta(df['Hour'], unit='h'))

print (df)
            BuoyID  Year   Hour  Min      DOY  POS_DOY       Lat       Lon  \
0  300234061876910  2014  23.33    0  226.972  226.972  71.93081 -141.0792   
1  300234061876910  2014  23.50    0  226.979  226.979  71.93020 -141.0826   
2  300234061876910  2014  23.67    0  226.986  226.986  71.92968 -141.0856   
3  300234061876910  2014  23.83    0  226.993  226.993  71.92934 -141.0876   
4  300234061876910  2014   0.00    0  227.000  227.000  71.92904 -141.0894   

       BP    Ts                   d  
0  1016.9 -0.01 2014-08-14 23:19:48  
1  1016.8  3.36 2014-08-14 23:30:00  
2  1016.8  3.28 2014-08-14 23:40:12  
3  1016.8  3.22 2014-08-14 23:49:48  
4  1016.0   NaN 2014-08-15 00:00:00
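
Both samples have Min equal to 0, so the snippet above never uses that column. If the files can carry non-zero minutes, a variant along these lines may help. It is a sketch under two assumptions that the question does not confirm: that the fractional part of DOY only repeats the time of day already given by Hour (so it is truncated here), and that Min should be added on top of Hour. The date is built from January 1 plus a day offset rather than from the '%Y%j' format, and the small frame reuses three rows of the second sample.

```python
import pandas as pd

# Three rows copied from the float-hour sample
df = pd.DataFrame({
    "Year": [2014, 2014, 2014],
    "DOY": [226.979, 226.993, 227.000],
    "Hour": [23.50, 23.83, 0.00],
    "Min": [0, 0, 0],
})

# Keep only the whole day of the year (assumption: the fraction duplicates Hour)
doy = df["DOY"].astype(int)

# January 1 of each year plus (doy - 1) days gives the calendar date
start_of_year = pd.to_datetime(df["Year"].astype(str) + "-01-01")
date = start_of_year + pd.to_timedelta(doy - 1, unit="D")

# Add the hours (int or float) and the minutes as timedeltas
df["d"] = (date
           + pd.to_timedelta(df["Hour"], unit="h")
           + pd.to_timedelta(df["Min"], unit="min"))

print(df[["Year", "DOY", "Hour", "Min", "d"]])
```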