I created a dataframe df5:

import pandas as pd

df5 = pd.read_csv('C:/Users/Demonstrator/Downloads/Listeequipement.csv',delimiter=';', parse_dates=[0], infer_datetime_format = True)
df5['TIMESTAMP'] = pd.to_datetime(df5['TIMESTAMP'], format='%d/%m/%y %H:%M')
df5['date'] = df5['TIMESTAMP'].dt.date
df5['time'] = df5['TIMESTAMP'].dt.time
date_debut = pd.to_datetime('2015-08-01 23:10:00')
date_fin = pd.to_datetime('2015-10-01 00:00:00')
df5 = df5[(df5['TIMESTAMP'] >= date_debut) & (df5['TIMESTAMP'] < date_fin)]
df5.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 8645 entries, 145 to 8789
Data columns (total 9 columns):
TIMESTAMP                 8645 non-null datetime64[ns]
ACT_TIME_AERATEUR_1_F1    8645 non-null float64
ACT_TIME_AERATEUR_1_F3    8645 non-null float64
ACT_TIME_AERATEUR_1_F5    8645 non-null float64
ACT_TIME_AERATEUR_1_F6    8645 non-null float64
ACT_TIME_AERATEUR_1_F7    8645 non-null float64
ACT_TIME_AERATEUR_1_F8    8645 non-null float64
date                      8645 non-null object
time                      8645 non-null object
dtypes: datetime64[ns](1), float64(6), object(2)
memory usage: 675.4+ KB

Then I resampled it by day like this:

df5 = df5.set_index('TIMESTAMP')
df5 = df5.resample('1d').mean()
df5.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 61 entries, 2015-08-01 to 2015-09-30
Freq: D
Data columns (total 6 columns):
ACT_TIME_AERATEUR_1_F1    61 non-null float64
ACT_TIME_AERATEUR_1_F3    61 non-null float64
ACT_TIME_AERATEUR_1_F5    61 non-null float64
ACT_TIME_AERATEUR_1_F6    61 non-null float64
ACT_TIME_AERATEUR_1_F7    61 non-null float64
ACT_TIME_AERATEUR_1_F8    61 non-null float64
dtypes: float64(6)
memory usage: 3.3 KB

Next, I tried to assign a date, a time and a day of the week to each timestamp like this:

df5['date'] = df5['TIMESTAMP'].dt.date
df5['time'] = df5['TIMESTAMP'].dt.time

df5['day_of_week'] = df5['date'].dt.dayofweek

days = {0:'Mon',1:'Tues',2:'Weds',3:'Thurs',4:'Fri',5:'Sat',6:'Sun'}

df5['day_of_week'] = df5['day_of_week'].apply(lambda x: days[x])

But since TIMESTAMP becomes the index of the dataframe when resampling, I get an error:

KeyError                                  Traceback (most recent call last)
C:\Users\Demonstrator\Anaconda3\lib\site-packages\pandas\indexes\base.py in get_loc(self, key, method, tolerance)
   1944             try:
-> 1945                 return self._engine.get_loc(key)
   1946             except KeyError:

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4154)()

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4018)()

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12368)()

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12322)()

KeyError: 'TIMESTAMP'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-164-9887c2fb7404> in <module>()
----> 1 df5['date'] = df5['TIMESTAMP'].dt.date
      2 df5['time'] = df5['TIMESTAMP'].dt.time
      3 
      4 df5['day_of_week'] = df5['date'].dt.dayofweek
      5 

C:\Users\Demonstrator\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   1995             return self._getitem_multilevel(key)
   1996         else:
-> 1997             return self._getitem_column(key)
   1998
   1999     def _getitem_column(self, key):

C:\Users\Demonstrator\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   2002         # get column
   2003         if self.columns.is_unique:
-> 2004             return self._get_item_cache(key)
   2005
   2006         # duplicate columns & possible reduce dimensionality

C:\Users\Demonstrator\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   1348         res = cache.get(item)
   1349         if res is None:
-> 1350             values = self._data.get(item)
   1351             res = self._box_item_values(item, values)
   1352             cache[item] = res

C:\Users\Demonstrator\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   3288
   3289             if not isnull(item):
-> 3290                 loc = self.items.get_loc(item)
   3291             else:
   3292                 indexer = np.arange(len(self.items))[isnull(self.items)]

C:\Users\Demonstrator\Anaconda3\lib\site-packages\pandas\indexes\base.py in get_loc(self, key, method, tolerance)
   1945                 return self._engine.get_loc(key)
   1946             except KeyError:
-> 1947                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   1948
   1949         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4154)()

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4018)()

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12368)()

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12322)()

KeyError: 'TIMESTAMP'

Do you have any idea how to resolve this problem? Thank you in advance.

Kind regards

Once it becomes the index, you need to access the index rather than a column: df5['date'] = df5.index.date, df5['time'] = df5.index.time, etc. - EdChum
@EdChum Thank you. And what about df5['day_of_week'] = df5['date'].dt.dayofweek? - Poisson
When it's an index you don't need .dt - EdChum
OK, but when I ran df5['day_of_week'] = df5['date'].dayofweek I got this error: 'Series' object has no attribute 'dayofweek' - Poisson
df['date'] is a column (not the index) so there you need .dt - Paul H
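
To put the comments above together, here is a minimal sketch of reading everything from the index after resampling. The data is a made-up stand-in for the CSV (three days of 10-minute readings in one column), and the name `daily` is only illustrative. Note that a `date` column built from `.index.date` holds plain Python date objects, so it seems safer to take the day of week from the index itself rather than from that column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: 10-minute readings over three days.
idx = pd.date_range("2015-08-01 00:00", "2015-08-03 23:50", freq="10min")
df = pd.DataFrame({
    "TIMESTAMP": idx,
    "ACT_TIME_AERATEUR_1_F1": np.arange(len(idx), dtype=float),
})

# After set_index + resample, TIMESTAMP is the DatetimeIndex, not a column.
daily = df.set_index("TIMESTAMP").resample("1D").mean()

# Read the datetime attributes from the index directly (no .dt on an index).
daily["date"] = daily.index.date
daily["time"] = daily.index.time
daily["day_of_week"] = daily.index.dayofweek

# Same mapping as in the question, applied to the integer day-of-week column.
days = {0: "Mon", 1: "Tues", 2: "Weds", 3: "Thurs", 4: "Fri", 5: "Sat", 6: "Sun"}
daily["day_name"] = daily["day_of_week"].map(days)

print(daily[["date", "day_of_week", "day_name"]])
```

Since the resampled rows are daily, the `time` column will just be midnight for every row.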

1 Answer

You can keep the column in the dataframe even after you assign it as the index like this:

df5 = df5.set_index('TIMESTAMP', drop=False)
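
A related sketch, assuming the goal is to use `.dt` on a TIMESTAMP column after the daily resample: calling `reset_index()` on the resampled frame turns the index back into a regular column. This is a variant rather than the approach above, and the sample data and the name `daily` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: hourly readings over three days.
idx = pd.date_range("2015-08-01", periods=72, freq="60min")
df = pd.DataFrame({
    "TIMESTAMP": idx,
    "ACT_TIME_AERATEUR_1_F1": np.arange(len(idx), dtype=float),
})

# Resample by day, then move the DatetimeIndex back into a TIMESTAMP column.
daily = df.set_index("TIMESTAMP").resample("1D").mean().reset_index()

# TIMESTAMP is a datetime64 column again, so the .dt accessor works.
daily["date"] = daily["TIMESTAMP"].dt.date
daily["day_of_week"] = daily["TIMESTAMP"].dt.dayofweek

print(daily[["TIMESTAMP", "date", "day_of_week"]])
```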