0
votes

I have day-by-day time-series data pulled from MySQL, and I want to turn it into weekly time-series data. How can I do that?

For example, here is the dataset for one product. It has the keys Timestamp and Price and contains 14 consecutive dates with their corresponding prices.

Product: {'Timestamp': [datetime.date(2019, 4, 15), datetime.date(2019, 4, 16), datetime.date(2019, 4, 17), datetime.date(2019, 4, 18), datetime.date(2019, 4, 19), datetime.date(2019, 4, 20), datetime.date(2019, 4, 21), datetime.date(2019, 4, 22), datetime.date(2019, 4, 23), datetime.date(2019, 4, 24), datetime.date(2019, 4, 25), datetime.date(2019, 4, 26), datetime.date(2019, 4, 27), datetime.date(2019, 4, 28)]
, 'Price': ['3988', '3988', '3988', '3988', '3988', '3988', '3988', '3988', '3988', '3988', '3988', '3988', '3988', '3988']}

Now, instead of one price per date, I want weekly averages where each week begins on Saturday and ends on Friday. Each week is labelled by its Friday (here Friday, 4/19, and Friday, 4/26), and its value is the weekly average computed together with the 6 days prior to that date, not merely that single day's price. So the outcome should look like this:

Product: {'Timestamp': [ datetime.date(2019, 4, 19), datetime.date(2019, 4, 26)]
, 'Price': ['3988', '3988']}

2 Answers

1
votes

You can easily compute the average for weeks starting on Friday once you convert the Timestamp column with pd.to_datetime() and cast the string prices to integers:

import datetime
import pandas as pd

# Same data as in the question: 14 consecutive days from Monday 2019-04-15, prices as strings
df = pd.DataFrame({
    'Timestamp': [datetime.date(2019, 4, 15) + datetime.timedelta(days=i) for i in range(14)],
    'Price': ['3988'] * 14,
})
df['Timestamp'] = pd.to_datetime(df['Timestamp'])  # date objects -> datetime64
df['Price'] = df['Price'].astype('int')            # string prices -> integers

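Then use a list comprehension to flag every Friday (dt.weekday is 4 for Friday) and take the cumulative sum of the flags, so each row gets a week number that increases by one on each Friday: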

df['Week'] = [1 if x == 4 else 0 for x in df['Timestamp'].dt.weekday]
df['Week'] = df['Week'].cumsum()

Finally, group by the newly created Week column and take the mean to get the result:

df[['Price', 'Week']].groupby('Week').mean()
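
As a quick check of how the Week counter splits the sample data, here is a minimal self-contained sketch (the names sample and summary are only for illustration). Because the counter increases on each Friday, the groups run Friday through Thursday, and the days before the first Friday land in a partial week 0.

```python
import pandas as pd

# 14 consecutive days starting Monday 2019-04-15, constant price as in the question
sample = pd.DataFrame({
    'Timestamp': pd.date_range('2019-04-15', periods=14, freq='D'),
    'Price': [3988] * 14,
})

# weekday == 4 is Friday; the cumulative sum bumps the week number on each Friday
sample['Week'] = (sample['Timestamp'].dt.weekday == 4).cumsum()

# First date, last date and mean price of every group
summary = sample.groupby('Week').agg(
    start=('Timestamp', 'min'),
    end=('Timestamp', 'max'),
    mean_price=('Price', 'mean'),
)
print(summary)
```

On this data, week 0 covers 4/15 to 4/18, week 1 covers 4/19 to 4/25, and week 2 covers 4/26 to 4/28.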

Hope this helps :)

UPDATE

You can also aggregate the Timestamp column with min and max, which gives you the first and last date of each week as well.

df[['Timestamp', 'Week']].groupby(['Week']).agg({'Timestamp': ['min', 'max']})
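
If the weeks should instead run Saturday through Friday and be labelled by the Friday, as the question describes, one alternative is resample with pandas' anchored weekly frequency 'W-FRI'. This is only a sketch: it assumes that a trailing week whose Friday lies beyond the last observed date should be dropped, and weekly is an illustrative name.

```python
import pandas as pd

# Same 14 days as in the question, with prices already converted to integers
df = pd.DataFrame({
    'Timestamp': pd.date_range('2019-04-15', periods=14, freq='D'),
    'Price': [3988] * 14,
})

# 'W-FRI' bins end on Friday, so each bin covers Saturday..Friday
# and is labelled by that Friday
weekly = df.resample('W-FRI', on='Timestamp')['Price'].mean()

# Assumption: drop the trailing bin whose Friday is past the last observed date
weekly = weekly[weekly.index <= df['Timestamp'].max()]
print(weekly)
```

On the sample data this leaves the two Fridays 2019-04-19 and 2019-04-26. Note that the first bin only contains Monday 4/15 through Friday 4/19, because the data starts mid-week.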
0
votes

By following the steps @saahil provided I can almost get what I want. The one piece of information still missing from the result is the Timestamp corresponding to each week. Here is my solution: instead of groupby, I chose pivot_table to get it.

df2 = pd.pivot_table(df, index=['Week'], values=['Price', 'Timestamp'], aggfunc={'Price': 'mean', 'Timestamp': 'max'})
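
To get back to the dictionary layout from the question (a list of dates plus a list of string prices), the weekly table can be converted at the end. A minimal sketch, assuming a weekly frame with Timestamp and Price columns like the one pivot_table returns; weekly and product are illustrative names, and rounding the averages to whole numbers is an assumption.

```python
import pandas as pd

# Stand-in for the weekly result: one row per week with its date and mean price
weekly = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2019-04-19', '2019-04-26']),
    'Price': [3988.0, 3988.0],
})

product = {
    # datetime64 values -> plain datetime.date objects
    'Timestamp': weekly['Timestamp'].dt.date.tolist(),
    # float averages -> rounded integers -> strings, as in the original data
    'Price': weekly['Price'].round().astype(int).astype(str).tolist(),
}
print(product)
```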