I have a dataframe of number of messages and the time period they were sent in (increments of 10 minutes). Here's a snippet:

+---------------------+--------------+
|      UnixTime       | Num messages |
+---------------------+--------------+
| 2019-02-01 00:00:00 |           54 |
| 2019-02-01 00:10:00 |           23 |
| 2019-02-01 00:20:00 |           36 |
+---------------------+--------------+

This dataframe has a year's worth of 10-minute timestamps and the number of messages in each period. How can I model a mean/median week using the entire dataframe, with the day of the week as a reference?

+-----------------+-------------------------+
| UnixTime (Mean) | Mean number of messages |
+-----------------+-------------------------+
| Friday 00:00:00 | 56.3                    |
| Friday 00:10:00 | 25.5                    |
| Friday 00:20:00 | 30.4                    |
+-----------------+-------------------------+

So the output dataframe should model an average week from Monday-Sunday for the year, with the number of messages averaged over the year for that time period and day.

I know I can get the day of the week with df["Day Of Week"] = df['UnixTime'].dt.day_name(), but how can I model the average so that, for example, each Monday of the year between 09:00 and 09:10 is treated as the same group?

Look at groupby. You should be able to do something like: df.groupby('Day Of Week').mean(). - Collin Phillips
Not sure if I understand you correctly, but if you want to get the mean for each day of the week, shouldn't the column Mean number of messages in your dataframe have the same value for every row of a given day? - Erfan
@Erfan So it would be the mean number of messages sent between 00:00 and 00:10 on a Friday and so on - Lobstw
So each Friday of the year between 00:00 - 00:10 is treated as the same group? - Erfan
@Erfan That's exactly it, yes - Lobstw

1 Answer

We can add a new column "weekNum" that labels each row with the week of the year it falls in. We can do this with floor division on the day of the year.

df["weekNum"] = (df["UnixTime"].dt.dayofyear - 1) // 7

We can then group by either column and average the message counts:

df.groupby("weekNum")["Num messages"].mean()
df.groupby("Day Of Week")["Num messages"].mean()

EDIT:

To get each 10-minute interval, now that I understand your question better: we can use .strftime to format each timestamp as the day name (%A) followed by the hour, minute and second (%H:%M:%S), and group by that.

df["dayTime"] = df["UnixTime"].dt.strftime("%A %H:%M:%S")
df.groupby("dayTime")["Num messages"].mean()
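Grouping by a single formatted string sorts the result alphabetically (Friday first, then Monday, and so on). A possible variation, sketched below, keeps the weekday and the time of day as separate keys and makes the weekday an ordered categorical so the output runs Monday to Sunday, with both mean and median. The data here is synthetic (random counts generated only so the example runs), and the "Time" column and the typical_week name are made up for illustration; the other column names follow the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data: one year of 10-minute bins.
# The counts are random placeholders, not real message data.
rng = np.random.default_rng(0)
times = pd.date_range("2019-02-01", periods=365 * 144, freq="10min")
df = pd.DataFrame({"UnixTime": times, "Num messages": rng.poisson(40, len(times))})

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Ordered categorical so groups sort Monday..Sunday rather than alphabetically.
df["Day Of Week"] = pd.Categorical(
    df["UnixTime"].dt.day_name(), categories=days, ordered=True
)
# Time of day as a zero-padded string, which sorts chronologically.
df["Time"] = df["UnixTime"].dt.strftime("%H:%M:%S")

# One row per (weekday, 10-minute slot): 7 * 144 = 1008 rows.
typical_week = (
    df.groupby(["Day Of Week", "Time"], observed=True)["Num messages"]
    .agg(["mean", "median"])
    .reset_index()
)

print(typical_week.head())
```

Every Monday 09:00:00 row in the year lands in the same group, which is the grouping the question asks for. If the real data has gaps, some slots will simply be averaged over fewer weeks.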