I'm trying to figure out how to take a dataframe representing players in a game, the dataframe has unique users and records of each day the particular user has been active.
I am trying to get the average playtime and average moves for each week in the various users lifetime.
(Week is defined by a user's first record, i.e. if a user's first record is 3rd of January, their 1st week starts then and the 2nd week start the 10th of January).
Example
userid date secondsPlayed movesMade
++/acsbP2NFC2BvgG1BzySv5jko= 2016-04-28 413.88188 85
++/acsbP2NFC2BvgG1BzySv5jko= 2016-05-01 82.67343 15
++/acsbP2NFC2BvgG1BzySv5jko= 2016-05-05 236.73809 39
++/acsbP2NFC2BvgG1BzySv5jko= 2016-05-10 112.69112 29
++/acsbP2NFC2BvgG1BzySv5jko= 2016-05-11 211.42790 44
-----------------------------------CONT----------------------------------
++/8ij1h8378h123123koF3oer1 2016-05-05 200.73809 11
++/8ij1h8378h123123koF3oer1 2016-05-10 51.69112 14
++/8ij1h8378h123123koF3oer1 2016-05-14 65.42790 53
The end result for this would be the following table:
userid date secondsPlayed_w movesMade_w
++/acsbP2NFC2BvgG1BzySv5jko= 2016-04-28 496.55531 100
++/acsbP2NFC2BvgG1BzySv5jko= 2016-05-05 236.73809 68
-----------------------------------CONT----------------------------------
++/8ij1h8378h123123koF3oer1 2016-05-05 252.42921 25
++/8ij1h8378h123123koF3oer1 2016-05-12 65.42790 53
Failed attempt #1:
So far I've tried doing a lot of different things, but the most useful dataframe I've managed to create was the following:
df_grouped = df.groupby('userid').apply(lambda x: x.set_index('date').resample('1D').first().fillna(0))
df_result = df_grouped.groupby(level=0)['secondsPlayed'].apply(lambda x: x.rolling(min_periods=1, window=7).mean()).reset_index(name='secondsPlayed_week')
Which is a very slow and wasteful computation, but nonetheless can be used as a intermediate step.
userid date secondsPlayed_w
++/acsbP2NFC2BvgG1BzySv5jko= 2016-04-28 4.138819e+02
++/acsbP2NFC2BvgG1BzySv5jko= 2016-04-29 2.069409e+02
++/acsbP2NFC2BvgG1BzySv5jko= 2016-04-30 1.379606e+02
++/acsbP2NFC2BvgG1BzySv5jko= 2016-05-01 1.241388e+02
++/acsbP2NFC2BvgG1BzySv5jko= 2016-05-02 9.931106e+01
++/acsbP2NFC2BvgG1BzySv5jko= 2016-05-03 8.275922e+01
++/acsbP2NFC2BvgG1BzySv5jko= 2016-05-04 7.093647e+01
++/acsbP2NFC2BvgG1BzySv5jko= 2016-05-05 4.563022e+01
Failed attempt #2:
df_result = (df .reset_index() .set_index("date") .groupby(pd.Grouper(freq='W'))).agg({"userid":"first", "secondsPlayed":"sum", "movesUsed":"sum"}) .reset_index()
Which gave me the following dataframe, which has the fault of not being grouped by userids (the NaN problem is easily resolved).
date userid secondsPlayed_w movesMade_w
2016-04-10 +1kexX0Yk2Su639WaRKARcwjq5g= 2.581356e+03 320
2016-04-17 +1kexX0Yk2Su639WaRKARcwjq5g= 4.040738e+03 615
2016-04-24 NaN 0.000000e+00 0
2016-05-01 ++RBPf9KdTK6pTN+lKZHDLCXg10= 1.644130e+05 17453
2016-05-08 ++DndI7do036eqYh9iW7vekAnx0= 3.775905e+05 31997
2016-05-15 ++NjKpr/vyxNCiYcmeFK9qSqD9o= 4.993430e+05 34706
2016-05-22 ++RBPf9KdTK6pTN+lKZHDLCXg10= 3.940408e+05 23779
Immediate thought:
Can this problem be solved by using a groupby that groups by two columns. But I'm not at all sure how to go about that with this particular problem.