I want to calculate a running sum over the last one hour for each transaction using Spark-Scala. I have the following DataFrame with three fields and want to calculate the fourth field as shown below:
Customer  TimeStamp          Tr  Last_1Hr_RunningSum
Cust-1    6/1/2015 6:51:55    1    1
Cust-1    6/1/2015 6:58:34    3    4
Cust-1    6/1/2015 7:20:46    3    7
Cust-1    6/1/2015 7:40:45    4   11
Cust-1    6/1/2015 7:55:34    5   15
Cust-1    6/1/2015 8:20:34    0   12
Cust-1    6/1/2015 8:34:34    3   12
Cust-1    6/1/2015 9:35:34    7    7
Cust-1    6/1/2015 9:45:34    3   10
Cust-2    6/1/2015 16:26:34   2    2
Cust-2    6/1/2015 16:35:34   1    3
Cust-2    6/1/2015 17:39:34   3    3
Cust-2    6/1/2015 17:43:34   5    8
Cust-3    6/1/2015 17:17:34   6    6
Cust-3    6/1/2015 17:21:34   4   10
Cust-3    6/1/2015 17:45:34   2   12
Cust-3    6/1/2015 17:56:34   3   15
Cust-3    6/1/2015 18:21:34   4   13
Cust-3    6/1/2015 19:24:34   1    1
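For reference, here is a minimal sketch of how this sample data can be loaded into a DataFrame and registered as a temp table. The case class name `Txn`, the table name `transactions`, and the date format string are only for this example; `sqlContext` is the usual `SQLContext` from the shell:

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat

// One record per row of the table above.
case class Txn(Customer: String, TimeStamp: Timestamp, Tr: Int)

val fmt = new SimpleDateFormat("M/d/yyyy H:mm:ss")

// Only the first and last rows are shown; the remaining rows
// from the table follow the same pattern.
val rows = Seq(
  ("Cust-1", "6/1/2015 6:51:55", 1),
  ("Cust-1", "6/1/2015 6:58:34", 3),
  // ...
  ("Cust-3", "6/1/2015 19:24:34", 1)
)

val df = sqlContext.createDataFrame(
  rows.map { case (c, t, v) => Txn(c, new Timestamp(fmt.parse(t).getTime), v) }
)
df.registerTempTable("transactions")
```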
I want to calculate "Last_1Hr_RunningSum" as new field which look back for one hour from each transaction by customer id and take some of "Tr"(Transaction filed).
- For example: for Cust-1 at 6/1/2015 8:20:34, the one-hour window reaches back to 7:20:34, so the earliest row included is the one at 6/1/2015 7:20:46, and the sum is (0 + 5 + 4 + 3) = 12.
- In the same way, for each row I want to look back one hour and take the sum of all transactions that fall in that window.
I tried running sqlContext.sql with a nested query, but it gives me an error. Also, window functions and ROW_NUMBER() OVER (PARTITION BY ...) are not supported by the plain SQLContext in Spark-Scala (as far as I can tell they require a HiveContext).
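This is roughly the kind of nested query I attempted (a sketch only: the table name matches the temp table above, and the correlated subquery with INTERVAL arithmetic is the part that errors out for me):

```scala
// Attempted nested (correlated) query -- sqlContext.sql rejects it.
val attempt = sqlContext.sql("""
  SELECT a.Customer, a.TimeStamp, a.Tr,
         (SELECT SUM(b.Tr)
            FROM transactions b
           WHERE b.Customer  =  a.Customer
             AND b.TimeStamp <= a.TimeStamp
             AND b.TimeStamp >  a.TimeStamp - INTERVAL 1 HOUR) AS Last_1Hr_RunningSum
    FROM transactions a
""")
```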
How can I get the sum of "Tr" over the last one hour, based on the "TimeStamp" column, using Spark-Scala only?
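To make the expected computation concrete, here is a brute-force RDD sketch that produces the desired output on this small sample (column positions match the `df` built above). It collects each customer's rows and scans them in plain Scala, so it is quadratic per customer and assumes a customer's whole history fits in memory, which is exactly what I would like to avoid:

```scala
import java.sql.Timestamp

val oneHourMs = 60 * 60 * 1000L

// Group each customer's transactions, sort them by time, and for each
// row sum every Tr whose timestamp lies in (ts - 1 hour, ts].
val bruteForce = df.rdd
  .map(r => (r.getString(0), (r.getAs[Timestamp](1), r.getInt(2))))
  .groupByKey()
  .flatMap { case (cust, txns) =>
    val sorted = txns.toSeq.sortBy(_._1.getTime)
    sorted.map { case (ts, tr) =>
      val windowSum = sorted
        .filter { case (t, _) =>
          t.getTime <= ts.getTime && t.getTime > ts.getTime - oneHourMs
        }
        .map(_._2).sum
      (cust, ts, tr, windowSum)
    }
  }

bruteForce.collect().foreach(println)
```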
Thanks in advance.