I have a dataframe of this format:
Date | Return
01/01/2015 | 0.0
02/02/2015 | 0.04
03/02/2015 | 0.06
04/02/2015 | 0.16
I need to calculate the cumulative standard deviation for each row (the standard deviation of that row's Return together with all the Returns above it), and also get the number of rows above it. So my result should look something like this:
Date | Rows above | Compounded
01/01/2015 | 0 | 0 (first element to be kept zero)
02/02/2015 | 1 | 0.02828427125 (Std_Dev of 0, 0.04)
03/02/2015 | 2 | 0.03055050463 (Std_Dev of 0, 0.04, 0.06)
04/02/2015 | 3 | 0.06806859286 (Std_Dev of 0, 0.04, 0.06, 0.16)
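To make the expected numbers concrete, here is a minimal plain-Java sketch (no Spark involved) of the calculation I am after. It assumes the sample standard deviation (dividing by n - 1), which is what matches the values above, and it uses Welford's running update so each row only needs the previous row's state. The class and method names (`RunningStdDev`, `cumulativeSampleStdDev`) are made up for illustration only.

```java
public class RunningStdDev {

    // Returns, for each index i, the sample standard deviation of values[0..i].
    // The first element is kept at zero because a single value has no spread.
    static double[] cumulativeSampleStdDev(double[] values) {
        double[] result = new double[values.length];
        long n = 0;        // number of values seen so far
        double mean = 0.0; // running mean
        double m2 = 0.0;   // running sum of squared deviations from the mean

        for (int i = 0; i < values.length; i++) {
            double x = values[i];
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
            // Sample standard deviation: divide by (n - 1); zero for the first row.
            result[i] = (n > 1) ? Math.sqrt(m2 / (n - 1)) : 0.0;
        }
        return result;
    }

    public static void main(String[] args) {
        String[] dates = {"01/01/2015", "02/02/2015", "03/02/2015", "04/02/2015"};
        double[] returns = {0.0, 0.04, 0.06, 0.16};

        double[] stdDevs = cumulativeSampleStdDev(returns);
        for (int i = 0; i < dates.length; i++) {
            // "Rows above" is simply the zero-based position of the row.
            System.out.printf("%s | %d | %.11f%n", dates[i], i, stdDevs[i]);
        }
    }
}
```

What I am looking for is the equivalent of this done with a SparkSQL window over the rows ordered by Date, rather than by collecting the data into an array.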
I am new to SparkSQL and especially new to window functions, so answers in Java would be greatly appreciated. Thanks.