55
votes

I came up with this:

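n = 1;       // count, including the value about to be added
curAvg = 0;  // running average so far
loop {       // once per incoming newNum
  curAvg = curAvg + (newNum - curAvg)/n;
  n++;
}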

I think the highlights of this approach are:
- It avoids big numbers (and the possible overflow you would get by summing first and dividing afterwards).
- It saves one register (there is no need to store the sum).
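
To make the overflow point concrete, here is a minimal Python sketch, not a definitive implementation. The names `running_mean` and `naive_mean` are made up for this illustration, and the counter starts at 0 and is incremented before use, which is equivalent to starting `n` at 1 and incrementing afterwards. It assumes IEEE 754 double-precision floats.

```python
def running_mean(values):
    """Incremental mean: keeps only the count and the current average."""
    n = 0
    cur_avg = 0.0
    for new_num in values:
        n += 1
        cur_avg += (new_num - cur_avg) / n
    return cur_avg


def naive_mean(values):
    """Sum everything first, then divide (needs a non-empty input)."""
    total = 0.0
    n = 0
    for v in values:
        total += v
        n += 1
    return total / n


big = [1e308, 1e308, 1e308]
print(running_mean(big))        # stays finite
print(naive_mean(big))          # the intermediate sum overflows to inf
print(running_mean([1, 2, 3]))  # 2.0
```

The incremental form is not immune to overflow in every case: the difference `new_num - cur_avg` can itself overflow when the two have opposite signs and are both close to the largest representable value.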

The trouble might be accumulated rounding error, but I assume that roundings up and down will generally balance out, so the error should not grow dramatically.

Do you see any pitfalls in this solution? Do you have a better proposal?

2
I don't understand your formula. For 1 2 and 3 next, you'd do curAvg = 1.5 + (3 - 1.5)/2 = 1.5 + 0.75 = 2.25, which would be wrong? - IVlad
Your solution is mentioned there: new average = old average + (next data - old average) / next count - Donald_W
@IVlad You forgot to increment the value of n. It should be 3 instead of 2. So the expression would be curAvg = 1.5+(3-1.5)/3 = 1.5+0.5 = 2, which is correct. - Natesh Raina
It should be noted that the OP's algorithm is not a standard moving average, but an exponentially-weighted moving average. While an EMA might be just the ticket for many applications, the two behave quite differently under some circumstances (large step response) and implementers should be aware of the difference. See stackoverflow.com/questions/12636613/… - Julia

2 Answers

32
votes

Your solution is essentially the "standard" online solution for keeping track of a running average without storing big sums. "Online" here means that you process one number at a time, never go back to earlier numbers, and use only a constant amount of extra memory.

If you want slightly better numerical accuracy, at the cost of no longer being online, and assuming your numbers are all non-negative, sort them from smallest to largest first and then process them in that order, the same way you do now. That way, if you get a bunch of really small numbers of about equal size and then one big number, you can compute the average accurately without the small contributions underflowing, as opposed to what happens if you process the large number first.
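
If you want to check whether the ordering matters for your data, one way is to compare both orders against an exact reference. The following is only a sketch under assumptions: `running_mean` is a made-up name for the update from the question, the sample data (one large value and 999 ones) is invented for illustration, and `fractions.Fraction` from the Python standard library supplies the exact mean.

```python
from fractions import Fraction


def running_mean(values):
    """Incremental mean, same update as in the question."""
    n = 0
    cur_avg = 0.0
    for new_num in values:
        n += 1
        cur_avg += (new_num - cur_avg) / n
    return cur_avg


# Illustrative data: one large value followed by many small, equal ones.
values = [1e16] + [1.0] * 999

# Exact mean via rational arithmetic, rounded to a float once at the end.
exact = float(sum(Fraction(v) for v in values) / len(values))

mean_large_first = running_mean(values)    # large number processed first
mean_sorted = running_mean(sorted(values)) # smallest to largest

print("exact       ", exact)
print("large first ", mean_large_first, "abs error", abs(mean_large_first - exact))
print("sorted      ", mean_sorted, "abs error", abs(mean_sorted - exact))
```

Note that `sorted(values)` needs the whole input up front, which is exactly the loss of the "online" property mentioned above. How large the difference between the two orders is, if any, depends on the data.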

3
votes

I have used this algorithm for many years. The loop can be any kind of loop: maybe it is individual web sessions, or maybe it is a true loop. The point is that all you need to track is the current count (N) and the current average (avg). Each time a new value is received, apply this algorithm to update the average. This computes the arithmetic average (exact up to floating-point rounding). It has the additional benefit of being resistant to overflow: if you have a gazillion large numbers to average, summing them all up may overflow before you get to divide by N. This algorithm avoids that pitfall.

Variables that are stored during the computation of the average:
N = 0
avg = 0

For each new value: V
    N=N+1
    a = 1/N
    b = 1 - a
    avg = a * V + b * avg
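
The same steps as a small Python sketch. The class name `RunningAverage` and its `update` method are hypothetical, chosen only for this example. Algebraically, `a * V + (1 - a) * avg` with `a = 1/N` equals `avg + (V - avg) / N`, so this is the same update as in the question written as a weighted sum.

```python
class RunningAverage:
    """Keeps only the count N and the current average."""

    def __init__(self):
        self.n = 0
        self.avg = 0.0

    def update(self, value):
        """Fold one new value into the average and return the new average."""
        self.n += 1
        a = 1.0 / self.n   # weight of the new value
        b = 1.0 - a        # weight of the previous average
        self.avg = a * value + b * self.avg
        return self.avg


tracker = RunningAverage()
for v in [1, 2, 3]:
    tracker.update(v)
print(tracker.n, tracker.avg)  # count 3, average approximately 2.0
```

Because the state is just two numbers, it can live anywhere between updates (a session, a database row, a struct), which is what makes it work for "any kind of loop".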