Your reasoning does not work for additions: if `a` and `b` are already inaccurate by 0.5 ULP each and `a` is close to `-b`, then the relative accuracy of `a + b` can be terrible, much worse than 1.5 ULP. In fact, without further information about the vectors whose dot product you are computing, there are no guarantees one can provide about the relative accuracy of the result.
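To illustrate the cancellation issue concretely (a minimal sketch of my own, with hypothetical values): both inputs below are correctly rounded, hence within 0.5 ULP of the reals they stand for, yet their sum has only a few correct significant digits.

    #include <stdio.h>

    int main(void)
    {
        /* Both values are correctly rounded doubles, i.e. within
           0.5 ULP of the real numbers they represent. */
        double a = 1.0 / 3.0;        /* nearest double to  1/3            */
        double b = -0.333333333333;  /* nearest double to -0.333333333333 */

        /* The exact real sum is 3.3333...e-13. The representation
           errors of a and b (around 1e-17 each) are minuscule in
           absolute terms but huge relative to that sum, so only a
           few significant digits of the computed result are correct,
           even though the addition itself introduces no new error
           (it is in fact exact here, by Sterbenz's lemma). */
        printf("a + b = %.17g\n", a + b);
        return 0;
    }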
Your line of reasoning is okay-ish when there are only multiplications, but it ignores how the errors compound. Consider the expansion, where `e_a` and `e_b` are the respective errors of `a` and `b`:

    (a + e_a)(b + e_b) = a*b + a*e_b + b*e_a + e_a*e_b

If you assume that both `a` and `b` are between 1 and 2, the total relative error after multiplying two values that were each already accurate to 0.5 ULP can be estimated, in a rough first approximation, as 1 ULP, and that still ignores the term `e_a*e_b` and the rounding error of the multiplication itself. Make it about 1.5 ULP of total relative error for the result of the floating-point multiplication, and even that is only a rough average, not a sound maximum.
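To put a number on that rough estimate (a back-of-the-envelope check of my own, with `a` and `b` taken at 1.0, where the inherited relative error is largest for values in [1, 2)):

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        /* Worst-case input errors of 0.5 ULP for values in [1,2): 2^-53. */
        double a = 1.0, ea = ldexp(1.0, -53);
        double b = 1.0, eb = ldexp(1.0, -53);

        /* Error inherited by the product, from the expansion above:
           (a + e_a)(b + e_b) - a*b = a*e_b + b*e_a + e_a*e_b,
           expressed here relative to the exact product a*b. */
        double rel = (a * eb + b * ea + ea * eb) / (a * b);
        printf("inherited relative error = %g (2^-52 = %g)\n",
               rel, ldexp(1.0, -52));
        return 0;
    }

This prints an inherited relative error of about 2^-52, i.e. 1 ULP for values in [1, 2), before the rounding of the multiplication itself adds up to another 0.5 ULP.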
These colleagues of mine have formalized and proved a bound on the accuracy of the double-precision floating-point dot product. Translated into English, their result says that if each vector component is bounded by 1.0, then the end result of the dot product is accurate to NMAX * B, where NMAX is the dimension of the vectors and B is a constant depending on NMAX. A few values are provided on the linked page:
    NMAX    10            100           1000
    B       0x1.1p-50     0x1.02p-47    0x1.004p-44
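As an illustration only (a quick empirical sanity check I am sketching here, not part of their proof), one can compare a straightforward double-precision dot product against a wider-precision reference and watch the error stay below NMAX * B. The B value for NMAX = 100 comes from the table above; the reference assumes a platform where `long double` is wider than `double`:

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    #define NMAX 100

    int main(void)
    {
        double x[NMAX], y[NMAX], p = 0.0;
        long double ref = 0.0L;   /* wider-precision reference */

        srand(1);
        for (int i = 0; i < NMAX; i++) {
            x[i] = rand() / (double)RAND_MAX;  /* components in [0,1] */
            y[i] = rand() / (double)RAND_MAX;
            p   += x[i] * y[i];                /* plain double dot product */
            ref += (long double)x[i] * y[i];
        }

        double bound = NMAX * 0x1.02p-47;      /* B for NMAX = 100 */
        printf("error = %Lg, bound = %g\n", fabsl(p - ref), bound);
        return 0;
    }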
In their result, you can replace 1.0 with any power of two P low enough to ensure the absence of overflow, and the absolute error of the dot product then becomes NMAX * B * P^2. The loop invariants become, respectively:
@ loop invariant \abs(exact_scalar_product(x,y,i)) <= i * P * P;
@ loop invariant \abs(p - exact_scalar_product(x,y,i)) <= i * B * P * P;
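For context, here is a sketch of how these invariants might be attached to a dot-product loop (my own illustration of the shape of the annotated code, not their actual proved development; `exact_scalar_product` is assumed to be an ACSL logic function giving the error-free partial dot product, and `B` and `P` stand for the constant and the scaling power of two discussed above):

    /*@ axiomatic ExactDot {
      @   logic real exact_scalar_product(double *x, double *y, integer n);
      @ }
      @*/
    double dot_product(double *x, double *y, unsigned n)
    {
        double p = 0.0;
        /*@ loop invariant 0 <= i <= n;
          @ loop invariant \abs(exact_scalar_product(x, y, i)) <= i * P * P;
          @ loop invariant \abs(p - exact_scalar_product(x, y, i)) <= i * B * P * P;
          @*/
        for (unsigned i = 0; i < n; i++)
            p += x[i] * y[i];   /* each component assumed bounded by P */
        return p;
    }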