2
votes

I'm fairly to Matlab / Octave and Machine learning, but so far I've learned you want to avoid iterative loops for summation and vectorize as much as possible.

Given a row vector like: x = [ 1,2,3,4,5]

You can calculate the sum with these two methods:

  • sum(x)
  • x * ones(length(x),1)

While my gut tells me to us the built in functions, the second options feels more 'vectorized'.

Which of the is more optimal and why? Are there tradeoffs between the two in performance vs. memory use, etc...?

2

2 Answers

3
votes

In general, it seems like sum is better: the time/memory overhead of allocating the "all ones" vector does not worth it.
However, when one needs to repeatedly sum over vectors of the same length, the allocation of the vector can be done only once, thus reducing the average overhead.

On my machine:

"Caching" the all ones vector:

N=100;T=500000; x=rand(T,N);
tic;o=ones(N,1);for ii=1:T, x(ii,:)*o;end;toc

Elapsed time is 0.480388 seconds.

While using sum:

tic;for ii=1:T, sum(x(ii,:));end;toc

Elapsed time is 0.488517 seconds.

So, it's slightly faster to use the "all ones" vector method in case of repeated sums.


If you take the allocation of the "all ones" vector out of the time calculation, you end up with:

N=100;T=500000; x=rand(T,N);o=ones(N,1);
tic;for ii=1:T, x(ii,:)*o;end;toc 

Elapsed time is 0.477762 seconds.

But again, you'll have to allocate it at some point...

1
votes

Ok, did some more digging:

From a performance standpoint the built in sum() is much better:

  x = rand(1,100000000); 
  %slowwwww
  t = cputime; x * ones(length(x),1); e= cputime - t; e

  % Faster
  t = cputime; sum(x); e= cputime - t; e

I guessing using an extra vector of ones is also needless memory use. Since there is no performance gain over sum() the non-native method is far less optimal.