I am trying to create an efficient algorithm that can multiply matrices of large values that are of double precision. I've created the algorithm and tested it on small matrices first; after trying out i.e. A{4096x4096}, B{4096x4096} the loop takes forever to end; for those two matrices for example to produce AB took my computer more than 30 minutes to complete.
My computer is not an old slouch...it's a six-core i7 and I guess for a desktop workstation it is not that bad. On small matrices of sizes up to 1024x1024 it's completes relatively quickly, that is under 30-40 seconds and for 2048x2048 around 5 minutes... for 16384x16384 it didn't finish in 15 minutes and I stopped execution...
Am I doing something wrong or this is to be expected? :)
Thanks in advance!
The code is the following:
/* calculate */
for(travx = 0; travx < m; travx++) {
for(travy = 0; travy < n; travy++) {
/* we only need to calculate it ourside of Z loop */
tIndex = (travy)+(travx*n);
for(travz = 0; travz < p; travz++)
{
if(n==1)
{bIndex = ((n-1)*travy)+travz;
aIndex = ((p)*travx)+travz;}
else
{bIndex = ((n)*travz)+travy;
aIndex = ((p)*travx)+travz;}
temp = atab_ptr[aIndex]*btab_ptr[bIndex];
outtab_ptr[tIndex] = outtab_ptr[tIndex] + temp;
}
}
}
it's really straightforward... and gives great results on small matrices... no idea how you can multiply doubles under 10 secs on especially on p4... sounds kinda fishy... especially if you take in account the O(3) complexity of the problem.
Updates...based on feedback I've tweaked the code and... Well mainly I did a simplification of it and small matrix complete much faster, that is 1024x1024 is done within 3 seconds, but 4096x4096 is done in 6 minutes... the revised code is this:
for(travx = 0; travx < m; travx++) {
for(travy = 0; travy < n; travy++) {
for(travz = 0; travz < p; travz++)
{outtab_ptr[travy+travx*n] = outtab_ptr[travy+travx*n] + atab_ptr[travy+p*travz] * btab_ptr[travz+travx*p];}
}
}