2
votes

I am investigating how many FLOPs can be performed per CPU cycle using the GotoBLAS library. I ran a matrix multiplication on 32-bit floating-point numbers and, by hand calculation, got roughly 8 FLOPs per CPU cycle. My guess is that this is because my processor (Intel Xeon E5430) has two FPUs, each of which handles one SSE instruction over 128-bit XMM registers per cycle. With 32-bit floating-point numbers, each register holds 4 values, so that would give 2*4 = 8 FLOPs per CPU cycle.
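The hand calculation mentioned above can be sketched as follows. This is only an illustration, assuming the standard 2*m*n*k operation count for a dense (m x k) by (k x n) multiplication and a single-core run; the function name `flops_per_cycle` and the matrix size, timing, and clock values in the usage line are hypothetical, not measurements from the question.

```python
def flops_per_cycle(m, n, k, seconds, clock_hz):
    """Estimate achieved FLOPs per cycle for an (m x k) @ (k x n) multiply.

    Assumes the usual dense-GEMM count: each of the m*n output entries
    needs k multiplications and k additions, i.e. 2*m*n*k FLOPs in total.
    """
    flops = 2 * m * n * k            # total floating-point operations
    cycles = seconds * clock_hz      # CPU cycles elapsed during the run
    return flops / cycles


# Hypothetical numbers, for illustration only:
# a 2000 x 2000 x 2000 multiply taking 1.0 s on a 2.0 GHz core.
print(flops_per_cycle(2000, 2000, 2000, 1.0, 2.0e9))
```

If the library runs multithreaded, the result is the total across all cores, so it would need to be divided by the number of cores in use to compare against a per-core peak.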

Is my guess correct? Is there an official manual I can consult to find the number of FPUs in a given Intel processor?

Thanks!

1
Thanks! I tried Googling, but haven't found what I'm looking for yet. Jose

1 Answer

1
votes

I think I found the reason. In theory, the Intel Xeon E5430 can issue one 4-wide SSE addition and one 4-wide SSE multiplication in the same CPU cycle for single-precision floating-point numbers. That gives 4 + 4 = 8 FLOPs per cycle, which matches what I measured.
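The arithmetic behind that peak figure can be written out as a small sketch. It assumes a simple model in which the peak is the number of SIMD lanes times the number of vector floating-point instructions issued per cycle (here one add plus one multiply); the function name `peak_flops_per_cycle` and its parameters are made up for illustration.

```python
def peak_flops_per_cycle(register_bits, element_bits, vector_ops_per_cycle):
    """Theoretical peak FLOPs per cycle under a simple SIMD model.

    register_bits:        width of one SIMD register (128 for SSE XMM)
    element_bits:         width of one floating-point element (32 or 64)
    vector_ops_per_cycle: vector FP instructions issued per cycle
                          (assumed 2 here: one add and one multiply)
    """
    lanes = register_bits // element_bits   # elements per register
    return lanes * vector_ops_per_cycle


print(peak_flops_per_cycle(128, 32, 2))  # single precision: 4 lanes * 2 ops
print(peak_flops_per_cycle(128, 64, 2))  # double precision: 2 lanes * 2 ops
```

Under the same model, double precision would peak at half the single-precision rate, since each 128-bit register holds only two 64-bit values.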