8
votes

I'm trying to find a little bit more information for efficient square root algorithms which are most likely implemented on FPGA. A lot of algorithms are found already but which one are for example from Intel or AMD? By efficient I mean they are either really fast or they don't need much memory.

EDIT: I should probably mention that the question is generally a floating point number and since most of the hardware implements the IEEE 754 standard where the number is represented as: 1 sign bit, 8 bits biased exponent and 23 bits mantissa.

Thanks!

2
Why not implement this? You only do shifts and adds and no extra memory is needed for things like look-up tables. Looks like a good candidate for an FPGA. - Alexey Frunze
Thanks for the comment @Alex. I'll try to find some more resources, because I still don't how can I implement that in VHDL. One more question, doesn't that find just the integer part of sqrt? - Dimitar Petrov
Do you want to solve one square root as fast as possible, or solve a continuous stream of square roots as fast as possible? - President James K. Polk
@DimitarPetrov: right, that particular piece of code calculates the integer square root of an integer. But you can reuse it for floating-point values too because sqrt(mantissa*2^exponent)=sqrt(mantissa)*2^(exponent/2) and you can always represent your number as an integer mantissa times some even power of 2. You should have included the details about floating-point square root and VHDL in the question. Actually, the VHDL probably deserves a separate question. - Alexey Frunze

2 Answers

6
votes

Not a full solution, but a couple of pointers.

I assume you're working in floating point, so point 1 is remember that floating point is stored as a mantissa and exponent. The exponent of the square root will be approximately half the exponent of the original number thanks to logarithms.

Then the mantissa can be approximated with a look-up table, and then you can use a couple of newton-raphson rounds to give some accuracy to the result from the LUT.

I haven't implemented anything like this for about 8 years, but I think this is how I did it and was able to get a result in 3 or 4 cycles.

3
votes

This is a great one for fast inverse-quare root.
Have a look at it here. Notice it's pretty much about the initial guess, rather amazing document :)