Computing floor(sqrt(x))
Exactly
Here is my solution, which is based on the bit-guessing approach proposed on Wikipedia. Unfortunately the pseudo-code provided on Wikipedia has some errors so I had to make some adjustments:
unsigned char bit_width(unsigned long long x) {
return x == 0 ? 1 : 64 - __builtin_clzll(x);
}
// implementation for all unsigned integer types
unsigned sqrt(const unsigned n) {
unsigned char shift = bit_width(n);
shift += shift & 1; // round up to next multiple of 2
unsigned result = 0;
do {
shift -= 2;
result <<= 1; // leftshift the result to make the next guess
result |= 1; // guess that the next bit is 1
result ^= result * result > (n >> shift); // revert if guess too high
} while (shift != 0);
return result;
}
bit_width
can be evaluated in constant time and the loop will iterate at most ceil(bit_width / 2)
times. So even for a 64-bit integer, this will be at worst 32 iterations of basic arithmetic and bitwise operations.
Unlike all other answers proposed so far, this actually gives you the best possible approximation, which is floor(sqrt(x))
. For any x2, this will return x exactly.
Making a Guess using log2(x)
If this is still too slow for you, you can make a guess solely based on the binary logarithm. The basic idea is that we can compute the sqrt
of any number 2x using 2x/2. x/2
might have a remainder, so we can't always compute this exactly, but we can compute an upper and lower bound.
For example:
- we are given 25
- compute
floor(log_2(25)) = 4
- compute
ceil(log_2(25)) = 5
- lower bound:
pow(2, floor(4 / 2)) = 4
- upper bound:
pow(2, ceil(5 / 2)) = 8
And indeed, the actual sqrt(25) = 5
. We found sqrt(16) >= 4
and sqrt(32) <= 8
. This means:
4 <= sqrt(16) <= sqrt(25) <= sqrt(32) <= 8
4 <= sqrt(25) <= 8
This is how we can implement these guesses, which we will call sqrt_lo
and sqrt_hi
.
// this function computes a lower bound
unsigned sqrt_lo(const unsigned n) noexcept
{
unsigned log2floor = bit_width(n) - 1;
return (unsigned) (n != 0) << (log2floor >> 1);
}
// this function computes an upper bound
unsigned sqrt_hi(const unsigned n)
{
bool isnt_pow2 = ((n - 1) & n) != 0; // check if n is a power of 2
unsigned log2ceil = bit_width(n) - 1 + isnt_pow2;
log2ceil += log2ceil & 1; // round up to multiple of 2
return (unsigned) (n != 0) << (log2ceil >> 1);
}
For these two functions, the following statement is always true:
sqrt_lo(x) <= floor(sqrt(x)) <= sqrt(x) <= sqrt_hi(x) <= x
Note that if we assume that the input is never zero, then (unsigned) (n != 0)
can be simplified to 1
and the statement is still true.
These functions can be evaluated in O(1)
with a hardware-__builtin_clzll
instruction. They only give precise results for numbers 22x, so 256
, 64
, 16
, etc.
i * i <= x
, that's an extra multiplication every iteration. – user2357112 supports Monicasqrt(x)
to a variable (of course) and the loop for up to 10000000 (when the innermost loop is empty) takes 4 seconds as opposed to the constant re-evaluation ofi*i
, which takes over two minutes. – Morten