An efficient algorithm to calculate the integer square root (isqrt) of arbitrarily large integers

Question

Notice

For a solution in Erlang or C / C++, go to Trial 4 below.

Wikipedia Articles

Integer square root

The definition of "integer square root" could be found here

Methods of computing square roots

An algorithm that does "bit magic" could be found here

[ Trial 1 : Using Library Function ]

Code

isqrt(N) when erlang:is_integer(N), N >= 0 ->
    erlang:trunc(math:sqrt(N)).

Problem

This implementation uses the sqrt() function from the C library, so it does not work with arbitrarily large integers (Note that the returned result does not match the input. The correct answer should be 12345678901234567890):

Erlang R16B03 (erts-5.10.4) [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V5.10.4  (abort with ^G)
1> erlang:trunc(math:sqrt(12345678901234567890 * 12345678901234567890)).
12345678901234567168
2>

[ Trial 2 : Using Bigint `+` Only ]

Code

isqrt2(N) when erlang:is_integer(N), N >= 0 ->
    isqrt2(N, 0, 3, 0).

isqrt2(N, I, _, Result) when I >= N ->
    Result;

isqrt2(N, I, Times, Result) ->
    isqrt2(N, I + Times, Times + 2, Result + 1).

Description

This implementation is based on the following observation:

isqrt(0) = 0   # <--- One 0
isqrt(1) = 1   # <-+
isqrt(2) = 1   #   |- Three 1's
isqrt(3) = 1   # <-+
isqrt(4) = 2   # <-+
isqrt(5) = 2   #   |
isqrt(6) = 2   #   |- Five 2's
isqrt(7) = 2   #   |
isqrt(8) = 2   # <-+
isqrt(9) = 3   # <-+
isqrt(10) = 3  #   |
isqrt(11) = 3  #   |
isqrt(12) = 3  #   |- Seven 3's
isqrt(13) = 3  #   |
isqrt(14) = 3  #   |
isqrt(15) = 3  # <-+
isqrt(16) = 4  # <--- Nine 4's
...

Problem

This implementation involves only bigint additions so I expected it to run fast. However, when I fed it with 1111111111111111111111111111111111111111 * 1111111111111111111111111111111111111111, it seems to run forever on my (very fast) machine.

[ Trial 3 : Using Binary Search with Bigint `+1`, `-1` and `div 2` Only ]

Code

Variant 1 (My original implementation)

isqrt3(N) when erlang:is_integer(N), N >= 0 ->
    isqrt3(N, 1, N).

isqrt3(_N, Low, High) when High =:= Low + 1 ->
    Low;

isqrt3(N, Low, High) ->
    Mid = (Low + High) div 2,
    MidSqr = Mid * Mid,
    if
        %% This also catches N = 0 or 1
        MidSqr =:= N ->
            Mid;
        MidSqr < N ->
            isqrt3(N, Mid, High);
        MidSqr > N ->
            isqrt3(N, Low, Mid)
    end.

Variant 2 (modified above code so that the boundaries go with Mid+1 or Mid-1 instead, with reference to the answer by Vikram Bhat)

isqrt3a(N) when erlang:is_integer(N), N >= 0 ->
    isqrt3a(N, 1, N).

isqrt3a(N, Low, High) when Low >= High ->
    HighSqr = High * High,
    if
        HighSqr > N ->
            High - 1;
        HighSqr =< N ->
            High
    end;

isqrt3a(N, Low, High) ->
    Mid = (Low + High) div 2,
    MidSqr = Mid * Mid,
    if
        %% This also catches N = 0 or 1
        MidSqr =:= N ->
            Mid;
        MidSqr < N ->
            isqrt3a(N, Mid + 1, High);
        MidSqr > N ->
            isqrt3a(N, Low, Mid - 1)
    end.

Problem

Now it solves the 79-digit number (namely 1111111111111111111111111111111111111111 * 1111111111111111111111111111111111111111) in lightening speed, the result is shown immediately. However, it takes 60 seconds (+- 2 seconds) on my machine to solve one million (1,000,000) 61-digit numbers (namely, from 1000000000000000000000000000000000000000000000000000000000000 to 1000000000000000000000000000000000000000000000000000001000000). I would like to do it even faster.

[ Trial 4 : Using Newton's Method with Bigint `+` and `div` Only ]

Code

isqrt4(0) -> 0;

isqrt4(N) when erlang:is_integer(N), N >= 0 ->
    isqrt4(N, N).

isqrt4(N, Xk) ->
    Xk1 = (Xk + N div Xk) div 2,
    if
        Xk1 >= Xk ->
            Xk;
        Xk1 < Xk ->
            isqrt4(N, Xk1)
    end.

Code in C / C++ (for your interest)

Recursive variant

#include <stdint.h>

uint32_t isqrt_impl(
    uint64_t const n,
    uint64_t const xk)
{
    uint64_t const xk1 = (xk + n / xk) / 2;
    return (xk1 >= xk) ? xk : isqrt_impl(n, xk1);
}

uint32_t isqrt(uint64_t const n)
{
    if (n == 0) return 0;
    if (n == 18446744073709551615ULL) return 4294967295U;
    return isqrt_impl(n, n);
}

Iterative variant

#include <stdint.h>

uint32_t isqrt_iterative(uint64_t const n)
{
    uint64_t xk = n;
    if (n == 0) return 0;
    if (n == 18446744073709551615ULL) return 4294967295U;
    do
    {
        uint64_t const xk1 = (xk + n / xk) / 2;
        if (xk1 >= xk)
        {
            return xk;
        }
        else
        {
            xk = xk1;
        }
    } while (1);
}

Problem

The Erlang code solves one million (1,000,000) 61-digit numbers in 40 seconds (+- 1 second) on my machine, so this is faster than Trial 3. Can it go even faster?

About My Machine

Processor : 3.4 GHz Intel Core i7

Memory : 32 GB 1600 MHz DDR3

OS : Mac OS X Version 10.9.1

Related Questions

Integer square root in python

The answer by user448810 uses "Newton's Method". I'm not sure whether doing the division using "integer division" is okay or not. I'll try this later as an update. [UPDATE (2015-01-11): It is okay to do so]
The answer by math involves using a 3rd party Python package gmpy, which is not very favourable to me, since I'm primarily interested in solving it in Erlang with only builtin facilities.
The answer by DSM seems interesting. I don't really understand what is going on, but it seems that "bit magic" is involved there, and so it's not quite suitable for me too.

Infinite Recursion in Meta Integer Square Root

This question is for C++, and the algorithm by AraK (the questioner) looks like it's from the same idea as Trial 2 above.

// is integer division in Python, so Newton's method on x^2 - n = 0 works. — Blender
@Blender : I know what // means in Python, and therefore I doubt whether it works, since the algorithm published in Wikipedia (and anything goes with "Newton's method") works on real number (notice the footnote on floating point). — Siu Ching Pong -Asuka Kenji-
Using integer division doesn't make any difference. The sequence {x_n} is decreasing with a greatest lower bound of isqrt(n). — Blender
@Blender : Applying the algorithm from Wikipedia directly and substituting division by integer division does not lead to correct results. Since Wikipedia's terminating condition is abs(x[k+1] - x[k]) < 1 and since we're doing integer arithmetic, x[k+1] must be equal to x[k] in order to fulfil the condition. Therefore, when the algorithm applies to the input 3: x0 = 3, x1 = (3 + 3 div 3) div 2 = 2, x2 = (2 + 3 div 2) div 2 = 1, x3 = (1 + 3 div 1) div 2 = 2, which causes an infinite loop. To correct this, terminate when x[k+1] >= x[k] (because of the reason you mentioned). — Siu Ching Pong -Asuka Kenji-
Ah, you're right. I was looking at the code in that first answer, which uses x_{n_1} >= x_n as the stopping condition. I think the Wikipedia example could be updated to account for both integer and floating point division. — Blender

Vikram Bhat Vikram Bhat · Accepted Answer · 2014-02-09T12:22:51

How about binary search like following doesn't need floating divisions only integer multiplications (Slower than newtons method) :-

low = 1;

/* More efficient bound

high = pow(10,log10(target)/2+1);

*/


high = target


while(low<high) {

 mid = (low+high)/2;
 currsq = mid*mid;

 if(currsq==target) {
    return(mid);
 }

 if(currsq<target) {

      if((mid+1)*(mid+1)>target) {
             return(mid);
      }    
      low =  mid+1;
  }

 else {

     high = mid-1;
 }

}

This works for O(logN) iterations so should not run forever for even very large numbers

Log10(target) Computation if needed :-

acc = target

log10 = 0;

while(acc>0) {

  log10 = log10 + 1;
  acc = acc/10;
}

Note : acc/10 is integer division

Edit :-

Efficient bound :- The sqrt(n) has about half the number of digits as n so you can pass high = 10^(log10(N)/2+1) && low = 10^(log10(N)/2-1) to get tighter bound and it should provide 2 times speed up.

Evaluate bound:-

bound = 1;
acc = N;
count = 0;
while(acc>0) {

 acc = acc/10;

 if(count%2==0) {

    bound = bound*10;
 }

 count++;

}

high = bound*10;
low = bound/10;
isqrt(N,low,high);

An efficient algorithm to calculate the integer square root (isqrt) of arbitrarily large integers

Notice

Wikipedia Articles

[ Trial 1 : Using Library Function ]

Code

Problem

[ Trial 2 : Using Bigint + Only ]

Code

Description

Problem

[ Trial 3 : Using Binary Search with Bigint +1, -1 and div 2 Only ]

Code

Variant 1 (My original implementation)

Variant 2 (modified above code so that the boundaries go with Mid+1 or Mid-1 instead, with reference to the answer by Vikram Bhat)

Problem

[ Trial 4 : Using Newton's Method with Bigint + and div Only ]

Code

Code in C / C++ (for your interest)

Recursive variant

Iterative variant

Problem

About My Machine

Related Questions

1 Answers

[ Trial 2 : Using Bigint `+` Only ]

[ Trial 3 : Using Binary Search with Bigint `+1`, `-1` and `div 2` Only ]

[ Trial 4 : Using Newton's Method with Bigint `+` and `div` Only ]