9
votes

Well, I have this bit of code that is slowing down the program hugely because it is linear complexity but called a lot of times making the program quadratic complexity. If possible I would like to reduce its computational complexity but otherwise I'll just optimize it where I can. So far I have reduced down to:

def table(n):
    a = 1
    while 2*a <= n:
      if (-a*a)%n == 1: return a

      a += 1

Anyone see anything I've missed? Thanks!

EDIT: I forgot to mention: n is always a prime number.

EDIT 2: Here is my new improved program (thank's for all the contributions!):

def table(n):
    if n == 2: return 1
    if n%4 != 1: return

    a1 = n-1
    for a in range(1, n//2+1):
        if (a*a)%n == a1: return a

EDIT 3: And testing it out in its real context it is much faster! Well this question appears solved but there are many useful answers. I should also say that as well as those above optimizations, I have memoized the function using Python dictionaries...

10
You are calling this numerous times: for a lot of different values of n? Is their a pattern for the values of n?David Nehme
Is it likely that some values of n will be called multiple times? (in which case there might be some value in storing the calculated values to save re-running the calculation)hamishmcn
What kind of value is n at max, and how big is it typically?Lasse V. Karlsen
I've added an improved (faster) version of your code... Check it out in my answer.strager
This function is called (and memoized) for lots of different prime n. Some n are large, some small... strager, thanks!PythonPower

10 Answers

6
votes

Ignoring the algorithm for a moment (yes, I know, bad idea), the running time of this can be decreased hugely just by switching from while to for.

for a in range(1, n / 2 + 1)

(Hope this doesn't have an off-by-one error. I'm prone to make these.)

Another thing that I would try is to look if the step width can be incremented.

5
votes

Take a look at http://modular.fas.harvard.edu/ent/ent_py . The function sqrtmod does the job if you set a = -1 and p = n.

You missed a small point because the running time of your improved algorithm is still in the order of the square root of n. As long you have only small primes n (let's say less than 2^64), that's ok, and you should probably prefer your implementation to a more complex one.

If the prime n becomes bigger, you might have to switch to an algorithm using a little bit of number theory. To my knowledge, your problem can be solved only with a probabilistic algorithm in time log(n)^3. If I remember correctly, assuming the Riemann hypothesis holds (which most people do), one can show that the running time of the following algorithm (in ruby - sorry, I don't know python) is log(log(n))*log(n)^3:

class Integer
  # calculate b to the power of e modulo self
  def power(b, e)
    raise 'power only defined for integer base' unless b.is_a? Integer
    raise 'power only defined for integer exponent' unless e.is_a? Integer
    raise 'power is implemented only for positive exponent' if e < 0
    return 1 if e.zero?
    x = power(b, e>>1)
    x *= x
    (e & 1).zero? ? x % self : (x*b) % self
  end
  # Fermat test (probabilistic prime number test)
  def prime?(b = 2)
    raise "base must be at least 2 in prime?" if b < 2
    raise "base must be an integer in prime?" unless b.is_a? Integer
    power(b, self >> 1) == 1
  end
  # find square root of -1 modulo prime
  def sqrt_of_minus_one
    return 1 if self == 2
    return false if (self & 3) != 1
    raise 'sqrt_of_minus_one works only for primes' unless prime?
    # now just try all numbers (each succeeds with probability 1/2)
    2.upto(self) do |b|
      e = self >> 1
      e >>= 1 while (e & 1).zero?
      x = power(b, e)
      next if [1, self-1].include? x
      loop do
        y = (x*x) % self
        return x if y == self-1
        raise 'sqrt_of_minus_one works only for primes' if y == 1
        x = y
      end
    end
  end
end

# find a prime
p = loop do
      x = rand(1<<512)
      next if (x & 3) != 1
      break x if x.prime?
    end

puts "%x" % p
puts "%x" % p.sqrt_of_minus_one

The slow part is now finding the prime (which takes approx. log(n)^4 integer operation); finding the square root of -1 takes for 512-bit primes still less than a second.

4
votes

Consider pre-computing the results and storing them in a file. Nowadays many platforms have a huge disk capacity. Then, obtaining the result will be an O(1) operation.

3
votes

(Building on Adam's answer.) Look at the Wikipedia page on quadratic reciprocity:

x^2 ≡ −1 (mod p) is solvable if and only if p ≡ 1 (mod 4).

Then you can avoid the search of a root precisely for those odd prime n's that are not congruent with 1 modulo 4:

def table(n):
    if n == 2: return 1
    if n%4 != 1: return None   # or raise exception
    ...
2
votes

It looks like you're trying to find the square root of -1 modulo n. Unfortunately, this is not an easy problem, depending on what values of n are input into your function. Depending on n, there might not even be a solution. See Wikipedia for more information on this problem.

2
votes

Edit 2: Surprisingly, strength-reducing the squaring reduces the time a lot, at least on my Python2.5 installation. (I'm surprised because I thought interpreter overhead was taking most of the time, and this doesn't reduce the count of operations in the inner loop.) Reduces the time from 0.572s to 0.146s for table(1234577).

 def table(n):
     n1 = n - 1
     square = 0
     for delta in xrange(1, n, 2):
         square += delta
         if n <= square: square -= n
         if square == n1: return delta // 2 + 1

strager posted the same idea but I think less tightly coded. Again, jug's answer is best.

Original answer: Another trivial coding tweak on top of Konrad Rudolph's:

def table(n):
    n1 = n - 1
    for a in xrange(1, n // 2 + 1):
          if (a*a) % n == n1: return a

Speeds it up measurably on my laptop. (About 25% for table(1234577).)

Edit: I didn't notice the python3.0 tag; but the main change was hoisting part of the calculation out of the loop, not the use of xrange. (Academic since there's a better algorithm.)

2
votes

Based off OP's second edit:

def table(n):
    if n == 2: return 1
    if n%4 != 1: return

    mod = 0
    a1 = n - 1
    for a in xrange(1, a1, 2):
        mod += a

        while mod >= n: mod -= n
        if mod == a1: return a//2 + 1
1
votes

Is it possible for you to cache the results?

When you calculate a large n you are given the results for the lower n's almost for free.

1
votes

One thing that you are doing is repeating the calculation -a*a over and over again.

Create a table of the values once and then do look up in the main loop.

Also although this probably doesn't apply to you because your function name is table but if you call a function that takes time to calculate you should cache the result in a table and just do a table look up if you call it again with the same value. This save you the time of calculating all of the values when you first run but you don't waste time repeating the calculation more than once.

1
votes

I went through and fixed the Harvard version to make it work with python 3. http://modular.fas.harvard.edu/ent/ent_py

I made some slight changes to make the results exactly the same as the OP's function. There are two possible answers and I forced it to return the smaller answer.

import timeit

def table(n):

    if n == 2: return 1
    if n%4 != 1: return

    a1=n-1

    def inversemod(a, p):
        x, y = xgcd(a, p)
        return x%p

    def xgcd(a, b):
        x_sign = 1
        if a < 0: a = -a; x_sign = -1
        x = 1; y = 0; r = 0; s = 1
        while b != 0:
            (c, q) = (a%b, a//b)
            (a, b, r, s, x, y) = (b, c, x-q*r, y-q*s, r, s)
        return (x*x_sign, y)

    def mul(x, y):      
        return ((x[0]*y[0]+a1*y[1]*x[1])%n,(x[0]*y[1]+x[1]*y[0])%n)

    def pow(x, nn):      
        ans = (1,0)
        xpow = x
        while nn != 0:
           if nn%2 != 0:
               ans = mul(ans, xpow)
           xpow = mul(xpow, xpow)
           nn >>= 1
        return ans

    for z in range(2,n) :
        u, v = pow((1,z), a1//2)
        if v != 0:
            vinv = inversemod(v, n)
            if (vinv*vinv)%n == a1:
                vinv %= n
                if vinv <= n//2:
                    return vinv
                else:
                    return n-vinv


tt=0
pri = [ 5,13,17,29,37,41,53,61,73,89,97,1234577,5915587277,3267000013,3628273133,2860486313,5463458053,3367900313 ]
for x in pri:
    t=timeit.Timer('q=table('+str(x)+')','from __main__ import table')
    tt +=t.timeit(number=100)
    print("table(",x,")=",table(x))

print('total time=',tt/100)

This version takes about 3ms to run through the test cases above.

For comparison using the prime number 1234577
OP Edit2 745ms
The accepted answer 522ms
The above function 0.2ms