1 vote

I'm trying to gather some statistics on prime numbers, among them the distribution of the factors of (prime - 1)/2. I know there are general formulas for the sizes of the factors of uniformly selected numbers, but I haven't seen anything about the distribution of the factors of one less than a prime.

I've written a program that iterates through primes starting at the first prime after 2^63, factoring (prime - 1)/2 by trial division with all primes up to 2^32. However, this is extremely slow, because that is a lot of primes (and a lot of memory) to iterate through. I store the primes as a single byte each (the increment from one prime to the next). I also use a deterministic variant of the Miller-Rabin primality test for numbers up to 2^64, so I can easily detect when the remaining value (after a successful division) is prime.

I've experimented with variants of Pollard rho and elliptic-curve factorization, but it is hard to find the right balance between trial division and switching to these more complicated methods. Also, I'm not sure I've implemented them correctly, because sometimes they seem to take a very long time to find a factor, and based on their asymptotic behavior, I'd expect them to be quite quick for such small numbers.

I have not found any information on factoring many numbers at once (as opposed to factoring just one), but it seems like there should be some way to speed up the task by taking advantage of this.

Any suggestions, pointers to alternate approaches, or other guidance on this problem is greatly appreciated.


Edit: I store the primes as 8-bit offsets, each giving the distance to the next prime, with the implicit first prime being 3. Thus, in my algorithms, I have a separate check for division by 2, and then I start a loop:

import collections

factorCounts = collections.Counter()
while N % 2 == 0:
    factorCounts[2] += 1
    N //= 2
pp = 3
for gg in smallPrimeGaps:      # the stored 8-bit gaps between consecutive odd primes
    if pp*pp > N:
        break
    while N % pp == 0:
        factorCounts[pp] += 1
        N //= pp
    pp += gg
if N > 1:                      # once pp*pp > N, any remaining cofactor is prime
    factorCounts[N] += 1

Also, I used a wheel sieve to generate the primes for trial division, and I use an algorithm based on the remainders modulo several small primes to find the next prime after a given starting point.
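For illustration, here is a minimal sketch of that next-prime search, skipping candidates by tracking remainders modulo a few small primes and only running a full primality test on survivors. The names (next_prime, SMALL_PRIMES) are mine, and sympy.isprime stands in for the deterministic Miller-Rabin test shown below:

import sympy                                 # isprime is deterministic for n < 2^64

SMALL_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]

def next_prime(start):
    n = (start + 1) | 1                      # first odd candidate above start
    rems = [n % p for p in SMALL_PRIMES]     # remainders, updated incrementally
    while True:
        # cheap filter: a zero remainder means a small factor, skip the full test
        if all(rems) and sympy.isprime(n):
            return n
        n += 2
        rems = [(r + 2) % p for r, p in zip(rems, SMALL_PRIMES)]

print(next_prime(2**63))                     # first prime after 2^63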


I use the following to test whether a given number is prime (I'm porting the code to C++ now):

bool IsPrime(uint64_t n)
{
    if(n < 341531)
        return MillerRabinMulti(n, {9345883071009581737ull});
    else if(n < 1050535501)
        return MillerRabinMulti(n, {336781006125ull, 9639812373923155ull});
    else if(n < 350269456337)
        return MillerRabinMulti(n, {4230279247111683200ull, 14694767155120705706ull, 1664113952636775035ull});
    else if(n < 55245642489451)
        return MillerRabinMulti(n, {2ull, 141889084524735ull, 1199124725622454117ull, 11096072698276303650ull});
    else if(n < 7999252175582851)
        return MillerRabinMulti(n, {2ull, 4130806001517ull, 149795463772692060ull, 186635894390467037ull, 3967304179347715805ull});
    else if(n < 585226005592931977)
        return MillerRabinMulti(n, {2ull, 123635709730000ull, 9233062284813009ull, 43835965440333360ull, 761179012939631437ull, 1263739024124850375ull});
    else
        return MillerRabinMulti(n, {2ull, 325ull, 9375ull, 28178ull, 450775ull, 9780504ull, 1795265022ull});
}
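The MillerRabinMulti helper isn't shown; for reference, here is a sketch in Python of what a multi-base strong-probable-prime test has to do. One subtlety with the hashed base sets above: a base can exceed n, so it must be reduced mod n first, and a base that reduces to 0 gives no information:

def miller_rabin_multi(n, bases):
    # strong-probable-prime test of odd n > 2 against each base
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        a %= n
        if a == 0:                   # base is a multiple of n: test says nothing
            continue
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False             # composite witness found
    return True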
Some code snippets would help... – Alex
There's a Python program, pyecm, which uses elliptic curves to find big factors. Unfortunately, it's designed for use from the command line, not for importing into your own scripts (at least, the version I have is like that). As for factorising a range of numbers, you could use a modified segmented sieve. It may not help much, since it still has to divide by primes < 2^32 to check 64-bit numbers, but I guess it's worth checking out. – PM 2Ring
Is implementing divisibility rules worthwhile? – Ma0
Oops. I just noticed that I made a minor blunder in my previous comment. The segmented sieve algorithm has to loop over primes < 2^32; it doesn't do division to find primes (apart from a single one at the start of each inner loop), but I guess it does have to do division if you want to factorise. FWIW, all primes > 30 can be written as 30n + r for r in {1, 7, 11, 13, 17, 19, 23, 29}, so numbers of the form (p-1)/2 can be written as 15n + r for r in {0, 3, 5, 6, 8, 9, 11, 14}. Obviously when r is in {3, 5, 6, 9} the number isn't prime. – PM 2Ring

2 Answers

2 votes

I don't have a definitive answer, but I do have some observations and some suggestions.

There are about 2*10^17 primes between 2^63 and 2^64, so any program you write is going to run for a while.

Let's talk about a primality test for numbers in the range 2^63 to 2^64. Any general-purpose test will do more work than you need, so you can speed things up by writing a special-purpose test. I suggest strong-pseudoprime tests (as in Miller-Rabin) to bases 2 and 3. If either of those tests shows the number is composite, you're done. Otherwise, look up the number (binary search) in a table of strong-pseudoprimes to bases 2 and 3 (ask Google to find those tables for you). Two strong pseudoprime tests followed by a table lookup will certainly be faster than the deterministic Miller-Rabin test you are currently performing, which probably uses six or seven bases.
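Here is a sketch of that recipe in Python. The names are mine, and SPSP_2_3 must be filled in from one of the published tables the answer mentions; it is left empty here:

import bisect

def is_sprp(n, a):
    # strong-probable-prime test of odd n to base a
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False

# Strong pseudoprimes to BOTH bases 2 and 3 in [2^63, 2^64), sorted.
# Fill this in from one of the published tables -- it is not included here.
SPSP_2_3 = []

def is_prime_63_64(n):
    if not (is_sprp(n, 2) and is_sprp(n, 3)):
        return False                 # definitely composite
    i = bisect.bisect_left(SPSP_2_3, n)
    return i == len(SPSP_2_3) or SPSP_2_3[i] != n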

For factoring, trial division by primes to 1000, followed by Brent's variant of Pollard rho until the product of the known prime factors exceeds the cube root of the number being factored, ought to be fairly fast, a few milliseconds per number. Then, if the remaining cofactor is composite, it necessarily has exactly two prime factors, so SQUFOF would be a good algorithm to split them. It is faster than the other methods because all of its arithmetic is done with numbers less than the square root of the number being factored, which in your case means the factorization could be done using 32-bit arithmetic instead of 64-bit arithmetic, so it ought to be fast.
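As an illustration of the middle stage, here is a sketch of Brent's variant of Pollard rho (the SQUFOF step is omitted). The name brent_rho is mine, and it assumes its argument is odd and composite:

from math import gcd
import random

def brent_rho(n):
    # Brent's cycle-finding variant of Pollard rho.
    # Assumes n is odd and composite; returns a nontrivial factor.
    while True:
        y = random.randrange(1, n)
        c = random.randrange(1, n)
        m = 128                      # batch size for the gcd accumulation
        g = r = q = 1
        while g == 1:
            x = y
            for _ in range(r):
                y = (y * y + c) % n
            k = 0
            while k < r and g == 1:
                ys = y               # save position in case we overshoot
                for _ in range(min(m, r - k)):
                    y = (y * y + c) % n
                    q = q * abs(x - y) % n
                g = gcd(q, n)
                k += m
            r *= 2
        if g == n:                   # overshot: retry one step at a time
            g = 1
            while g == 1:
                ys = (ys * ys + c) % n
                g = gcd(abs(x - ys), n)
        if g != n:
            return g                 # otherwise pick a new c and start over

print(brent_rho(101 * 211))          # prints 101 or 211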

Instead of factoring and primality tests, a better method uses a variant of the Sieve of Eratosthenes to factor large blocks of numbers. That will still be slow, as there are 203 million sieving primes less than 2^32 and you will need to deal with the bookkeeping of a segmented sieve, but since you are factoring lots of numbers at once, it's probably the best approach to your task.
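To make the idea concrete, here is a toy sketch of factoring one block by sieving (names mine, and with a tiny bound; the real version needs the segmented bookkeeping mentioned above). For your problem the block would hold consecutive values N, keeping only those where 2N + 1 is prime:

def factor_block(lo, hi, sieve_primes):
    # residual[i] is lo+i divided down by the factors found so far
    residual = list(range(lo, hi))
    factors = [[] for _ in range(hi - lo)]
    for p in sieve_primes:
        start = ((lo + p - 1) // p) * p      # first multiple of p in [lo, hi)
        for m in range(start, hi, p):
            i = m - lo
            while residual[i] % p == 0:
                factors[i].append(p)
                residual[i] //= p
    for i, r in enumerate(residual):
        if r > 1:                            # leftover cofactor is prime,
            factors[i].append(r)             # provided sieve_primes reach sqrt(hi)
    return factors

primes = [2, 3, 5, 7, 11, 13]                # enough for numbers below 17^2
print(factor_block(100, 110, primes)[0])     # factors of 100: [2, 2, 5, 5]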

I have code for everything mentioned above at my blog.

0 votes

This is how I store primes for later: (I'm going to assume you want the factors of the number, and not just a primality test).

Copied from my website http://chemicaldevelopment.us/programming/2016/10/03/PGS.html

I’m going to assume you know the binary number system for this part. If not, just think of 1 as a “yes” and 0 as a “no”.

So, there are plenty of algorithms to generate the first few primes. I use the Sieve of Eratosthenes to compute a list.

But if we stored the primes as an array, like [2, 3, 5, 7], this would take up too much space. How much space exactly?

Well, 32-bit integers, which can each hold values up to 2^32 - 1, take up 4 bytes, because each byte is 8 bits and 32 / 8 = 4.

If we wanted to store every prime under 2,000,000,000, we would have to store over 98,000,000 of them. This takes up more space, and is slower at runtime, than a bitset, which is explained below.

This approach takes 98,000,000 integers of space (each is 32 bits, which is 4 bytes), and when we check a number at runtime, we need to scan the array until we find it, or until we find a number that is greater than it.

For example, say I give you a small list of primes: [2, 3, 5, 7, 11, 13, 17, 19]. I ask you if 15 is prime. How do you tell me?

Well, you would go through the list and compare each to 15.

Is 2 = 15?

Is 3 = 15?

. . .

Is 17 = 15?

At this point, you can stop because you have passed where 15 would be, so you know it isn’t prime.

Now then, let’s say we use a list of bits to tell you if the number is prime. The list above would look like:

001101010001010001010

This starts at index 0 and goes to index 20.

A 1 means that the index is prime. So, counting from the left: 0, 1, 2, ...

001101010001010001010

The third bit (index 2) is a 1, which indicates that 2 is prime.

In this case, if I asked you to check whether 15 is prime, you don't need to go through all the numbers in the list; all you need to do is jump straight to bit 15 and check that single bit.
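A minimal sketch of this, using a plain Python bytearray as the bitset (names mine):

def prime_bitset(limit):
    # bit i of the packed array is set iff i is composite
    composite = bytearray((limit >> 3) + 1)
    for p in range(2, int(limit ** 0.5) + 1):
        if not (composite[p >> 3] >> (p & 7)) & 1:
            for m in range(p * p, limit + 1, p):
                composite[m >> 3] |= 1 << (m & 7)
    def is_prime(i):
        return i >= 2 and not (composite[i >> 3] >> (i & 7)) & 1
    return is_prime

is_prime = prime_bitset(20)
print(is_prime(15))   # False -- one bit test, no scanning
print(is_prime(17))   # True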

And for memory usage, the first approach uses 98,000,000 integers, whereas this one can store 32 numbers in a single integer (using the list of 1s and 0s), so we would only need 2,000,000,000 / 32 = 62,500,000 integers.

So it uses about 64% as much memory as the first approach, and lookups are much faster.

We store the array of integers from the second approach in a file, then read it back when the program runs.

This uses 250 MB of RAM to store primality data for all numbers below 2,000,000,000.
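Saving and reloading might look like this, assuming composite is a packed bytearray like the one built inside the sketch above (the filename is made up):

# write the packed sieve to disk once
with open("primes.bits", "wb") as f:
    f.write(composite)

# on later runs, read it straight back instead of re-sieving
with open("primes.bits", "rb") as f:
    composite = bytearray(f.read())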

You can reduce this further with wheel sieving (like what you are already doing by storing (prime - 1)/2).

I'll go a little more into the wheel sieve.

You got it right by storing (prime - 1)/2, with 2 as a special case.

You can extend this to n# (the primorial: the product of the first n primes).

For example, storing (prime - 1)/2 corresponds to using 1# = 2: you keep one bit per number of the form 2k + 1.

You can also use the family of linear forms (n#)*k + L, where L ranges over the numbers less than n# that are coprime to it (equivalently, the residues not divisible by any of the first n primes).

So for 2# = 6 you only store info for 6k + 1 and 6k + 5, because L = {1, 2, 3, 5} \ {2, 3} = {1, 5}, and larger wheels save even more. A sketch of the index bookkeeping follows below.
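Here is that bookkeeping for the 6k + 1 / 6k + 5 wheel; both function names are mine:

def wheel_index(n):
    # position of n in the sequence 1, 5, 7, 11, 13, 17, ... (n coprime to 6)
    k, r = divmod(n, 6)
    if r == 1:
        return 2 * k
    if r == 5:
        return 2 * k + 1
    return None          # divisible by 2 or 3: never stored

def wheel_value(i):
    # inverse mapping: bit index back to the number it represents
    k, j = divmod(i, 2)
    return 6 * k + (1 if j == 0 else 5)

print(wheel_index(17))   # 5
print(wheel_value(5))    # 17

Only two of every six numbers get a bit, so this cuts the bitset to a third of its size.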

This should give you an understanding of some of the methods behind it.

You will need some way to implement this bitset, such as a list of 32-bit integers or a string.

Look at: https://pypi.python.org/pypi/bitarray for a possible abstraction
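For example, a small sieve on top of bitarray might look like this (a sketch; check the package's docs for the exact API):

from bitarray import bitarray

limit = 100
prime = bitarray(limit + 1)
prime.setall(True)
prime[0] = prime[1] = False
for p in range(2, int(limit ** 0.5) + 1):
    if prime[p]:
        prime[p * p :: p] = False   # clear all multiples of p in one slice
print(prime[15], prime[17])         # False True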