2
votes

I think I've found a way you can do lossless compression using prime numbers or make other methods available over and over again.

There are 54 prime numbers in the range 0-255. When we have the prime number in a byte array, we can store it using the prime number indexes of that number, using 6 bits instead of 8 bits, right?

We need to create a map using a bit for each byte, which we can store for which numbers we are compressing, and add it to the data array.

This seems to work at first, but increases the file size slightly, but reduces the file to a re-compressible structure for algorithms like LZ4 or Huffman.

The indexes 0-53 correspond to the prime numbers 2-251, but there are unused numbers for 6 bits. (54-63). These 10 numbers can also be an opportunity to store 10 different non-prime numbers with the highest frequency in 6 bits. So we're going to squeeze more numbers. If this ratio exceeds 50%, we will be able to perform successful compression.

In addition, this method can create an opportunity to recompress compressed numbers. Do you think that if we create a method for uint and ulong data types in 4 or 8 byte data blocks instead of one byte, we can reduce the map of compressed data to a smaller size, right?

There are 203280221 prime numbers less than 2 ^ 32. This means that we can store 28 bits instead of 32 bits for each prime number.

By looking at the ratio of prime numbers in the data array, we can determine whether we can perform efficient compression.

I think that this method is not an effective compression method, but rather a data exchange that may allow re-compression of compressed files.

I would like to ask you know of other methods that we can do using prime numbers? Because obtaining prime numbers was very easy, especially for numbers below 2 ^ 32. We need to add prime number methods to our methods.

2
Compressing already compressed files is difficult. Compression methods work by reducing redundancy in the output file, hence a compressed file has very low redundancy compared to the uncompressed file. If you start with a low redundancy input then there is very little scope for further lossless compression.rossum
@ rossum: this is only true for schemes like general compressors that effectively hide the underlying structure of the data behind a pseudo-random fassade. Other schemes - like delta coding in the case of primes - can result in a significant space reduction while retaining or even increasing the potential for further compression by a general compressor.DarthGizka

2 Answers

0
votes

One efficient and practical compression method for 32-bit primes is storing the halved difference between consecutive primes using simple 8-bit bytes (and pulling the prime 2 out of thin air when needed). With a bit of indexing magic this gives superfast 'decompression' - i.e. it is an ideal form of representation when iteration over prime ranges is required since it uses less memory than plain primes.

The 203,280,220 odd primes below 2^32 need a corresponding number of bytes, i.e. a bit less than 194 MB. The usability of byte-sized halved differences extends to 304,599,508,537 (roughly 2^38) but from that point on it is necessary to invent a scheme for coding occasional gaps greater than 510. However, when storing gap indices instead of halved gap widths then the scheme can pretty much go on forever (see Prime gap article @ Wikipedia).

These blocks of consecutive differences can be further compressed using the usual zlib-stile compression or similar, in which case you get better compression than when compressing a packed odds-only bitmap or a mod 30 wheel. This could be used as a compact on-disk representation.

Note, however, that a mod 30 wheel takes only 136 MB and it is directly indexable - like for satisfying primality queries. Extracting (decoding) ranges of primes is quite a bit slower than for delta-coded primes but still faster than sieving the primes afresh. Hence storing primes using the mod 30 wheel is one of the most efficient ways for 'compressing' primes in a way that keeps them directly available/usable.

Mixed forms can also be quite successful. For example, when decoding primes from a mod 30 wheel you can store them directly in delta-coded form without materialising them as vectors of numbers. This could be viewed as a form of re-compression. The mod 30 wheel would be the on-disk and main memory format, the delta coding would be the interchange format for the results of certain queries.

Delta coding and wheeled storage share one important advantage: their storage requirements are basically independent of the size of the represented primes. By contrast, if you store the primes as numbers than you get logarithmically increasing storage requirements. I.e. a 64-bit prime takes four times as much space as a 16-bit prime.

An explanation of prime wheels can be found in the Wikipedia article Wheel factorization.

The following table shows the space reduction that you get when adding more and more primes to the wheel, both relative to an unwheeled representation ('ratio') and relative to the preceding step ('delta'). The column 'spokes' gives the number of of potentially prime-bearing spokes (residues) per wheel rotation, which gives you an idea of the complexity of an optimised, fully unrolled implementation à la Kim Walisch's primesieve. As you can see the gains are diminishing rapidly while complexity is exploding.

wheel efficiency table

The wheel mod 2 - a.k.a. odds-only sieve - is important because it gives you most of the bang at virtually no cost in code complexity. Most of speed benefits of the mod 3 wheel can be reaped with almost no effort by sleight-of-hand (indexing tricks) when operating an odds-only sieve. The wheel mod 30 is important because it offers a good balance between code complexity and speedup, which makes it an ideal target for optimised implementations. The wheels up to mod 30030 have been used in practical implementations but the gain compared to the mod 30 wheel is actually negligible while the increased complexity and lost performance are not. Even higher-order wheels have been seen on university campuses, in the context of special projects.

One of the reasons why Kim Walisch's sieve program is the fastest that you can find anywhere on the Net is that he implemented unrolled versions of each of the 8 possible striding sequences of the mod 30 wheel, with specific loop entry points for each of the 8 possible phases of each sequence. All that in the only language (C/C++) for which you can get optimising compilers that are actually worthy of the name.

Doing the same for the 48 residues of the wheel mod 210 would mean 36 times as much code (!) while the reduction in bitmap size is negligible. However, mod 210 and mod 2310 might be promising targets for a code generator (C/C++ code or .NET IL), if you have a couple months of spare time to burn.

0
votes

You guys got me wrong. I'm talking about compressing random data. If you divide a file into 4-byte data blocks, 1 out of 10 of each 4-byte number will be prime. You should also use a bitmap to determine whether each number is prime. Using a bitmap will clearly increase the size of the file, but I think it can be compressed more than 32/80. Because it has too many 0 bits.

You might say, "This is a very funny compression algorithm." Because it is more compressed size is using other compression algorithms. But isn't it important to have the ability to repeat the compression process? Please ignore this funny situation.

Step by step. First of all, let me share the Prime Helper codes we need.

How can the findPrimes () and saveToFile () methods be more efficient in these codes? For now, the prime numbers up to 2 ^ 32 are found on my Core i7 in about 23 minutes and are saved in 512MB. You can then load the prime numbers from the file for direct use without recalculating.

using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace PrimeNumbers {
    public static class PrimeHelper {
        public static IEnumerable<long> FindPrimes (long limit) {
            return (new Primes (limit));
        }

        public static IEnumerable<long> FindPrimes (long start, long end) {
            return FindPrimes (end).Where (pn => pn >= start);
        }

        public static bool IsPrime (this long number) {
            if (number < 2)
                return false;
            else if (number < 4)
                return true;

            var limit = (Int32) System.Math.Sqrt (number) + 1;
            var foundPrimes = new Primes (limit);

            return foundPrimes.IsPrime (number);
        }

        public static bool IsPrime (this int number) {
            return IsPrime (Convert.ToInt64 (number));
        }

        public static bool IsPrime (this short number) {
            return IsPrime (Convert.ToInt64 (number));
        }

        public static bool IsPrime (this byte number) {
            return IsPrime (Convert.ToInt64 (number));
        }

        public static bool IsPrime (this uint number) {
            return IsPrime (Convert.ToInt64 (number));
        }

        public static bool IsPrime (this ushort number) {
            return IsPrime (Convert.ToInt64 (number));
        }

        public static bool IsPrime (this sbyte number) {
            return IsPrime (Convert.ToInt64 (number));
        }
    }

    public class Primes : IEnumerable<long>, IDisposable {
        private long limit;
        private MyBitArray numbers;

        public Primes (long limit) {
            startTime = DateTime.Now;
            calculateTime = startTime - startTime;
            this.limit = limit + 1;
            try { findPrimes (); } catch (Exception e) { Console.WriteLine (e.Message); /*Overflows or Out of Memory*/ }

            calculateTime = DateTime.Now - startTime;
        }

        public Primes (MyBitArray bitArray) {
            this.numbers = bitArray;
            this.limit = bitArray.Limit;
        }

        private void findPrimes () {
            /*
            The Sieve Algorithm
            http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
            */
            numbers = new MyBitArray (limit, true);
            for (long i = 2; i < limit; i++)
                if (numbers[i])
                    for (long j = i * 2; j < limit; j += i)
                        numbers[j] = false;
        }

        public IEnumerator<long> GetEnumerator () {
            for (long i = 2; i < limit; i++)
                if (numbers[i])
                    yield return i;
        }

        IEnumerator IEnumerable.GetEnumerator () {
            return GetEnumerator ();
        }

        // Extended big long number for quickly calculation
        public bool IsDivisible (long number) {
            var sqrt = System.Math.Sqrt (number);
            foreach (var prime in this) {
                if (number % prime == 0) {
                    DivisibleBy = prime;
                    return true;
                }

                if (prime > sqrt)
                    return false;
            }

            return false;
        }

        // For big long number
        public bool IsPrime (long number) {
            return number < limit ?
                this.SingleOrDefault (n => n == number) != 0 :
                !IsDivisible (number);
        }

        public void Dispose () {
            numbers.Dispose ();
        }

        public void SaveToFile (string fileName, SaveOptions saveOptions) {
            int elementSize = 8;

            if ((this.limit - 1L) <= Convert.ToInt64 (byte.MaxValue))
                elementSize = 1;
            else if ((this.limit - 1L) <= Convert.ToInt64 (ushort.MaxValue))
                elementSize = 2;
            else if ((this.limit - 1L) <= Convert.ToInt64 (uint.MaxValue))
                elementSize = 4;

            switch (saveOptions) {
                case SaveOptions.TextFile:
                    saveToTextFile (fileName);
                    break;
                case SaveOptions.BinaryAllNumbersBitArray:
                    saveToFile (fileName, this.Numbers.Bytes);
                    break;
                case SaveOptions.BinaryPrimeList:
                    saveToFile (fileName, this, elementSize);
                    break;
                case SaveOptions.BinaryAllNumbersAndPrimeIndex:
                    saveToFileOnlyIndex (fileName, this, elementSize);
                    break;
                case SaveOptions.All:
                    saveToFile (Path.Combine (Path.GetDirectoryName (fileName), Path.GetFileNameWithoutExtension (fileName) + "_BinaryAllNumbersBitArray.bin"), this.Numbers.Bytes);
                    saveToFile (Path.Combine (Path.GetDirectoryName (fileName), Path.GetFileNameWithoutExtension (fileName) + "_BinaryPrimeList.bin"), this, elementSize);
                    saveToFileOnlyIndex (Path.Combine (Path.GetDirectoryName (fileName), Path.GetFileNameWithoutExtension (fileName) + "_BinaryAllNumbersPrimeIndex.bin"), this, elementSize);
                    saveToTextFile (Path.Combine (Path.GetDirectoryName (fileName), Path.GetFileNameWithoutExtension (fileName) + "_TextFile.txt"));
                    break;

            }
        }

        protected void saveToFile (string fileName, byte[] data) {
            Console.WriteLine ("Saving numbers bits if is prime bit 1 else bit 0...");
            using (MemoryStream mstream = new MemoryStream (data)) {
                ProgressCallback callback = (long position, long total) => { };
                copy (mstream, fileName, callback);
            }
        }

        protected void saveToFile (string fileName, Primes primes, int elementSize = 4) {
            Console.WriteLine ("Saving primes. Element size: {0}...", elementSize);
            using (FileStream file = File.OpenWrite (fileName)) {
                long writed = 0;
                foreach (var prime in primes) {
                    byte[] data = new byte[elementSize];

                    if (elementSize == 8) {
                        data = BitConverter.GetBytes (Convert.ToUInt64 (prime));
                    } else if (elementSize == 4) {
                        data = BitConverter.GetBytes (Convert.ToUInt32 (prime));
                    } else if (elementSize == 2) {
                        data = BitConverter.GetBytes (Convert.ToUInt16 (prime));
                    } else if (elementSize == 1) {
                        data = BitConverter.GetBytes (Convert.ToByte (prime));
                    }

                    file.Write (data, 0, elementSize);
                    writed++;

                    if (writed == 536870912L) {
                        file.Flush (true);
                        writed = 0;
                    }
                }
                file.Close ();
            }
        }

        protected void saveToFileOnlyIndex (string fileName, Primes primes, int elementSize = 4) {
            Console.WriteLine ("Saving all numbers prime indexes if not prime index is -1. Element size: {0}...", elementSize);
            using (FileStream file = File.OpenWrite (fileName)) {
                int index = 0;
                long writed = 0;
                long length = Convert.ToInt64 (uint.MaxValue) + 1L;
                for (long i = 0; i < length; i++) {
                    byte[] data = new byte[elementSize];

                    if (elementSize == 8) {
                        if (primes.Numbers[i]) {
                            index++;
                            data = BitConverter.GetBytes (Convert.ToInt64 (index));
                        } else {
                            data = BitConverter.GetBytes (Convert.ToInt32 (-1));
                        }
                    } else if (elementSize == 4) {
                        if (primes.Numbers[i]) {
                            index++;
                            data = BitConverter.GetBytes (Convert.ToInt32 (index));
                        } else {
                            data = BitConverter.GetBytes (Convert.ToInt32 (-1));
                        }
                    } else if (elementSize == 2) {
                        if (primes.Numbers[i]) {
                            index++;
                            data = BitConverter.GetBytes (Convert.ToInt16 (index));
                        } else {
                            data = BitConverter.GetBytes (Convert.ToInt16 (-1));
                        }
                    } else if (elementSize == 1) {
                        if (primes.Numbers[i]) {
                            index++;
                            data = BitConverter.GetBytes (Convert.ToSByte (index));
                        } else {
                            data = BitConverter.GetBytes (Convert.ToSByte (-1));
                        }
                    }

                    file.Write (data, 0, elementSize);
                    writed++;

                    if (writed == 536870912L) {
                        file.Flush (true);
                        writed = 0;
                    }
                }
                file.Close ();
            }
        }

        public delegate void ProgressCallback (long position, long total);
        protected void copy (Stream inputStream, string outputFile, ProgressCallback progressCallback) {
            using (var outputStream = File.OpenWrite (outputFile)) {
                int writed = 0;
                const int bufferSize = 65536;
                while (inputStream.Position < inputStream.Length) {
                    byte[] data = new byte[bufferSize];
                    int amountRead = inputStream.Read (data, 0, bufferSize);
                    outputStream.Write (data, 0, amountRead);

                    writed++;

                    if (progressCallback != null)
                        progressCallback (inputStream.Position, inputStream.Length);

                    if (writed == 32768) {
                        outputStream.Flush (true);
                        writed = 0;
                    }
                }
                outputStream.Flush (true);
            }
        }

        protected void saveToTextFile (string fileName) {
            Console.WriteLine ("Saving primes to text file...");
            using (System.IO.StreamWriter file =
                new System.IO.StreamWriter (fileName)) {
                long writed = 0;
                foreach (var prime in this) {
                    file.WriteLine (prime.ToString ());

                    if (writed == 536870912L) {
                        file.Flush ();
                        writed = 0;
                    }
                }

                file.Close ();
            }
        }

        private static DateTime startTime;
        private static TimeSpan calculateTime;
        public TimeSpan CalculateTime { get { return calculateTime; } }
        public long DivisibleBy { get; set; }
        public long Length { get { return limit; } }
        public MyBitArray Numbers { get { return numbers; } }
    }

    public class MyBitArray : IDisposable {
        byte[] bytes;
        public MyBitArray (long limit, bool defaultValue = false) {
            long byteCount = Convert.ToInt64 ((limit >> 3) + 1);
            this.bytes = new byte[byteCount];
            for (long i = 0; i < byteCount; i++) {
                bytes[i] = (defaultValue == true ? (byte) 0xFF : (byte) 0x00);
            }
            this.limit = limit;
        }

        public MyBitArray (long limit, byte[] bytes) {
            this.limit = limit;
            this.bytes = bytes;
        }

        public bool this [long index] {
            get {
                return getValue (index);
            }
            set {
                setValue (index, value);
            }
        }

        bool getValue (long index) {
            if (index < 8) {
                return getBit (bytes[0], (byte) index);
            }

            long byteIndex = (index & 7) == 0 ? ((index >> 3) - 1) : index >> 3;
            byte bitIndex = (byte) (index & 7);
            return getBit (bytes[byteIndex], bitIndex);
        }
        void setValue (long index, bool value) {
            if (index < 8) {
                bytes[0] = setBit (bytes[0], (byte) index, value);
                return;
            }

            long byteIndex = (index & 7) == 0 ? (index >> 3) - 1 : index >> 3;
            byte bitIndex = (byte) (index & 7);

            bool old = getBit (bytes[byteIndex], bitIndex);

            bytes[byteIndex] = setBit (bytes[byteIndex], bitIndex, value);
        }

        bool getBit (byte byt, byte index) {
            return ((byt & (1 << index)) >> index) == 1;
        }

        byte setBit (byte byt, byte index, bool value) {
            return (byte) ((byt & ~(1 << index)) + (value ? 1 << index : 0));
        }

        public void Dispose () {
            GC.Collect (2, GCCollectionMode.Optimized);
        }

        private long limit;
        public long Limit { get { return limit; } }
        public byte[] Bytes { get { return this.bytes; } }
    }

    public enum SaveOptions {
        All,
        TextFile,
        BinaryAllNumbersBitArray,
        BinaryPrimeList,
        BinaryAllNumbersAndPrimeIndex
    }
}