4
votes

In my application I'm going to use floating point values to store geographical coordinates (latitude and longitude).

I know that the integer part of these values will be in range [-90, 90] and [-180, 180] respectively. Also I have requirement to enforce some fixed precision on these values (for now it is 0.00001 but can be changed later).

After studying single precision floating point type (float) I can see that it is just a little bit small to contain my values. That's because 180 * 10^5 is greater than 2^24 (size of the significand of float) but less than 2^25.

So I have to use double. But the problem is that I'm going to store huge amounts of this values, so I don't want to waste bytes, storing unnecessary precision.

So how can I perform some sort of compression when converting my double value (with fixed integer part range and specified precision X) to byte array in java? So for example if I use precision from my example (0.00001) I end up with 5 bytes for each value. I'm looking for a lightweight algorithm or solution so that it doesn't imply huge calculations.

3
After you compact your data you can still use compression, e.g. GZIPOutputStream which may make it smaller again. - Peter Lawrey
@PeterLawrey for array with decent size should be very useful. I think your suggestion have something common with suggestion of Aaron - they both exploit the fact that stored values will be close to each other. - pavel_kazlou

3 Answers

6
votes

To store a number x to a fixed precision of (for instance) 0.00001, just store the integer closest to 100000 * x. (By the way, this requires 26 bits, not 25, because you need to store negative numbers too.)

3
votes

As TonyK said in his answer, use an int to store the numbers.

To compress the numbers further, use locality: Geo coordinates are often "clumped" (say the outline of a city block). Use a fixed reference point (full 2x26 bits resolution) and then store offsets to the last coordinate as bytes (gives you +/-0.00127). Alternatively, use short which gives you more than half the value range.

Just be sure to hide the compression/decompression in a class which only offers double as outside API, so you can adjust the precision and the compression algorithm at any time.

1
votes

Considering your use case, i would nonetheless use double and compress them directly.

The reason is that strong compressors, such as 7zip, are extremely good at handling "structured" data, which an array of double is (one data = 8 bytes, this is very regular & predictable).

Any other optimisation you may come up "by hand" is likely to be inferior or offer negligible advantage, while simultaneously costing you time and risks.

Note that you can still apply the "trick" of converting the double into int before compression, but i'm really unsure if it would bring you tangible benefit, while on the other hand it would seriously reduce your ability to cope with unforeseen ranges of figures in the future.

[Edit] Depending on source data, if "lower than precision level" bits are "noisy", it can be usefull for compression ratio to remove the noisy bits, either by rounding the value or even directly applying a mask on lowest bits (i guess this last method will not please purists, but at least you can directly select your precision level this way, while keeping available the full range of possible values).

So, to summarize, i'd suggest direct LZMA compression on your array of double.