3
votes

I want to automatically generate Java's serialVersionUID (which is a long, or 64 bits). The object to be serialized is distinguished by a set of integers, usually about 20 but not always exactly 20. I intend to convert the integers into a comma-separated string of numbers and run it through the SHA-256 hash function.

Since SHA-256 is 32 bytes long (256 bits), and I need it to fit into the serialVersionUID (64 bits), how might I convert it to a 64 bit value and minimize the loss of the characteristics of a good hash?

5
Are you aware of the 'serialver' tool? And are you aware that serialVersionUIDs don't have to be distinct per class? - user207421
I am aware that versions of a class do not need to be distinct. I don't need every version of a class to be distinct. I use serialVersionUID because I want to control compatibility, but I can also automate it. That way, I retain control and eliminate the risk of making a human error. - H2ONaCl

5 Answers

5
votes

Just cut off the extra bits. There is no need to complicate things. If there is a superior method to just taking the first (or any other) 64 bits, then the hash is broken in the first place.
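As a minimal sketch of that truncation, assuming the integers are joined with commas as the question describes: the class name `TruncatedSha256` and the helper `truncatedHash` are made up for illustration, and reading the first 8 bytes as a big-endian long is just one arbitrary choice of which 64 bits to keep.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.stream.Collectors;

public class TruncatedSha256 {
    // Hypothetical helper: joins the integers with commas, hashes the string
    // with SHA-256, and keeps the first 8 bytes of the digest as a long.
    static long truncatedHash(int... fields) {
        String joined = Arrays.stream(fields)
                .mapToObj(Integer::toString)
                .collect(Collectors.joining(","));
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(joined.getBytes(StandardCharsets.UTF_8));
            // getLong() reads bytes 0..7 of the digest in big-endian order.
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is expected to be available on standard Java platforms.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(truncatedHash(1, 2, 3));
    }
}
```

The same input always yields the same long, so the value can be regenerated whenever the distinguishing integers change.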

3
votes

First of all, it is unlikely that you can compress a good hash in the normal sense. Compression is about a reversible encoding that reduces redundancy. In a good hash there should be no redundancy to reduce, and hence compression will be ineffective.

Since SHA-256 is 32 bytes long (256 bits), and I need it to fit into the serialVersionUID (64 bits), how might I convert it to a 64 bit value and minimize the loss of the characteristics of a good hash?

So what are those good characteristics? Well the primary characteristic of a good hash is that it is impractical to reverse it; i.e. it is impractical to work out a possible input that resulted in the hash. And a related characteristic is that given a known input that produces a given hash, it is impractical to produce another input (i.e. a collision) that gives the same hash.

Now when you go from a 256-bit to a 64-bit hash, you make it a whole lot easier to reverse a hash or produce a collision for a hash ... by brute force. Basically, a 64-bit hash means there is one chance in 2^64 that any random input will have a given hash. That probability is large enough that some "bad guy" with enough cores has a good enough chance of success (in a reasonable time) to make brute force a reasonable option.

But does it really matter? What would someone achieve by creating a serialVersionUID that collides? These values are not secret, and they don't tell you anything definitive about the API of the object ...

The bottom line is that if these reduced hashes are being used as serialVersionUID values are designed to be used, then there won't be any problem in (for example) just using the first 64 bits of the SHA-256 hash. There is no need to XOR or checksum or do any other more complicated transformation.

1
votes

You could calculate the cyclic redundancy check (CRC) of the SHA-256 digest.
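A rough sketch of what that could look like with the standard library, assuming `java.util.zip.CRC32` is an acceptable CRC: note that it is a 32-bit CRC, so only the low 32 bits of the resulting long carry information. The class name `CrcOfSha256` and both helper names are invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

public class CrcOfSha256 {
    // Hypothetical helper: CRC-32 of an arbitrary byte array.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue(); // always in the range 0 .. 2^32 - 1
    }

    // Hypothetical helper: SHA-256 of the input string, then CRC-32 of the digest.
    static long crcOfDigest(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            return crc32(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(crcOfDigest("1,2,3"));
    }
}
```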

0
votes

I would say either use a 64 bit checksum, or if you want to stick with SHA, then XOR the 64 bit chunks.
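The XOR variant might be sketched as follows, assuming the 32-byte digest is split into four consecutive big-endian 64-bit chunks. The class name `XorFoldedSha256` and the helpers `xorFold` and `foldedHash` are illustrative names, not part of any library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class XorFoldedSha256 {
    // Hypothetical helper: XORs together every complete 8-byte chunk of the digest.
    static long xorFold(byte[] digest) {
        ByteBuffer buf = ByteBuffer.wrap(digest);
        long folded = 0L;
        while (buf.remaining() >= Long.BYTES) {
            folded ^= buf.getLong();
        }
        return folded;
    }

    // Hypothetical helper: SHA-256 of the input string, folded down to 64 bits.
    static long foldedHash(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            return xorFold(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(foldedHash("1,2,3"));
    }
}
```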

0
votes

Hash it with RIPEMD-160.

e.g.,

4727c1278432c388eea822904f008468c02fd543fc347391d1f2b9918ec9b5b9

becomes

069e298ee9d1b14e7774434624703c0be1a47ee1

That is 64 characters, reduced to 40.