12
votes

I was reading the paragraph quoted below from an article entitled- Java theory and practice: Hashing it out - Defining hashCode() and equals() effectively and correctly

Defining equality The Object class has two methods for making inferences about an object's identity: equals() and hashCode(). In general, if you override one of these methods, you must override both, as there are important relationships between them that must be maintained. In particular, if two objects are equal according to the equals() method, they must have the same hashCode() value (although the reverse is not generally true).[emphasis added by me]

My question relates to the latter bit of the paragraph "although the reverse is not generally true". How is it possible for two different instances of a class to have the same hashCode but not be equal?

8
Simple. A hash is of a fixed, relatively short length. You could have two different thousand byte strings that produce the same hash, simply because there's no way to represent all the possible thousand byte strings with unique hash values.Hot Licks
Long.valueOf(0).hashCode() == Long.valueOf(-1).hashCode()nansen

8 Answers

17
votes

In simple terms hashcode () is a function to generate hash by some formula, so there can be some collisions, two different values can turn out to have same hashcode.

If I simply calculate the hashcode by taking mod by 6, then two different values might be having same hashcode.

4
votes

You can consider hashes to be a bucket..

  • If two objects are equal, they will go into the same bucket (have same hashcodes)
  • But, if the two objects go into the same bucket (have same hashcode), that doesn't mean that they must be equal
  • Also note that, if two objects are not equal, even then they can have the same hash code.. Obviously, this infers from the above two points..

So, hashcode is nothing but the hash-value for that Bucket.. Any number of objects can have same hashcode, depending upon the algorithm used to calculate the hashcodes..

An ideal algorithm is the one, which generates different hashcodes for different objects. So, there is ideally 1 object per bucket.. Of course this is the perfect case, which might not be possible..

A bucket may of course contain several objects, based on some property..

4
votes

Think of hashcode as something that just reduces the effort in checking equality. If two objects are equal they will definitely have the same hashcode. However if two objects have the same hashcode, they might have a mathematically high similarity but still not be the same. Just for mindset: Think of comparing a duck to an elephant in a zoo. They are highly dissimilar and will have different abstract hashcode, so you dont have to bother comparing their legs, wings etc to check if they are same. However if you are comparing a duck and a swan, they are highly similar and have same abstract hashcode, so now you are down to comparing very minute features of each animal to check for equality. As you reduce the extremeness between two elements being compared, the abstract hashcode becomes more and more concrete. Like comparing ducks and swans has more concrete hashcode than comparing ducks and elephants, comparing different breed of ducks makes the hash code even more concrete, comparing dna of two ducks of same breed makes the hashcode even more concrete. This answer is just designed to create a mindset to understand concept of hashcode. After reading this, you must blur out the understanding of the word hashcode in context of this answer.

3
votes

I think the reverse is actually

if two objects are NOT equal according to the equals() method, they must have the A DIFFERENT hashCode() value

which clearly does not hold since generating unique hashes in the general case is not possible because you're usually trying to map a set of values onto a set of hash codes of lower cardinality.

2
votes

I will explain it using example. Let's say that hashCode() of string is based on the string length. In this case the hash code of "foo" and "bar" are equal. But "foo" itself is not equal to "bar".

It is because has code implements a kind of formula: you can determine has code for each object but cannot restore object from hash code. There can be several objects with same hash code.

1
votes

You can define your hashCode() implementation to always return 1 fore example. This is perfectly valid: Different instances (which are not equal) can have the same hashCode. But the runtime performance of looking up these objects in HashMaps, Sets or other types of collections will be very poor (because they all land in the same bucket internally - the lookup performance degrades from O(1) to O(n) because you need to traverse the list of objects in the same bucket).

Also consider taking a look at how HashMaps work in Java.

0
votes

A hash code of an object is usually much smaller than the original object. This is one purpose of the hash function. So you can imagine, that if you have n different objects (say all permutations of a class) it is not possible to code them in m (where m < n) different and smaller (than the original object) unique codes.

0
votes

Let me show with an example:

suppose that the HashCode of a string obtains as follow: hashCode = sum of each character ASCII code (but we know, real hash is more complicated)

For example : hash code of "abc" calculate in such form : 49+50+51 = 150

Then hash code of "acb" equals : 49+51+50 = 150

And so on. as you can see, there are many strings having hashcode=150 but they are not equal.