4
votes

I know that Java has good built-in support for hash maps and hash tables.

Does anybody know what kind of hashing functions or techniques the Java language employs?

Is it possible to tweak those functions to make them more specific to one's application, in order to improve performance and reduce access time?

Thanks a lot for reading!

8

8 Answers

11
votes

Java allows you to override the hashCode() method for your classes to use a hashing algorithm that is not only well suited to your application, but to your individual types:

public class Employee {

   private int id;
   // Default implementation might want to use "name" as part of hashCode
   private String name; 

   @Override
   public int hashCode() {
     // We know that ID is always unique, so don't use name in calculating 
     // the hash code.
     return id;
   }
}
4
votes

Go nuts.

http://www.docjar.com/html/api/java/util/HashMap.java.html

Furthermore, you can always set the resize threshold (load factor) and initial capacity to be as large as you'll need, which will decrease put time when the map is almost full. If your map is accessed from multiple threads, you'll also gain a huge performance boost by using ConcurrentHashMap.
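A minimal sketch of the pre-sizing idea (the method name and entry count are just for illustration): with the default load factor of 0.75, sizing the map to at least n / 0.75 up front means it never has to rehash while loading n entries.

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMap {

    // Build a HashMap pre-sized so it never needs to resize while
    // loading n entries: capacity must be at least n / loadFactor.
    static Map<Integer, String> buildMap(int n) {
        Map<Integer, String> map = new HashMap<>((int) (n / 0.75f) + 1);
        for (int i = 0; i < n; i++) {
            map.put(i, "value-" + i);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(buildMap(10_000).size()); // 10000
    }
}
```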

4
votes

Just as a note, if you are going to override hashCode you should also override equals.
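A minimal sketch of a matched pair, using a hypothetical Account class keyed by a unique id: the contract requires that objects equal under equals() return the same hashCode().

```java
import java.util.Objects;

public class Account {
    private final int id;      // assumed unique
    private final String name;

    public Account(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public int hashCode() {
        // id alone is enough; equal accounts always have equal ids.
        return id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Account)) return false;
        Account other = (Account) o;
        return id == other.id && Objects.equals(name, other.name);
    }
}
```

Overriding only one of the two is what breaks HashMap lookups: two equal keys would land in different buckets.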

3
votes

The hash code is computed per object stored in the collection, using a standard recipe (described in Effective Java). See that book for more details.

You can indeed override the hashCode method on a per-class basis. One convenient way to implement it is via HashCodeBuilder (which is part of the Commons Lang library); see here:

http://commons.apache.org/lang/

For more gory details on hashCode, see this article:

http://www.ibm.com/developerworks/java/library/j-jtp05273.html

Hope that helps.

1
votes

I know that Java has good built-in support for hash maps and hash tables.

Given that it totally lacks a syntax for hash map literals, I would not really say that ...

Anyway, as others pointed out, it is up to the individual classes to specify what their hashCode() should be (the default is a hash derived from the object's memory address). If you do implement your own, make sure you follow the contract of the hashCode() method (in particular, it needs to be consistent with equals()), otherwise instances of the class will not work as keys in a HashMap.

You can also look at the source code to java.util.HashMap and friends directly and see how they are implemented. HashMap for example uses an Array of buckets, and the buckets can overflow using a linked list.

For further reading, you might want to look at the ConcurrentHashMap, which can be safely accessed by many threads at the same time, and at TreeMap, which offers a way to build a map for keys that can be ordered (and not necessarily hashed).
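As a small illustration of the TreeMap point (names and values here are arbitrary): it keeps keys in sorted order using comparisons, with no hashing involved at all.

```java
import java.util.Map;
import java.util.TreeMap;

public class OrderedKeys {

    // TreeMap stores keys in a red-black tree, so iteration order
    // follows the keys' natural ordering -- hashCode() is never called.
    static Map<String, Integer> ages() {
        Map<String, Integer> ages = new TreeMap<>();
        ages.put("Carol", 35);
        ages.put("Alice", 30);
        ages.put("Bob", 25);
        return ages;
    }

    public static void main(String[] args) {
        System.out.println(ages().keySet()); // [Alice, Bob, Carol]
    }
}
```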

1
votes

In general, it's not worth worrying too much about the hash functions of the standard JDK classes. Even if you could override String's (you can't), in practice its hash function is practically always "good enough". There are maybe a few exceptions -- e.g. certain classes such as BigInteger and the collections calculate their hash code every time by cycling through every single element they contain, which can be wasteful in some cases -- but how often do you key on instances of those classes?

For designing hash codes for your own classes, the thing you're trying to do is spread hash codes "randomly" over the range of integers. To do this, you generally want to "mix" the bits of successive fields in your object (you may be interested in an article on my web site that graphically illustrates how the String hash code mixes bits). Multiplying the current hash by an odd number (and generally a prime number) then adding in the hash of the next element generally works sufficiently well as a first attempt. (However, problems can occur with this method when, for example, the numbers/hash codes being combined tend to have zeroes in their lower bits-- there's generally no practical hash function that's absolutely guaranteed to work well in all cases.)
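A minimal sketch of that multiply-by-a-prime approach, using a hypothetical three-field class (the constant 31 is the same odd prime String's hash uses; 17 is an arbitrary non-zero seed):

```java
public final class Point3D {
    private final int x, y, z;

    public Point3D(int x, int y, int z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    @Override
    public int hashCode() {
        // Multiply the running hash by a small odd prime before adding
        // each field, so that field order matters and bits get mixed.
        int h = 17;
        h = 31 * h + x;
        h = 31 * h + y;
        h = 31 * h + z;
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point3D)) return false;
        Point3D p = (Point3D) o;
        return x == p.x && y == p.y && z == p.z;
    }
}
```

Because of the multiplication, (1, 2, 3) and (3, 2, 1) hash differently, which a plain sum of the fields would not achieve.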

Then, you can consider testing your hash code. Generate a series of random objects (or even use some real ones), calculate their hash codes, AND off the bottom, say, 16 bits of the hash codes, and then see how many collisions you get. Check that the number of collisions you get roughly matches the number of hash collisions you'd expect to get by chance. For example, if you AND off the bottom 16 bits of the hash code (& 0xffff) then after 1000 random objects, you'd expect about 8 collisions. After 2000, you'd expect about 30 collisions.
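The collision test above can be sketched roughly like this (here random ints stand in for real objects' hash codes, and the seed is arbitrary). The birthday-problem estimate is about n^2 / (2 * 65536) collisions in a 16-bit space, i.e. roughly 8 for 1000 values and roughly 30 for 2000.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class CollisionCheck {

    // Count how many of n random hash codes collide in their bottom
    // 16 bits -- i.e. how often (hash & 0xffff) repeats a value seen before.
    static int collisions(int n, long seed) {
        Random rnd = new Random(seed);
        Set<Integer> seen = new HashSet<>();
        int collisions = 0;
        for (int i = 0; i < n; i++) {
            int truncated = rnd.nextInt() & 0xffff; // keep bottom 16 bits
            if (!seen.add(truncated)) {
                collisions++;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        System.out.println(collisions(1000, 42));
        System.out.println(collisions(2000, 42));
    }
}
```

If your own hashCode() produces far more collisions than this random baseline, its low bits are not well mixed.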

As far as performance is concerned, then up to some point, I think that getting a hash code that's well distributed will generally be more beneficial nowadays than sacrificing hash quality for hash calculation speed.

1
votes

There is a "hashCode/equals contract" you should adhere to which says that objects which are equal to each other according to the equals() method must provide the same hashCode() value. It is not required however that all objects with the same hashCode are also equal. You should have a look at http://java.sun.com/javase/6/docs/api/java/lang/Object.html#hashCode() which tells you the details.

It can be a little hard to wrap your head around the symmetries involved at first, but it is definitely worth understanding, unless you are eager to have strange behavior in your app when you put objects that do not adhere to that contract into a HashMap and friends.

I also recommend getting hold of a copy of Effective Java and reading the chapters on hashCode/equals to fully understand it.

0
votes

What I suggest, if you know you need fast hashes, is to use another implementation: try fastutil (http://fastutil.dsi.unimi.it/ ) or Trove (http://trove4j.sourceforge.net/ ). They are apparently faster, but they are type-specific.