5
votes

I started reading about implementing various data structures a few days back, and got around to hash tables and got stuck on a specific point.

My understanding of how a hash table is implemented: a key K is passed to a hash function H, which returns a hashed version of K, call it HK. HK should probably be at least a uint32_t to keep collisions rare. We have an array of size X, and the item is stored at index HK of this array. But wouldn't this require a pre-allocated array whose length covers the full range of uint32_t (or whatever the return type of H is)? Assuming we don't store the data itself within that array and instead store a pointer to the data, we'd need an array of 2^32 pointers. This seems quite wasteful: on 64-bit that would mean a memory usage of 2^32 * 8 = 34359738368 bytes, or ~32 GB, just for the array of pointers to the data, which is obviously not how it's actually implemented in real life.

So what am I missing?

4
I think the typical implementation is not using an array but a linked list. – Stephan Dollberg
I think the typical implementation is not using a linked list but an array. – max taldykin
About collisions: when using a hashtable, there will be collisions. They should be handled, not avoided. They can be minimized by decent hashing and dimensioning. – stefaanv
@bamboon: actually, you use an array of linked lists for the run-of-the-mill implementation. It's not ideal in case of collision and has poor locality of reference (between items), but the complexity of operations is quite predictable. – Matthieu M.
@MatthieuM. Well yeah, that was a misunderstanding; I thought he was talking about the second stage. – Stephan Dollberg

4 Answers

4
votes

It depends upon the implementation. There are three basic ways this is done:

1) Small hashes are used. So instead of using a 32-bit hash, say, an 8-bit hash is used.

2) Multiple levels of hashing are used. So, for example, a 12-bit hash may determine which "bucket" an entry goes in, but a collision only occurs if the full 32-bit hash matches. Each bucket is stored in a linked list or similar structure. (Perhaps one optimized for searching for the full 32-bit hash within it.)

3) Sparse arrays are used. These are data structures that don't need to actually store blank spaces for unfilled slots. (In practice, it could be something entirely different such as a tree, but it acts like a sparse array with efficient searching.)
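As a rough sketch of approach 2 (all names and types here are my own, not from the answer): a few low bits of the full 32-bit hash pick the bucket, the full hash is stored alongside each entry, and a collision only matters if the full hash and key both match.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <string>
#include <vector>

// Illustrative sketch of multi-level hashing: bucket chosen by a reduced
// hash, full 32-bit hash kept per entry to cheapen comparisons in the chain.
struct Entry {
    uint32_t full_hash;  // the complete hash, not just the bucket bits
    std::string key;
    int value;
};

class BucketedTable {
public:
    explicit BucketedTable(std::size_t nbuckets) : buckets_(nbuckets) {}

    void insert(uint32_t full_hash, const std::string& key, int value) {
        auto& bucket = buckets_[full_hash % buckets_.size()];
        for (auto& e : bucket)
            if (e.full_hash == full_hash && e.key == key) { e.value = value; return; }
        bucket.push_back({full_hash, key, value});
    }

    const int* find(uint32_t full_hash, const std::string& key) const {
        const auto& bucket = buckets_[full_hash % buckets_.size()];
        for (const auto& e : bucket)
            if (e.full_hash == full_hash && e.key == key) return &e.value;
        return nullptr;  // not stored
    }

private:
    std::vector<std::list<Entry>> buckets_;  // each bucket is a chain
};
```

Two keys landing in the same bucket only slow down lookup within that one chain; the array itself stays small.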

2
votes

You should construct your hash table so that it can be extended. There are several methods for doing that; read this, it will be helpful. In that example a linked list is used. You also need to extend your table once there are no empty slots left. That creates the following problem: if you extend your table, the bucket index your H function yields can change for old keys K. So you must think about how to solve this issue. One method is to reload (rehash) all stored values whenever the table is extended; that's fine as long as you don't extend it too often.
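The "reload all values" step described above can be sketched like this (a simplified toy, not the linked answer's code: the key doubles as its own hash, and each bucket is a plain vector):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// When the table grows, every stored key must be re-bucketed, because
// hk % new_size generally differs from hk % old_size.
using Table = std::vector<std::vector<unsigned>>;

Table rehash(const Table& old_table, std::size_t new_size) {
    Table fresh(new_size);
    for (const auto& bucket : old_table)
        for (unsigned hk : bucket)
            fresh[hk % new_size].push_back(hk);  // old index is no longer valid
    return fresh;
}
```

Note how two keys that collided at the old size can end up in distinct buckets at the new size, which is exactly why growing helps.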

2
votes

Realistically, you have an array of some smaller, fixed number of buckets, which on collisions uses either chaining (each slot holds a linked list) or probing (simplest example: if the slot for hash(x) is taken, try hash(x)+1). In the simplest case, you take your uint32 and reduce it modulo the bucket count.
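The probing variant mentioned above ("if hash(x) is taken, try hash(x)+1") can be sketched minimally like so; the slot count and types are arbitrary choices for illustration, and deletion is omitted to keep it short:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal open-addressing (linear probing) sketch.
// Here the key doubles as its own hash value.
class ProbingTable {
public:
    explicit ProbingTable(std::size_t slots) : keys_(slots), used_(slots, false) {}

    bool insert(uint32_t key) {
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            std::size_t slot = (key + i) % keys_.size();  // hash(x), hash(x)+1, ...
            if (!used_[slot]) { keys_[slot] = key; used_[slot] = true; return true; }
            if (keys_[slot] == key) return true;  // already present
        }
        return false;  // table full
    }

    bool contains(uint32_t key) const {
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            std::size_t slot = (key + i) % keys_.size();
            if (!used_[slot]) return false;  // empty slot ends the probe chain
            if (keys_[slot] == key) return true;
        }
        return false;
    }

private:
    std::vector<uint32_t> keys_;
    std::vector<bool> used_;
};
```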

You can define a load factor: once you get to N% of the array being full, say somewhere between 50% and 75%, you (for example) double the size of the array and rehash everything into the new array.

Well, isn't that expensive, you say? Not really. Say you double the size of the array each time. You add N items, the last of which triggers a copy: N adds at O(1), then one O(N) copy. But O(N) spread over N adds averages out to O(1), so the amortized cost of adding is still O(1), assuming your load factor is chosen wisely.
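The amortized argument can be checked with a quick count (a toy model of my own, not a real table): doubling from capacity 1, the copies triggered over N inserts total 1 + 2 + 4 + ... < N, i.e. less than one copy per insert on average.

```cpp
#include <cassert>
#include <cstddef>

// Count how many element copies happen when the backing array doubles
// each time it fills up.
std::size_t copies_for(std::size_t n_inserts) {
    std::size_t capacity = 1, size = 0, copies = 0;
    for (std::size_t i = 0; i < n_inserts; ++i) {
        if (size == capacity) {       // full: grow and copy everything over
            copies += size;
            capacity *= 2;
        }
        ++size;
    }
    return copies;
}
```

For 1024 inserts this gives 1 + 2 + 4 + ... + 512 = 1023 copies total, under one copy per insert.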

1
votes

The typical implementation of hash tables is an array of linked lists. The linked list can easily be swapped for another data structure, so we will call it a Bucket from now on.

The idea is simple:

class HashTable {
public:
  // insert/find/erase go here; each starts by reducing
  // the key's hash HK to a bucket index

private:
  std::vector<Bucket*> _array;
};

Then, you take HK and reduce it to fit in the array, usually with a modulo: HK % size(_array), which gives the index of the bucket to be used.
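That reduction step can be written as a one-liner (the function name and the example sizes are mine, purely for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fold a full 32-bit hash HK into a valid index for an array of
// `array_size` buckets: the "HK % size(_array)" step from the answer.
std::size_t bucket_index(uint32_t hk, std::size_t array_size) {
    return hk % array_size;
}
```

This is what lets the array stay small: any 32-bit hash maps into the few buckets you actually allocated, and two hashes that agree modulo the size simply share a bucket.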