1
votes

I'm studying for my final with questions given to us by our professor. I don't know how to answer this question:

Consider a hash-table T1 with n values where we have used f(k) = k%n for a hash-function and Chaining as our collision resolution method. Now, we want to insert these same values into a second hash-table T2 using the same hash-function, but apply Linear Probing (Single-Probing) to resolve collisions. How could you take advantage of T1 to build T2?

2
In which sense advantage? Do in you mean in terms of complexity? - Luca Marzi
I'm not sure what my professor means - Bryn

2 Answers

0
votes

Note: My answer does not contain an exact solution, only ideas.

If I understand correctly, we already have a hash-table instance T1 with n values in it, and we want to use it to build T2, instead of normally building T2 from scratch.

Since we have n values for n buckets, we know that the hash table is going to be full.

My idea #1:

I would loop through all the buckets of T1. When I find a chain of values in the m-th bucket, I can know that all these values' hash is m, without having to call the hash function. So I can insert all of these values into the m-th, (m+1)-th, (m+2)-th ... buckets in T2, or if one is occupied then I skip that bucket.

The advantage is that we never have to call the hash function.

My idea #2:

I could see which buckets contain many and which buckets contain very few elements (like 1-2) in T1. I could use this information to determine an ideal order of insertion to minimize average access time. Unfortunately, I cannot think of a concrete method to determine an ideal order.

Example:

N = 10
values = 10,20,30,40,11,32,13,35,45,19

T1 always looks like this (order within chains does not matter):

0 -> {10,20,30,40}
1 -> {11}
2 -> {32}
3 -> {13}
4 -> {}
5 -> {35,45}
6 -> {}
7 -> {}
8 -> {}
9 -> {19}

Unlike T1, T2 can vary depending on the insertion order of values. One possible T2 where accessing each element takes a few steps:

0 -> {10} 0 off
1 -> {20} 1 off
2 -> {30} 2 off
3 -> {40} 3 off
4 -> {11} 3 off
5 -> {32} 3 off
6 -> {13} 3 off
7 -> {35} 2 off
8 -> {45} 3 off
9 -> {19} 0 off

Another possible T2 when some elements can be accessed immediately, but some elements are really off:

0 -> {10} 0 off
1 -> {11} 0 off
2 -> {32} 0 off
3 -> {13} 0 off
4 -> {20} 4 off
5 -> {35} 0 off
6 -> {45} 1 off
7 -> {30} 7 off
8 -> {40} 8 off
9 -> {19} 0 off
0
votes

...hash-table T1 with n values where we have used f(k) = k%n for a hash-function and Chaining as our collision resolution method. Now, we want to insert these same values into a second hash-table T2 using the same hash-function, but apply Linear Probing (Single-Probing) to resolve collisions. How could you take advantage of T1 to build T2?

Taken exactly as asked, f(k) = k%n is an insanely stupid hash function, as:

  • we've been told "n" is the number of values

    • if that were true and n were less than the number of buckets, we'd have artificially prevented buckets after the nth being used; consequentially, collision rates would be significantly higher than necessary

    • as elements are inserted or deleted n changes, it's unthinkable that we'd rehash the table with every single value insertion or deletion

All up, it's safe to assume the professor forgot s/he'd defined "n", and wants the n in f(k) to be the number of buckets.

Let's run with that...

Because we know the hash value matches the bucket value, we know all values chained from a single bucket have hashed to that bucket's index in the bucket array. We don't need to call f(x) again to generate another hash for the same #buckets.

And that's undoubtedly what the professor's fishing for: the insight that if T2 is created with the same number of buckets as T1, then you can run a counter over the bucket indexes, copying all the values across to T2 starting at the same bucket, or if that's in use - the next bucket you can find using your linear probing. You could also keep a next-index-in-T2-that-might-be-free variable so you don't have to step through buckets that your linear probing has already been over.

So: you may be able to save yourself rehashing k, and handle collisions with less repetitive probing.