This might be a silly question, but here goes :
I hashed a dictionary of words into an unordered_set based hash-table. My hash function was made intentionally "bad", in that all strings that contained the same set of letters would hash to the same value. I initially tried to over-ride the normal hash function behaviour, and use a "frequency histogram" of the letters in each word as a hash value (which I learnt was impossible :) ), but one of the threads suggested using a 26-bit bitmask to achieve the same. The hash function works fine and dandy thus far.
For example, in my scheme, CITIED and CITED hash to the same value, 1049144. My idea was that given a set of letters, I wanted to find all the words containing letters from that set.
I am guessing that I haven't quite understood the concept of hashing (or my code is plain wrong), as I can't quite explain the behaviour I encountered :
I decided to look for all words which consisted of letters from the string "LIVEN".
My output (with hash key) was as follows :
VENVILLE,4215328
LEVIN,4215328
ENLIVEN,4215328
CURTSEYED,37486648
How on earth did CURTSEYED land up there? As can be seen, it has a different hash value from the remaining three words. Where does the fault lie with my understanding/implementation of the hash table?
Code that produced above output:
typedef std::unordered_set< std::string, my_string_hash_function, my_string_equality> Dict
DictHash dict;
DictHash::const_local_iterator c_l_itr;
DictHash::size_type bs = dict.bucket (std::string ("LIVEN"));
for (c_l_itr = dict.begin(bs); c_l_itr != dict.end(bs); c_l_itr++)
std::cout
My hash function :
struct my_string_hash_function
{
std::size_t operator()(const std::string& s) const
{
unsigned long hash = 0;
std::string::const_iterator itr;
for (itr = s.begin(); itr != s.end(); itr++)
hash |= 2 << (*itr - int('A'));
return hash;
}
};
Comparison function :
struct my_string_equality { bool operator()(const std::string& s1, const std::string& s2) const { if (s1.length() != s2.length()) return false; unsigned int hash1 = 0, hash2 = 0; const char *str1, *str2; int i,len; len = s1.length(); str1 = s1.c_str(); str2 = s2.c_str(); for (i = 0; i < len; i++) { hash1 |= 2 << (str1[i] - (int)'A'); hash2 |= 2 << (str2[i] - (int)'A'); } return hash1 == hash2; } };