6
votes

When I compile the following code, I saw the errors that are related to Hash.

int F_no_meaningA(unordered_set<vector<int>>& setVec, vector<int>& vec) 
{
    setVec.insert(vec);
    return 1;
}

int main()
{
  vector<int> W{2, 3, 7}; 
  unordered_set<vector<int>> setVec; 
}

$ g++ --version
g++ (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3

$ g++ $1.cpp -o $1 -g -Wall -Weffc++ -pedantic -std=c++0x

/tmp/ccCQFQ4N.o: In function `std::__detail::_Hash_code_base

, std::vector >, std::_Identity > >, std::equal_to > >, std::hash > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, false>::_M_hash_code(std::vector > const&) const': /usr/include/c++/4.6/bits/hashtable_policy.h:753: undefined reference to std::hash<std::vector<int, std::allocator<int> > ::operator()(std::vector<int, std::allocator<int> >) const' /tmp/ccCQFQ4N.o: In function std::__detail::_Hash_code_base , std::vector >, std::_Identity > >, std::equal_to > >, std::hash > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, false>::_M_bucket_index(std::__detail::_Hash_node >, false> const*, unsigned int) const': /usr/include/c++/4.6/bits/hashtable_policy.h:763: undefined reference to `std::hash > ::operator()(std::vector >) const' collect2: ld returned 1 exit status

Then, I introduce the following own Hash and the problem is solved.

Question 1> When should we provide our own Hash for std::unordered_set? When should we provide our own Equivalent Function for std::unordered_set?

struct HashVector : unary_function<vector<int>, vector<int>::size_type> {
  vector<int>::size_type operator()(const vector<int>& vec) const {
    vector<int>::size_type sum = 0;
    for(int i : vec) {
      sum = sum*37 + hash<int>()(i);
    }
    return sum;
  }
};

int F_no_meaningB(unordered_set<vector<int>, HashVector>& setVec, vector<int>& vec) 
{
    setVec.insert(vec);
    return 1;
}

int main()
{
  vector<int> W{2, 3, 7}; 
  unordered_set<vector<int>, HashVector> setVec; 
}

warning: base class ‘struct std::unary_function, unsigned int>’ has a non-virtual destructor [-Weffc++]

Question 2> Why the g++ complain about the struct HashVector with the above warning?

Thank you

2
Because there's no builtin hash for vector<int>?Bartek Banachewicz
So you mean the compiler is not smarter enough to use std::hash<int> to iteratively go through all elements in vector?q0987
It's not a matter of "smart" or "not smart". Your solution is a solution, not the solution for hashing containers. You might as well want any other behaviour. BTW, check out Boost.Hash and hash_combine and hash_range.Bartek Banachewicz
@q0987 You need to provide your own hash function whenever the type stored in the container is not one which the standard provides a specialization for. If you can use Boost, it provides hash_range which will iterate over a container and generate a hash.Praetorian

2 Answers

6
votes

When should we provide our own Hash for std::unordered_set?

When you're using a type that doesn't have a hash provided by the standard library. For example, it doesn't provide hash functions for the standard containers, including vector<int>.

Why the g++ complain about the struct HashVector with the above warning?

Because you've used -Weffc++ to request a (slightly over-zealous) warning to tell you whenever you inherit from a class with no virtual destructor. For most uses of inheritance (i.e. for polymorphism), you don't want to do that. However, in this case, inheritance is just used (or, some might say, abused) to inject some definitions into the class, so the warning does not indicate a problem.

Classes like std::unary_function are deprecated, so the best solution is not to inherit from it at all.

5
votes

When should we provide our own Hash for std::unordered_set?

The standard only requires a limited number of specializations, mostly for primitive types. This is because these primitive types have some reasonable default "one-size-fits-all" hash functions that the implementation can provide. More complicated types, such as custom types or containers, do not have an obvious or even reasonable default hashing, and thus, you are required to provide your own. If your value-type is not supported, you must provide a hash function implementation for it.

Also, another reason for providing your own hash function is when you have some additional expert-knowledge about the distribution of values in your unordered_set. The performance of a hash-table is very much sensitive to how appropriate the hash-function is with respect to the distribution of the values stored in the table. Here's a more complete explanation. The standard defaults are just a one-size-fits-all solution, meaning it's easy and convenient, but almost always sub-optimal.

Why the g++ complain about the struct HashVector with the above warning?

This is mostly a matter of applying warnings that are mostly relevant to classic object-oriented programming (using base classes as dynamically polymorphic interfaces to derived classes). In that kind of context, it is a pretty serious mistake to not define the destructor to be virtual (which allows for the correct destruction of the derived class from a base class instance (e.g., delete base_ptr;). As Mike suggests, this is because -Weffc++ is enabled (which mostly applies novice-level classic OOP-style warning messages). In your code, however, the inheritance is used in the context of generic programming, where inheritance is used in a very different manner (mostly to imbue a class with some ground-works properties and traits). In this case, it is not a problem that the base class does not have a virtual destructor, because it is not intended to be used in a dynamically polymorphic setting, but rather in a statically polymorphic setting.

Also note that std::unary_function (and its relatives) have been deprecated in the latest standard (C++11). This is because of the enhancements to type introspection provided by the latest standard (with <type_traits>, decltype and type inference).