
Without doing anything special for a reference type, Equals() would mean reference equality (i.e. same object). If I choose to override Equals() for a reference type, should it always mean that the values of the two objects are equivalent?

Consider this mutable Person class:

class Person
{
    public readonly int Id; // never changes for a given person

    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Address { get; set; }
    // ...

    public Person(int id) { Id = id; }
}

Two objects that represent the exact same person will always have the same Id, but the other fields might differ over time (e.g. before/after an address change).

For this class, Equals could be defined to mean several different things:

  • Value Equality: all fields are equal (two objects representing the same person but with different addresses would return false)
  • Identity Equality: the Ids are equal (two objects representing the same person but with different addresses would return true)
  • Reference Equality: don't override Equals at all (the default).

Question: Which (if any) of these is preferable for this class? (Or perhaps the question should be, "how would most clients of this class expect Equals() to behave?")

Notes:

  • Using Value Equality makes it more difficult to use this class in a HashSet or Dictionary: the hash code would depend on mutable fields, so an object whose address changes while it is stored in the collection can no longer be found.
  • Using Identity Equality makes the relationship between Equals and the = (assignment) operator strange: even after p1.Equals(p2) returns true, you might still want to update your reference to point to the "newer" Person object, since the two are not value-equivalent. For example, the following code reads strangely; it seems to do nothing, but it actually removes p1 and adds p2:

    HashSet<Person> people = new HashSet<Person>();
    people.Add(p1);
    // ... p2 is a new object that has the same Id as p1 but a different Address
    people.Remove(p2);
    people.Add(p2);
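Both notes can be reproduced with a small sketch. `ValuePerson`, `IdentityPerson` and `EqualityNotesDemo` are hypothetical names, and the sketch assumes C# 9 or later (for `init`) and .NET Core 2.1 or later (for `System.HashCode` and `HashSet<T>.TryGetValue`). The first method shows a value-equality object being "lost" after it is mutated inside a set; the second shows why identity equality needs the Remove/Add pair above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical Person with *value* equality: every field participates in Equals/GetHashCode.
public sealed class ValuePerson
{
    public int Id { get; init; }
    public string? Address { get; set; }

    public override bool Equals(object? obj) =>
        obj is ValuePerson o && o.Id == Id && o.Address == Address;

    // The hash depends on the mutable Address, so it changes whenever Address changes.
    public override int GetHashCode() => HashCode.Combine(Id, Address);
}

// Hypothetical Person with *identity* equality: only the immutable Id participates.
public sealed class IdentityPerson
{
    public int Id { get; init; }
    public string? Address { get; set; }

    public override bool Equals(object? obj) => obj is IdentityPerson o && o.Id == Id;
    public override int GetHashCode() => Id.GetHashCode();
}

public static class EqualityNotesDemo
{
    // Note 1: mutating a value-equality object while it sits in a HashSet "loses" it.
    public static (bool before, bool after) MutateInSet()
    {
        var p = new ValuePerson { Id = 1, Address = "Old Road" };
        var people = new HashSet<ValuePerson> { p };
        bool before = people.Contains(p);

        p.Address = "New Street";         // the hash stored in the set is now stale
        bool after = people.Contains(p);  // lookup uses the new hash, so (barring a collision) it fails
        return (before, after);
    }

    // Note 2: with identity equality, Add(p2) is a no-op while p1 is present,
    // so Remove(p2) followed by Add(p2) is needed to swap in the newer object.
    public static (bool addedDirectly, bool p1StillStored, bool p2StoredAfterSwap) ReplaceInSet()
    {
        var p1 = new IdentityPerson { Id = 1, Address = "Old Road" };
        var p2 = new IdentityPerson { Id = 1, Address = "New Street" };
        var people = new HashSet<IdentityPerson> { p1 };

        bool addedDirectly = people.Add(p2);  // false: p2 "equals" p1
        people.TryGetValue(p2, out var stored);
        bool p1StillStored = ReferenceEquals(stored, p1);

        people.Remove(p2);                    // removes p1, because p2.Equals(p1)
        people.Add(p2);
        people.TryGetValue(p2, out stored);
        return (addedDirectly, p1StillStored, ReferenceEquals(stored, p2));
    }
}
```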
    

If there are multiple meanings of equality, why not provide several implementations of IEqualityComparer<T> and let the consumer choose which equality to use? – Dustin Kingen
@Romoku, seems reasonable, and then I would assume that means you would not implement .Equals() at all? – Matt Smith
I think Marc Gravell's answer addresses that. – Dustin Kingen
If two objects that represent the exact same person will always have the same Id, then I would expect Equals to compare only the Id field. Anything else wouldn't make sense for the Equals method, and would be potentially dangerous if you're comparing mutable fields. – Jim Mischel
@JimMischel, "Anything else wouldn't make sense for the Equals method": why not? I've given three meanings Equals could have (and each of those meanings could be useful to a client). – Matt Smith

3 Answers

13 votes

Yes, deciding the right rules for this is tricky. There is no single "right" answer here, and it will depend a lot on both context and preference. Personally, I rarely bother thinking about it much, and just default to reference equality on most regular POCO classes:

  • the number of cases when you use something like Person as a dictionary-key / in a hash-set is minimal
    • and when you do, you can provide a custom comparer that follows the actual rules you want it to follow
    • but most of the time, I'd simply use the int Id as the key in a dictionary (etc.) anyway
  • using reference equality means that x==y gives the same result whether x/y are Person or object, or indeed T in a generic method
  • as long as Equals and GetHashCode are compatible, most things will just about work out, and one easy way to do that is to not override them
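The point about `x==y` can be seen in a small sketch. `IdPerson` and `OperatorDemo` are hypothetical names; `IdPerson` overrides `Equals`/`GetHashCode` and overloads `==` to compare by Id. Because C# picks an operator overload from the compile-time types of the operands, the overload is bypassed as soon as the same two references are typed as `object` or as a class-constrained generic `T` (assuming C# 9 or later for `init`).

```csharp
#nullable enable
using System;

// Hypothetical class: Equals, GetHashCode and == all compare by Id.
public sealed class IdPerson
{
    public int Id { get; init; }

    public override bool Equals(object? obj) => obj is IdPerson other && other.Id == Id;
    public override int GetHashCode() => Id.GetHashCode();

    // Operator overloads are chosen at compile time from the static types of the operands.
    public static bool operator ==(IdPerson? a, IdPerson? b) => a is null ? b is null : a.Equals(b);
    public static bool operator !=(IdPerson? a, IdPerson? b) => !(a == b);
}

public static class OperatorDemo
{
    // With a class constraint, == on T always compiles to a reference comparison.
    public static bool SameInGeneric<T>(T x, T y) where T : class => x == y;

    public static (bool asIdPerson, bool asObject, bool asGenericT) Compare()
    {
        var a = new IdPerson { Id = 1 };
        var b = new IdPerson { Id = 1 };
        object oa = a, ob = b;

        return (a == b,                // uses the overload: compares Ids
                oa == ob,              // object ==: reference equality
                SameInGeneric(a, b));  // generic T: reference equality
    }
}
```

With the default (non-overridden) `Equals` and `==`, all three comparisons would return false, which is the consistency the bullet describes.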

Note, however, that I would always advise the opposite for value types, i.e. explicitly override Equals / GetHashCode; but then, writing a struct is really uncommon.
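For the value-type case, a minimal sketch of what explicitly overriding `Equals` / `GetHashCode` can look like, using a hypothetical `PersonId` struct. Implementing `IEquatable<T>` as well lets `EqualityComparer<T>.Default` (and therefore the generic collections) call a strongly typed `Equals`, whereas the inherited `ValueType.Equals(object)` takes an `object` and so boxes its argument on every call.

```csharp
#nullable enable
using System;

// Hypothetical value type wrapping a person's Id (illustrative name).
public readonly struct PersonId : IEquatable<PersonId>
{
    public PersonId(int value) => Value = value;

    public int Value { get; }

    // Strongly typed overload: picked up by EqualityComparer<PersonId>.Default, so no boxing.
    public bool Equals(PersonId other) => Value == other.Value;

    // The object overload must agree with the typed one.
    public override bool Equals(object? obj) => obj is PersonId other && Equals(other);

    // Equal values must produce equal hash codes.
    public override int GetHashCode() => Value.GetHashCode();

    public static bool operator ==(PersonId left, PersonId right) => left.Equals(right);
    public static bool operator !=(PersonId left, PersonId right) => !left.Equals(right);
}
```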

6 votes

You could provide multiple IEqualityComparer<T> implementations and let the consumer decide.

Example:

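using System.Collections.Generic;

// Leave the class's Equals as reference equality
class Person
{
    // init-only: can be set in an object initializer but never changed afterwards
    public int Id { get; init; }

    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Address { get; set; }
    // ...
}

class PersonIdentityEqualityComparer : IEqualityComparer<Person>
{
    public bool Equals(Person p1, Person p2)
    {
        if (ReferenceEquals(p1, p2)) return true;    // same object, or both null
        if (p1 == null || p2 == null) return false;  // exactly one is null

        return p1.Id == p2.Id;
    }

    public int GetHashCode(Person p)
    {
        return p.Id.GetHashCode();
    }
}

class PersonValueEqualityComparer : IEqualityComparer<Person>
{
    public bool Equals(Person p1, Person p2)
    {
        if (ReferenceEquals(p1, p2)) return true;
        if (p1 == null || p2 == null) return false;

        return p1.Id == p2.Id &&
               p1.FirstName == p2.FirstName &&
               p1.LastName == p2.LastName &&
               p1.Address == p2.Address; // etc.
    }

    public int GetHashCode(Person p)
    {
        unchecked // overflow is expected and harmless when combining hashes
        {
            int hash = 17;

            hash = hash * 23 + p.Id.GetHashCode();
            hash = hash * 23 + (p.FirstName?.GetHashCode() ?? 0); // properties may be null
            hash = hash * 23 + (p.LastName?.GetHashCode() ?? 0);
            hash = hash * 23 + (p.Address?.GetHashCode() ?? 0);
            // etc.

            return hash;
        }
    }
}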

See also: What is the best algorithm for an overridden System.Object.GetHashCode?
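The 17/23 multiply-and-add pattern above works on any framework version. On .NET Core 2.1 and later, `System.HashCode.Combine` can do the mixing instead and tolerates null arguments. A minimal sketch, assuming C# 9 or later; `PersonValueHashCodeComparer` is a hypothetical name, and `Person` is restated so the block compiles on its own.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Minimal Person for this sketch (public members so the comparer can read them).
public class Person
{
    public int Id { get; init; }
    public string? FirstName { get; set; }
    public string? LastName { get; set; }
    public string? Address { get; set; }
}

// Hypothetical variant of PersonValueEqualityComparer built on System.HashCode.
public sealed class PersonValueHashCodeComparer : IEqualityComparer<Person>
{
    public bool Equals(Person? p1, Person? p2)
    {
        if (ReferenceEquals(p1, p2)) return true;   // same object, or both null
        if (p1 is null || p2 is null) return false; // exactly one is null

        return p1.Id == p2.Id
            && p1.FirstName == p2.FirstName
            && p1.LastName == p2.LastName
            && p1.Address == p2.Address;
    }

    // Every field compared in Equals is also fed into the hash.
    public int GetHashCode(Person p) => HashCode.Combine(p.Id, p.FirstName, p.LastName, p.Address);
}
```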

Usage:

var personIdentityComparer = new PersonIdentityEqualityComparer();
var personValueComparer = new PersonValueEqualityComparer();

var joseph = new Person { Id = 1, FirstName = "Joseph" };

var persons = new List<Person>
{
   new Person { Id = 1, FirstName = "Joe" },
   new Person { Id = 2, FirstName = "Mary" },
   joseph
};

var personsIdentity = new HashSet<Person>(persons, personIdentityComparer);
var personsValue = new HashSet<Person>(persons, personValueComparer);

// joseph has the same Id as "Joe", so the identity set treats him as a duplicate
// and keeps only the first one added ("Joe"); the value set keeps all three
Console.WriteLine(personsIdentity.Count); // 2
Console.WriteLine(personsValue.Count);    // 3

// Contains() goes through the comparer, so both sets report a match for joseph
Console.WriteLine(personsIdentity.Contains(joseph)); // True (matches the stored "Joe" by Id)
Console.WriteLine(personsValue.Contains(joseph));    // True (joseph itself is stored)

1 vote

Fundamentally, if class-type fields (or variables, array slots, etc.) X and Y each hold a reference to a class object, there are two logical questions that (Object)X.Equals(Y) can answer:

  1. If the reference in `Y` were copied to `X` (meaning the reference itself is copied, not the object), would the class have any reason to expect such a change to affect program semantics in any way (e.g. by affecting the present *or future* behavior of any members of `X` or `Y`)?
  2. If *all* references to the target of `X` were instantaneously and magically made to point to the target of `Y`, *and vice versa*, should the class expect such a change to alter program behavior (e.g. by altering the behavior of any member *other than an identity-based `GetHashCode`*, or by causing a storage location to refer to an object of incompatible type)?

Note that if X and Y refer to objects of different types, neither function may legitimately return true unless both classes know that there cannot be any storage locations holding a reference to one which could not also hold a reference to the other [e.g. because both types are private classes derived from a common base, and neither is ever stored in any storage location (other than this) whose type can't hold references to both].

The default Object.Equals method answers the first question; ValueType.Equals answers the second. The first question is generally the appropriate one to ask of object instances whose observable state may be mutated; the second is appropriate to ask of object instances whose observable state will not be mutated even if their types would allow it. If X and Y each hold a reference to a distinct int[1], and both arrays hold 23 in their first element, the first equality relation should define them as distinct [copying X to Y would alter the behavior of X[0] if Y[0] were later modified], but the second should regard them as equivalent (swapping all references to the targets of X and Y wouldn't affect anything). Note that if the arrays held different values, the second test should regard the arrays as distinct, since swapping the objects would mean X[0] would now report the value that Y[0] used to report.
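For arrays, .NET already offers a content-based comparison close to the second relation: `StructuralComparisons.StructuralEqualityComparer` (in `System.Collections`) compares elements, whereas `Object.Equals` on an array is plain reference equality. A minimal sketch of the `int[1]` example; `ArrayEqualityDemo` is a hypothetical name.

```csharp
using System;
using System.Collections;

public static class ArrayEqualityDemo
{
    public static (bool referenceEqual, bool structurallyEqual, bool structurallyEqualAfterChange) Run()
    {
        int[] x = { 23 };
        int[] y = { 23 };

        // First relation: Object.Equals on arrays is reference equality, so two distinct arrays differ.
        bool referenceEqual = x.Equals(y);

        // Second relation (appropriate only if nobody will mutate them): compare current contents.
        bool structurallyEqual = StructuralComparisons.StructuralEqualityComparer.Equals(x, y);

        // Once the contents differ, the content-based comparison distinguishes them too.
        y[0] = 42;
        bool afterChange = StructuralComparisons.StructuralEqualityComparer.Equals(x, y);

        return (referenceEqual, structurallyEqual, afterChange);
    }
}
```

The comparer only inspects the contents at the moment of the call; it has no way to know whether either array will be written later, so it matches the second relation only when the holders of the references never mutate them.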

There's a pretty strong convention that mutable types (other than System.ValueType and its descendants) should override Object.Equals to implement the first type of equivalence relation; since it's impossible for System.ValueType or its descendants to implement the first relation, they generally implement the second. Unfortunately, there's no standard convention by which objects which override Object.Equals() for the first kind of relation should expose a method which tests for the second, even though an equivalence relation could be defined which allowed comparison between any two objects of any arbitrary type.

The second relation would be useful in the standard pattern wherein an immutable class Imm holds a private reference to a mutable type Mut but doesn't expose that object to any code that could actually mutate it [making the instance immutable]. In such a case, there's no way for class Mut to know that an instance will never be written, but it would be helpful to have a standard means by which two instances of Imm could ask the Muts to which they hold references whether they would be equivalent if the holders of the references never mutated them. Note that the equivalence relation defined above makes no reference to mutation, nor to any particular means which Imm must use to ensure that an instance won't be mutated, but its meaning is well-defined in any case. The object which holds a reference to Mut should know whether that reference encapsulates identity, mutable state, or immutable state, and should thus be able to implement its own equality relation suitably.
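A minimal sketch of the Imm/Mut pattern, assuming a hypothetical `ImmutableIntArray` in the role of Imm and a plain `int[]` in the role of Mut. Because the wrapper copies the array on construction and never hands it out, it knows the contents will not change, so it can implement its own `Equals` on the contents even though `int[]` itself only offers reference equality (assumes C# 9 or later and .NET Core 2.1 or later for `System.HashCode`).

```csharp
#nullable enable
using System;
using System.Linq;

// Hypothetical "Imm" that owns a private int[] ("Mut") and never lets other code mutate it.
public sealed class ImmutableIntArray : IEquatable<ImmutableIntArray>
{
    private readonly int[] _items;

    // Defensive copy: no outside caller ever holds a reference to _items.
    public ImmutableIntArray(params int[] items) => _items = (int[])items.Clone();

    public int this[int index] => _items[index];
    public int Length => _items.Length;

    // The wrapper knows _items is never written after construction,
    // so comparing contents is safe for its own Equals.
    public bool Equals(ImmutableIntArray? other) =>
        other is not null && Enumerable.SequenceEqual(_items, other._items);

    public override bool Equals(object? obj) => Equals(obj as ImmutableIntArray);

    // Hash the same contents that Equals compares.
    public override int GetHashCode()
    {
        var hash = new HashCode();
        foreach (int item in _items) hash.Add(item);
        return hash.ToHashCode();
    }
}
```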