15 votes

I have the following code:

var foo = (from data in pivotedData.AsEnumerable()
           select new
           {
               Group = data.Field<string>("Group_Number"),
               Study = data.Field<string>("Study_Name")
           }).Distinct();

As expected, this returns distinct values. However, I want a strongly typed collection rather than an anonymous type, and when I do this:

var foo = (from data in pivotedData.AsEnumerable()
           select new BarObject
           {
               Group = data.Field<string>("Group_Number"),
               Study = data.Field<string>("Study_Name")
           }).Distinct();

This does not return distinct values; it returns all of them. Is there a way to do this with instances of a named class such as BarObject?
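
A minimal, self-contained repro of this behavior, assuming an in-memory array of hypothetical sample rows in place of pivotedData and a hypothetical PlainBar class standing in for BarObject (no Equals/GetHashCode overrides). Anonymous types get compiler-generated Equals and GetHashCode based on their properties, which is why the first query deduplicates and the second does not:

```csharp
using System.Linq;

// Hypothetical stand-in for BarObject: a plain class with no Equals/GetHashCode overrides.
public class PlainBar
{
    public string Group { get; set; }
    public string Study { get; set; }
}

public static class DistinctRepro
{
    // Hypothetical sample data replacing the pivotedData rows: ("1", "A") appears twice.
    static readonly (string Group, string Study)[] Rows =
    {
        ("1", "A"), ("1", "A"), ("2", "B")
    };

    // Anonymous types compare by value, so the duplicate pair collapses.
    public static int AnonymousDistinctCount() =>
        Rows.Select(r => new { r.Group, r.Study }).Distinct().Count();

    // PlainBar falls back to reference equality, so every new instance counts as distinct.
    public static int PlainClassDistinctCount() =>
        Rows.Select(r => new PlainBar { Group = r.Group, Study = r.Study }).Distinct().Count();
}
```

AnonymousDistinctCount() returns 2 and PlainClassDistinctCount() returns 3, because each PlainBar instance is only equal to itself.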

Implement Equals() and GetHashCode() on your type. - dlev
@dlev What should GetHashCode do? - BrunoLM
@BrunoLM: See, for example, this answer: stackoverflow.com/questions/6305324/… GetHashCode should compute a hash code from the same fields that Equals compares; it is used by hash tables and dictionaries for fast lookup of objects. - Philip Daubmeier
@Bruno Distinct attempts to put each object into a hash table and returns only those that are not already present. That means GetHashCode must be implemented so that equal items have the same hash code. Otherwise Equals() will probably not be called, since the objects may land in different buckets. - dlev

7 Answers

12 votes

For Distinct() (and many other LINQ features) to work, the class being compared (BarObject in your example) must override Equals() and GetHashCode(); alternatively, you can pass a separate IEqualityComparer<T> as an argument to Distinct().

Many LINQ methods rely on GetHashCode() for performance because internally they use a hash-based set (such as the internal Set<T>) to hold the unique items, which gives O(1) average lookups. GetHashCode() also quickly tells you which objects may be equal and which are definitely not, provided it is implemented properly.

So any class you intend to compare in LINQ should override both Equals() and GetHashCode(), or you should create a separate IEqualityComparer<T> implementation.
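
A sketch of what this could look like for BarObject, assuming it has only the two string properties used in the question (EqualityDemo is a hypothetical helper). GetHashCode combines the same fields that Equals compares, using the common multiply-and-add pattern:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of BarObject with value equality over Group and Study
// (assumed to be the only fields that define identity).
public sealed class BarObject : IEquatable<BarObject>
{
    public string Group { get; set; }
    public string Study { get; set; }

    public bool Equals(BarObject other) =>
        other != null &&
        string.Equals(Group, other.Group) &&
        string.Equals(Study, other.Study);

    public override bool Equals(object obj) => Equals(obj as BarObject);

    // Uses exactly the fields Equals compares, so equal objects get equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + (Group?.GetHashCode() ?? 0);
            hash = hash * 23 + (Study?.GetHashCode() ?? 0);
            return hash;
        }
    }
}

public static class EqualityDemo
{
    // Hypothetical sample data with one duplicated (Group, Study) pair.
    public static List<BarObject> DistinctBars() =>
        new[]
        {
            new BarObject { Group = "1", Study = "A" },
            new BarObject { Group = "1", Study = "A" },
            new BarObject { Group = "2", Study = "B" },
        }.Distinct().ToList();
}
```

With these overrides, the question's original .Distinct() call works unchanged; DistinctBars() returns two objects. Since the properties are settable, avoid changing them while an object is stored in a hash-based collection, because its hash code would change.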

4 votes

Either do as dlev suggested or use:

var foo = (from data in pivotedData.AsEnumerable()
           select new BarObject
           {
               Group = data.Field<string>("Group_Number"),
               Study = data.Field<string>("Study_Name")
           }).GroupBy(x => new { x.Group, x.Study })
             .Select(g => g.First());

Check this out for more info http://blog.jordanterrell.com/post/LINQ-Distinct()-does-not-work-as-expected.aspx
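
The key passed to GroupBy decides what counts as a duplicate. A small sketch, using hypothetical in-memory data and a BarObject without equality overrides, contrasting a key on Group alone with a composite anonymous-type key over Group and Study (the latter matches what Distinct over both fields would return):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical BarObject with no equality overrides, as in the question.
public class BarObject
{
    public string Group { get; set; }
    public string Study { get; set; }
}

public static class GroupByDemo
{
    static readonly BarObject[] Items =
    {
        new BarObject { Group = "1", Study = "A" },
        new BarObject { Group = "1", Study = "A" },
        new BarObject { Group = "1", Study = "B" },
    };

    // Key on Group only: one object per Group, so ("1", "B") is dropped too.
    public static List<BarObject> ByGroupOnly() =>
        Items.GroupBy(x => x.Group).Select(g => g.First()).ToList();

    // Composite anonymous-type key: one object per distinct (Group, Study) pair.
    // Anonymous types compare by value, so equal pairs land in the same group.
    public static List<BarObject> ByGroupAndStudy() =>
        Items.GroupBy(x => new { x.Group, x.Study }).Select(g => g.First()).ToList();
}
```

ByGroupOnly() returns one object, because ("1", "B") shares its Group with the first item; ByGroupAndStudy() returns two. GroupBy yields groups in the order their keys first appear, so the first occurrence of each pair is kept.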

4 votes

You need to override Equals and GetHashCode for BarObject because EqualityComparer<BarObject>.Default, which is what Enumerable.Distinct<BarObject>(this IEnumerable<BarObject> source) uses, falls back to reference equality unless you have provided overrides of Equals and GetHashCode. Alternatively, you can pass an IEqualityComparer<BarObject> to the overload Enumerable.Distinct<BarObject>(this IEnumerable<BarObject> source, IEqualityComparer<BarObject> comparer).
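
To see what EqualityComparer<T>.Default does in each case, here is a sketch with two hypothetical classes: PlainBar without overrides, and ValueBar overriding Equals and GetHashCode on Group (DefaultComparerDemo is also hypothetical):

```csharp
using System.Collections.Generic;

// Hypothetical class without overrides: the default comparer uses reference equality.
public class PlainBar
{
    public string Group { get; set; }
}

// Hypothetical class overriding Equals/GetHashCode: the default comparer picks these up.
public class ValueBar
{
    public string Group { get; set; }

    public override bool Equals(object obj) => obj is ValueBar other && Group == other.Group;

    public override int GetHashCode() => Group?.GetHashCode() ?? 0;
}

public static class DefaultComparerDemo
{
    // Two separate instances with the same contents.
    public static bool PlainEqual() =>
        EqualityComparer<PlainBar>.Default.Equals(
            new PlainBar { Group = "1" }, new PlainBar { Group = "1" });

    public static bool ValueEqual() =>
        EqualityComparer<ValueBar>.Default.Equals(
            new ValueBar { Group = "1" }, new ValueBar { Group = "1" });
}
```

PlainEqual() returns false and ValueEqual() returns true, which is exactly the difference Distinct sees.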

3 votes

It looks like Distinct cannot compare your BarObject objects by value, so it compares their references, which are all different from each other even when the objects have the same contents.

So either override the Equals method, or supply a custom IEqualityComparer<BarObject> to Distinct. Remember to override GetHashCode whenever you override Equals; otherwise you will get strange results when your objects are used as keys in a dictionary or hash table, or as elements of a HashSet<BarObject>. Distinct uses a hash-based set internally, so the same applies to it.

Here is a collection of good practices for GetHashCode.
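
A sketch of the dictionary and HashSet<BarObject> case, assuming a hypothetical BarObject that overrides both Equals and GetHashCode over Group and Study (HashSetDemo is likewise hypothetical). A second Add with equal contents is rejected, and a lookup with a different but equal instance succeeds:

```csharp
using System.Collections.Generic;

// Hypothetical BarObject overriding both Equals and GetHashCode over Group and Study.
public sealed class BarObject
{
    public string Group { get; set; }
    public string Study { get; set; }

    public override bool Equals(object obj) =>
        obj is BarObject other && Group == other.Group && Study == other.Study;

    // Combines the same two fields that Equals compares.
    public override int GetHashCode()
    {
        unchecked { return ((Group?.GetHashCode() ?? 0) * 397) ^ (Study?.GetHashCode() ?? 0); }
    }
}

public static class HashSetDemo
{
    public static (bool firstAdd, bool secondAdd, bool found) Run()
    {
        var set = new HashSet<BarObject>();
        bool first = set.Add(new BarObject { Group = "1", Study = "A" });
        // Equal contents, different instance: treated as a duplicate.
        bool second = set.Add(new BarObject { Group = "1", Study = "A" });

        var names = new Dictionary<BarObject, string>
        {
            [new BarObject { Group = "1", Study = "A" }] = "first study"
        };
        // Lookup with yet another equal instance.
        bool found = names.ContainsKey(new BarObject { Group = "1", Study = "A" });
        return (first, second, found);
    }
}
```

Run() returns (true, false, true). If GetHashCode were not overridden, the equal instances would very likely hash differently and both the second Add and the lookup would behave as if the objects were unrelated.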

2 votes

You want to use the other overload of Distinct() that takes a comparer. You can then implement your own IEqualityComparer<BarObject>.
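
A minimal sketch of such a comparer, assuming BarObject has just the two string properties from the question; BarObjectComparer and ComparerDemo are hypothetical names, and ordinal string comparison is an assumption:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// BarObject is left unchanged (no overrides); equality lives in a separate comparer.
public class BarObject
{
    public string Group { get; set; }
    public string Study { get; set; }
}

public sealed class BarObjectComparer : IEqualityComparer<BarObject>
{
    public bool Equals(BarObject x, BarObject y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.Group, y.Group, StringComparison.Ordinal)
            && string.Equals(x.Study, y.Study, StringComparison.Ordinal);
    }

    // Hashes the same fields Equals compares; null strings hash like "".
    public int GetHashCode(BarObject obj)
    {
        if (obj is null) return 0;
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + StringComparer.Ordinal.GetHashCode(obj.Group ?? string.Empty);
            hash = hash * 23 + StringComparer.Ordinal.GetHashCode(obj.Study ?? string.Empty);
            return hash;
        }
    }
}

public static class ComparerDemo
{
    public static int DistinctCount() =>
        new[]
        {
            new BarObject { Group = "1", Study = "A" },
            new BarObject { Group = "1", Study = "A" },
            new BarObject { Group = "2", Study = "B" },
        }.Distinct(new BarObjectComparer()).Count();
}
```

In the question's query this would be .Distinct(new BarObjectComparer()) in place of .Distinct(). Keeping equality in a comparer leaves BarObject itself unchanged, which helps when you cannot or do not want to modify the class.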

1 votes

Try this:

var foo = (from data in pivotedData.AsEnumerable().Distinct()
           select new BarObject
           {
               Group = data.Field<string>("Group_Number"),
               Study = data.Field<string>("Study_Name")
           });
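
A caveat worth checking with this approach: DataRow does not override Equals or GetHashCode, so Distinct() on the rows compares row references, and every row in a table is a separate object. The sketch below uses a hypothetical table in place of pivotedData and contrasts that with DataRowComparer.Default from System.Data, which compares rows by the values of all their columns (so rows that differ only in some other column would still count as distinct):

```csharp
using System.Data;
using System.Linq;

public static class DataRowDistinctDemo
{
    // Hypothetical table standing in for pivotedData: two rows with identical values.
    static DataTable BuildTable()
    {
        var table = new DataTable();
        table.Columns.Add("Group_Number", typeof(string));
        table.Columns.Add("Study_Name", typeof(string));
        table.Rows.Add("1", "A");
        table.Rows.Add("1", "A");
        table.Rows.Add("2", "B");
        return table;
    }

    // Plain Distinct(): reference equality on DataRow, so nothing is removed.
    public static int DefaultDistinctCount() =>
        BuildTable().AsEnumerable().Distinct().Count();

    // DataRowComparer.Default: compares rows column by column.
    public static int ValueDistinctCount() =>
        BuildTable().AsEnumerable().Distinct(DataRowComparer.Default).Count();
}
```

For this table, DefaultDistinctCount() returns 3 and ValueDistinctCount() returns 2.
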

-1 votes

Should be as simple as:

var foo = (from data in pivotedData.AsEnumerable()
           select new
           {
               Group = data.Field<string>("Group_Number"),
               Study = data.Field<string>("Study_Name")
           }).Distinct().Select(x => new BarObject
           {
               Group = x.Group,
               Study = x.Study
           });
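
A self-contained version of this approach, assuming a hypothetical DataTable with the same column names as the question; ProjectAfterDistinctDemo and SampleTable are illustrative names. Distinct runs on the anonymous type, which compares by value, and only then are the results projected into BarObject:

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

// BarObject as in the question, with no equality overrides.
public class BarObject
{
    public string Group { get; set; }
    public string Study { get; set; }
}

public static class ProjectAfterDistinctDemo
{
    public static List<BarObject> Run(DataTable pivotedData) =>
        (from data in pivotedData.AsEnumerable()
         select new
         {
             Group = data.Field<string>("Group_Number"),
             Study = data.Field<string>("Study_Name")
         })
        .Distinct() // value-based: anonymous types compare by their properties
        .Select(x => new BarObject { Group = x.Group, Study = x.Study })
        .ToList();

    // Hypothetical sample table with a duplicated (Group, Study) pair.
    public static DataTable SampleTable()
    {
        var table = new DataTable();
        table.Columns.Add("Group_Number", typeof(string));
        table.Columns.Add("Study_Name", typeof(string));
        table.Rows.Add("1", "A");
        table.Rows.Add("1", "A");
        table.Rows.Add("2", "B");
        return table;
    }
}
```

Run(SampleTable()) returns two BarObject instances. The trade-off is that BarObject itself still has reference equality, so calling Distinct on the resulting list later would not deduplicate it again.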