15
votes

I simply want to remove duplicates from two lists and combine them into one list. I also need to be able to define what a duplicate is. I define a duplicate by the ColumnIndex property, if they are the same, they are duplicates. Here is the approach I took:

I found a nifty example of how to write inline comparers for the random occassions where you need em only once in a code segment.

public class InlineComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> getEquals;
    private readonly Func<T, int> getHashCode;

    public InlineComparer(Func<T, T, bool> equals, Func<T, int> hashCode)
    {
        getEquals = equals;
        getHashCode = hashCode;
    }

    public bool Equals(T x, T y)
    {
        return getEquals(x, y);
    }

    public int GetHashCode(T obj)
    {
        return getHashCode(obj);
    }
}

Then I just have my two lists, and attempt a union on them with the comparer.

            var formatIssues = issues.Where(i => i.IsFormatError == true);
            var groupIssues = issues.Where(i => i.IsGroupError == true);

            var dupComparer = new InlineComparer<Issue>((i1, i2) => i1.ColumnInfo.ColumnIndex == i2.ColumnInfo.ColumnIndex, 
            i => i.ColumnInfo.ColumnIndex);

            var filteredIssues = groupIssues.Union(formatIssues, dupComparer);

The result set however is null.

Where am I going astray? I have already confirmed that the two lists have columns with equal ColumnIndex properties.

3
Just to get some background on this problem, have you tried debugging the code and are certain that the public bool Equals(T x, T y) method is being called and not the public int GetHashCode(T obj) method?avanek
The result is really null, not an empty sequence? That would be really odd, since Enumerable.Union() should never return null.svick

3 Answers

17
votes

I've just run your code on a test set.... and it works!

    public class InlineComparer<T> : IEqualityComparer<T>
    {
        private readonly Func<T, T, bool> getEquals;
        private readonly Func<T, int> getHashCode;

        public InlineComparer(Func<T, T, bool> equals, Func<T, int> hashCode)
        {
            getEquals = equals;
            getHashCode = hashCode;
        }

        public bool Equals(T x, T y)
        {
            return getEquals(x, y);
        }

        public int GetHashCode(T obj)
        {
            return getHashCode(obj);
        }
    }

    class TestClass
    {
        public string S { get; set; }
    }

    [TestMethod]
    public void testThis()
    {
        var l1 = new List<TestClass>()
                     {
                         new TestClass() {S = "one"},
                         new TestClass() {S = "two"},
                     };
        var l2 = new List<TestClass>()
                     {
                         new TestClass() {S = "three"},
                         new TestClass() {S = "two"},
                     };

        var dupComparer = new InlineComparer<TestClass>((i1, i2) => i1.S == i2.S, i => i.S.GetHashCode());

        var unionList = l1.Union(l2, dupComparer);

        Assert.AreEqual(3, unionList);
    }

So... maybe go back and check your test data - or run it with some other test data?

After all - for a Union to be empty - that suggests that both your input lists are also empty?

7
votes

A slightly simpler way:

  • it does preserve the original order
  • it ignores dupes as it finds them

Uses a link extension method:

   formatIssues.Union(groupIssues).DistinctBy(x => x.ColumnIndex)

This is the DistinctBy lambda method from MoreLinq

public static IEnumerable<TSource> DistinctBy<TSource, TKey>
     (this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    HashSet<TKey> knownKeys = new HashSet<TKey>();
    foreach (TSource element in source)
    {
        if (knownKeys.Add(keySelector(element)))
        {
            yield return element;
        }
    }
}
3
votes

Would the Linq Except method not do it for you?

var formatIssues = issues.Where(i => i.IsFormatError == true);
var groupIssues = issues.Where(i => i.IsGroupError == true);

var dupeIssues = issues.Where(i => issues.Except(new List<Issue> {i})
                                        .Any(x => x.ColumnIndex == i.ColumnIndex));

var filteredIssues = formatIssues.Union(groupIssues).Except(dupeIssues);