Distinct method suggestion

  • General discussion

  • The Distinct extension method has two overloads:

    1st - using EqualityComparer<T>.Default

    2nd - using a comparer that implements IEqualityComparer<T>

     

    The 2nd overload does not really make sense, because you can define a comparison as a lambda expression. When using LINQ, it shouldn't be necessary to create a new class that implements an interface just to support a special comparison. Instead, you can write a lambda expression.

     

    My suggestion for a 3rd overload of Distinct:

     

    Here's the lambda:

    Func<T,T,bool>

     

    The function takes two objects of the same type and returns true or false.

    You can define equality in terms of one or more attributes of a type.

     

    For instance:

    var query = from a in adress select a;

    query.Distinct((x, y) => x.email == y.email);

     

    or

    var query = from a in adress select a;

    query.Distinct((x, y) => x.GetHashCode()== y.GetHashCode());

     

    or even

    var query = from a in adress select a;

    query.Distinct((x, y) => x.email == y.email && x.name == y.name);

     

     

    To be compatible with the current implementation of Distinct, I've written a class "FuncComparer" which can handle the lambda. I pass the lambda to the constructor. Every time the Equals method is called, I call the lambda which is encapsulated within my class. That's it.

     

    //comparer which can handle the lambda
    public class FuncComparer<T> : System.Collections.Generic.IEqualityComparer<T>
    {
        private Func<T, T, Boolean> _comparer;

        //passing the lambda to the constructor
        public FuncComparer(Func<T, T, Boolean> comparer)
        {
            _comparer = comparer;
        }

        public bool Equals(T x, T y)
        {
            //calling the lambda
            return this._comparer(x, y);
        }

        public int GetHashCode(T obj)
        {
            //the hash code still comes from the object itself, not from the lambda
            return obj.GetHashCode();
        }
    }

     

     

    And here is the new Distinct method which uses the FuncComparer.

    I pass the FuncComparer with the lambda to the currently implemented Distinct method which takes an IEqualityComparer<T>.

     

    public static IEnumerable<T> Distinct<T>(this IEnumerable<T> source, Func<T, T, Boolean> comparer)
    {
        //passing the FuncComparer with the lambda to the 2nd overload of the Distinct method
        return source.Distinct<T>(new FuncComparer<T>(comparer));
    }

     

    Feedback is very welcome

     

    Benjamin Gopp

    Cologne Germany

     

     

     

     

    Thursday, May 3, 2007 4:35 PM

All replies

  • The problem you're going to run into is that when two objects compare equal they must have the same GetHashCode return value (or else the hash table used internally by Distinct will not function correctly). We use IEqualityComparer because it packages compatible implementations of Equals and GetHashCode into a single interface.
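To make the hash code requirement concrete, here is a minimal sketch that is not part of the original posts. `Contact`, `LambdaComparer<T>`, and `HashContractDemo` are hypothetical names, and the sample data is invented. It assumes, as described above, that Distinct consults the hash code before it ever calls Equals.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type, for illustration only.
public sealed class Contact
{
    public string Email { get; set; }
    public string Name { get; set; }
}

// Hypothetical lambda-backed comparer that takes BOTH halves of the contract.
public sealed class LambdaComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _equals;
    private readonly Func<T, int> _hash;

    public LambdaComparer(Func<T, T, bool> equals, Func<T, int> hash)
    {
        _equals = equals;
        _hash = hash;
    }

    public bool Equals(T x, T y) { return _equals(x, y); }
    public int GetHashCode(T obj) { return _hash(obj); }
}

public static class HashContractDemo
{
    // Invented sample: two separate objects share an e-mail address.
    public static List<Contact> Sample()
    {
        return new List<Contact>
        {
            new Contact { Email = "a@example.com", Name = "Ann" },
            new Contact { Email = "a@example.com", Name = "Ann" },
            new Contact { Email = "b@example.com", Name = "Bob" }
        };
    }

    // Equality by Email, but hash taken from the object itself.
    // "Equal" contacts almost always hash differently, so the duplicate
    // is usually not detected.
    public static int CountWithMismatchedHash()
    {
        var comparer = new LambdaComparer<Contact>(
            (x, y) => x.Email == y.Email,
            c => c.GetHashCode());
        return Sample().Distinct(comparer).Count();
    }

    // Equality and hash both derived from Email: the contract holds.
    public static int CountWithMatchingHash()
    {
        var comparer = new LambdaComparer<Contact>(
            (x, y) => x.Email == y.Email,
            c => c.Email.GetHashCode());
        return Sample().Distinct(comparer).Count();
    }
}
```

With the matching hash the duplicate e-mail address is removed and two contacts remain. With the mismatched hash the result depends on the identity-based hash codes of the two "Ann" objects, so all three contacts usually survive.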

     

    Anders

     

    Thursday, May 3, 2007 10:09 PM
  • Ok, I got your point.

     

    But couldn’t you create a new method "Unique" which can take a lambda for comparison instead of GetHashCode?!

    There are certain scenarios where only one field or attribute has to be unique and not the whole row or object.

     

    Benjamin

    Saturday, May 5, 2007 8:01 AM
  • We did at one point debate having a Distinct method that takes a key selector lambda (similar to GroupBy), but we decided against it. Primarily we found it odd to have an operator that has to more or less arbitrarily pick one object when multiple objects have the same key.

     

    Actually, you can easily do what you want just using the GroupBy method:

     

    Code Snippet

    var query = contacts.GroupBy(c => c.Email).Select(g => g.First());

     

    or, using a query expression:

     

    Code Snippet

    var query =

        from c in contacts

        group c by c.Email into g

        select g.First();

     

    Note how this uses the First method to pick the first object in each group. Similar to GroupBy, you can use an anonymous type as the key if you want to check equality on multiple properties:

     

    Code Snippet

    var query =

        contacts.

        GroupBy(c => new { c.Email, c.Name }).

        Select(g => g.First());
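For comparison, a key-selector operator of the kind that was debated could be sketched roughly as follows. This is an illustration only, not something in the LINQ library discussed in this thread: the name `DistinctByKey` and the class `KeyDistinctSketch` are hypothetical, and the sketch simply assumes "keep the first element seen for each key".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyDistinctSketch
{
    // Hypothetical operator: yields the first element seen for each key.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (keySelector == null) throw new ArgumentNullException("keySelector");
        return DistinctByKeyIterator(source, keySelector);
    }

    private static IEnumerable<TSource> DistinctByKeyIterator<TSource, TKey>(
        IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        // The set hashes and compares KEYS with their default comparer,
        // so Equals and GetHashCode stay consistent automatically.
        var seen = new HashSet<TKey>();
        foreach (TSource item in source)
        {
            // Add returns false when the key is already in the set.
            if (seen.Add(keySelector(item)))
                yield return item;
        }
    }
}
```

A call such as contacts.DistinctByKey(c => c.Email) would then behave like the GroupBy/First query above, and an anonymous type such as new { c.Email, c.Name } should work as a key because anonymous types compare by value.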

     

    Anders

    Monday, May 7, 2007 2:11 PM
  • I was planning to ask the same question before I stumbled on this thread. I can see your reasons for not implementing the lambda overload, but unfortunately the GroupBy workaround doesn't work in all scenarios. I ran into this when trying to solve the problem given in a recent xkcd comic: given a menu of courses and prices, determine all combinations costing exactly $15.05. I settled on the following:

    Code Snippet

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Program
    {
        // Menu item name -> price in cents.
        static Dictionary<string, int> menu = new Dictionary<string, int>
        {
            { "Mixed Fruit", 215 }, { "French Fries", 275 }, { "Side Salad", 335 },
            { "Hot Wings", 355 }, { "Mozzarella Sticks", 420 }, { "Sampler Plate", 580 }
        };

        static void Main(string[] args)
        {
            // Print every combination that totals exactly $15.05 (1505 cents).
            FindCombos(new List<string>(), 1505).ForEach(i => Console.WriteLine(i.Aggregate((a, b) => a + ", " + b)));
            Console.ReadKey();
        }

        // Tries every item that still fits; a combination is complete when nothing remains.
        static List<List<string>> FindCombos(List<string> chosen, int remaining)
        {
            if (remaining == 0) return new List<List<string>> { chosen.OrderBy(s => s).ToList() };
            return menu.Where(m => m.Value <= remaining).SelectMany(m => FindCombos(chosen.Append(m.Key), remaining - m.Value)).ToList();
        }
    }

    public static class Extensions
    {
        // Returns a new list with the item added; the source list is left unchanged.
        public static List<T> Append<T>(this List<T> source, T item) { return source.Concat(new List<T> { item }).ToList(); }
    }



    This works quite well (thanks again for adding LINQ and lambda functions to C#), but returns a number of duplicates. These could easily be removed by adding a Distinct() after the SelectMany, were it not for the fact that the hash code of a List<T> is based on object identity rather than on its contents. For the same reason GroupBy cannot be used. Of course it would be possible to write a new IEqualityComparer that defines Equals as a.SequenceEqual(b), but spending another eight or so lines of code on something so trivial clashes with the elegance that LINQ makes possible.
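For reference, the comparer alluded to above might look roughly like the following. This is only a sketch, not code from the original post: `SequenceComparer<T>` is a hypothetical name, and the multiply-by-31 hash combination is just one common choice.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two lists are equal when they hold equal
// elements in the same order.
public sealed class SequenceComparer<T> : IEqualityComparer<List<T>>
{
    public bool Equals(List<T> x, List<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    // The hash is built from the elements, so lists that compare equal
    // always produce the same hash code.
    public int GetHashCode(List<T> list)
    {
        if (list == null) return 0;
        unchecked
        {
            int hash = 17;
            foreach (T item in list)
                hash = hash * 31 + (item == null ? 0 : item.GetHashCode());
            return hash;
        }
    }
}
```

Because FindCombos already sorts each finished combination, appending .Distinct(new SequenceComparer<string>()) to the result should be enough to collapse the duplicates.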

    I would therefore like to support adding a new extension method to IEnumerable<T> (since overloading Distinct was rejected), perhaps called Unique<T>, which is called with a lambda function as indicated in the first post. As for arbitrarily choosing objects: by specifying that, when two objects compare equal, the first one is kept, the behaviour is perfectly defined. By using this function over other methods the developer indicates that he or she doesn't care which object is returned, as long as there is only one object in the resulting collection that satisfies the given function. In my example, it doesn't matter if I get the first or the second list of {"Hot Wings", "Side Salad"}, as they are functionally identical.
    Wednesday, July 11, 2007 1:33 PM