For the sake of performance, which one do you prefer?

  • Question

  • Hello everybody,

    I have a few questions:

    1) Which line of code will have better performance? I want to retrieve the newest bill of a subscriber; ID is the primary key (bigint) and RegistrationDate is datetime NOT NULL. Ignoring the fact that selection on indexed fields is faster, which one do you prefer: Max or OrderBy?

        a) var lastBill = subscriber.Bills.OrderByDescending(b => b.RegistrationDate).First();

        b) var lastBill = subscriber.Bills.Single(b => b.ID == subscriber.Bills.Max(s => s.ID));

    2) I want to select a list of subscribers by their IDs. Sometimes the list contains 3000 or more IDs, and my LINQ lambda code is something like:

      var subscriberList = context.subscribers.Where(s => myList.Contains(s.ID));

     As you know, this generates a SQL query using IN, but it is really time-consuming. Is there another workaround that still selects all the rows with a single query but performs better?

    There is also another problem with using Contains when the list holds a lot of IDs: SQL Server's limit on the number of RPC parameters (2100), which forces me to split my list, run multiple queries, and then concatenate the results into a single output list.


    Thanks in advance

    learn to learn
    Wednesday, April 7, 2010 8:25 AM


  • 1) (b) is going to be more efficient, since it only has to touch the [clustered?] PK index. Probably just a single page hit. If RegistrationDate has a descending index there won't be a huge difference, since in the end you're just getting a single record, but there will be a difference in I/O because the query has to hit both the index on RegistrationDate and then do a PK lookup on the table itself. If RegistrationDate is indexed in ascending order, the I/O cost will be a little higher. If RegistrationDate is not indexed at all, the difference will be enormous.
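    If the bigint PK is monotonically increasing (so a larger ID always means a newer bill, as option (b) already assumes), the same single-seek behaviour can be had without the nested Max subquery; a sketch:

    ```csharp
    // Sketch only: assumes 'subscriber.Bills' is the mapped EntitySet from the
    // thread and that IDs increase monotonically with registration time.
    var lastBill = subscriber.Bills
        .OrderByDescending(b => b.ID)   // walks the PK index backwards
        .FirstOrDefault();              // null instead of an exception if there are no bills
    ```
    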

    2) For large list lookups, it can be more efficient to insert the list of IDs into another table and then join the two tables together. E.g. a 'permanent temp table' along the lines of "create table subscriber_lookup (lookup_id uniqueidentifier not null, subscriber_id bigint not null, constraint pk_subscriber_lookup primary key (lookup_id, subscriber_id));" where you insert your chunk of IDs and then do the lookup with a join. Of course, this adds some write overhead, so profile it to ensure it doesn't cost more than it saves...
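    A sketch of that lookup-table approach in Linq-to-SQL; the SubscriberLookup entity, its property names, and the Subscribers table name are assumptions on top of the subscriber_lookup DDL above:

    ```csharp
    // Sketch, not a drop-in: assumes a SubscriberLookup entity mapped to the
    // subscriber_lookup table, and that 'context' is a Linq-to-SQL DataContext.
    var batchId = Guid.NewGuid();

    // 1) Stage the IDs. For many thousands of rows, SqlBulkCopy would be faster
    //    than per-row inserts, but this shows the shape of the idea.
    context.SubscriberLookups.InsertAllOnSubmit(
        myList.Select(id => new SubscriberLookup { LookupId = batchId, SubscriberId = id }));
    context.SubmitChanges();

    // 2) One joined query instead of a giant IN list.
    var subscriberList =
        (from s in context.Subscribers
         join l in context.SubscriberLookups on s.ID equals l.SubscriberId
         where l.LookupId == batchId
         select s).ToList();

    // 3) Remove this batch's staging rows when done.
    context.SubscriberLookups.DeleteAllOnSubmit(
        context.SubscriberLookups.Where(l => l.LookupId == batchId));
    context.SubmitChanges();
    ```
    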

    Or - as you say - split the list up into small chunks of maybe 100 items at a time or so. Large "in" clauses have a tendency to result in bad execution plans. Chunking also lets you start your client-side processing while the DB lookups are still going on. To make that easy, you could expose the chunks through an IEnumerable<IEnumerable<T>>, returning one chunk at a time...
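    The chunked IEnumerable<IEnumerable<T>> idea can be sketched like this; the extension-method name and the chunk size of 100 are arbitrary:

    ```csharp
    using System.Collections.Generic;
    using System.Linq;

    static class ChunkExtensions
    {
        // Yields the source in batches of 'size'; the last batch may be smaller.
        public static IEnumerable<IEnumerable<T>> InChunksOf<T>(this IEnumerable<T> source, int size)
        {
            var batch = new List<T>(size);
            foreach (var item in source)
            {
                batch.Add(item);
                if (batch.Count == size)
                {
                    yield return batch;
                    batch = new List<T>(size);  // start a fresh list; the old one was handed out
                }
            }
            if (batch.Count > 0)
                yield return batch;
        }
    }

    // Usage sketch: one small IN query per chunk, concatenated client-side.
    // var subscribers = myList.InChunksOf(100)
    //     .SelectMany(chunk => context.subscribers
    //         .Where(s => chunk.Contains(s.ID))   // small IN list per round-trip
    //         .ToList())
    //     .ToList();
    ```
    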

    That said: profiling is always the best way to confirm which choice is best for a given situation.

    Kristofer - Huagati Systems Co., Ltd.
    Cool tools for Linq-to-SQL and Entity Framework: (add-in with new features for the L2S and EF designers in VS2008 and VS2010) (Query profiler for Linq-to-SQL and LLBLGen Pro)
    • Marked as answer by FarzanCool Wednesday, April 7, 2010 9:07 AM
    Wednesday, April 7, 2010 8:48 AM