Having trouble testing Generic List for Null

  • Question

  • We are getting a "not supported" exception when we try the following query on "List<int> ids":

    from data in meta.customer
    where ids == null || ids.Contains(data.id)
    select data

    The reason this is critical is that we have found that LINQ to SQL performance when not compiling is 30x to 50x slower than the same LINQ compiled.  Bottom line: we have to compile, so the query cannot be generated dynamically.  I did a post on this here:  http://peterkellner.net/2009/05/06/linq-to-sql-slow-performance-compilequery-critical/
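
    For reference, the compiled-query pattern looks roughly like this (a minimal sketch; the context and entity names here are illustrative, not our real schema):

    // requires System.Data.Linq (CompiledQuery) and System.Linq
    static readonly Func<DataClassesDataContext, int, IQueryable<customer>> getCustomerById =
        CompiledQuery.Compile((DataClassesDataContext meta, int id) =>
            from data in meta.customer
            where data.id == id
            select data);

    // compiled once, then reused on every call:
    var result = getCustomerById(myDataContext, 42).ToList();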


    Peter Kellner, MVP ASP.NET
    Thursday, May 7, 2009 11:06 PM

All replies

  • One problem with that would be that a non-compiled query would evaluate the "ids == null || ids.Contains(data.id)" portion at runtime: L2S would generate a query without a where clause if the list ids is null, and a where clause using the IN operator ("IN (@p1, @p2, @p3, ...)", with the parameter count depending on the number of elements in the list) if not null.

    Since it is not known at compile time whether the list will be null, or how many elements it will contain, generating a compiled query from a statement like the one above would be difficult. Either an inefficient 'cover everything' query or multiple compiled queries (one for each possible combination) would have to be generated, and a matching one selected when the query is executed.

    It would be interesting to take a closer look at where the time is spent in the performance test you mentioned in your blog article. For a large number of iterations/query executions I would not expect such a large difference between compiled and non-compiled queries, since the non-compiled execution plan should be cached by SQL Server anyway...
     
    Kristofer - Huagati Systems Co., Ltd. - www.huagati.com - Cool tools for Linq-to-SQL and Entity Framework: www.huagati.com/dbmltools
    Friday, May 8, 2009 11:49 AM
    Answerer
  • I understand what you are saying; however, the larger problem I am trying to solve has other parameters in the where clause.

    That is:

    where (ids == null || ids.Contains(data.id))    &&    (firstName == null || data.firstName.Equals(firstName)) && (1 == 1)

    This way, at runtime the parameters (ids, firstName) can be passed in, and if either is null, it is ignored.

    The where clause never goes away in this case.
    Peter Kellner, MVP ASP.NET
    Friday, May 8, 2009 1:16 PM
  • No, but the where clause for a compiled query using contains and null exclusions is more complex and could have a negative performance impact.

    Consider the following:

    L2S where clause:

    where (ids == null || ids.Contains(data.id))
      && (firstName == null || data.firstName == firstName)
      && (lastName == null || data.lastName == lastName)
    

    If pre-compiled as a prepared statement, it would have to cover all possible combinations of parameters. This includes the possibility of ids being null, or ids containing a large number of IDs. Where to draw the limit on the number of possible runtime items in the list? 10, 20, 50, 100, 1000, ...?

    where ((@P0 is null or data.id in (@P0, @P1, @P2, @P3, @P4, @Pn...m))
       and (@Px is null or data.firstName = @Px)
       and (@Py is null or data.lastName = @Py))
    

    A where clause like the one above means the SQL optimizer has to make a wild guess about the parameter values when determining an execution plan. If data.id is the primary key on the table, the execution plan may very well be to do an index seek or scan on data.id. That is fine if the ids collection is passed in, but when it is not, that particular exec plan could be inefficient (e.g. skipping indexes on other indexed columns, etc.). "Cover-all" where clauses in T-SQL are usually not something to aim for if performance is a concern...


    Now, the same where clause if not precompiling / preparing statements is evaluated at runtime and only the relevant discriminators are included in the generated T-SQL:

    1:
    List<int> ids = null;
    string firstName = "John";
    string lastName = null;
    

    ->

    where data.firstName = @P0
    ...@P0 = N'John'
    

    2:
    List<int> ids = null;
    string firstName = null;
    string lastName = "Doe";
    

    ->

    where data.lastName = @P0
    ...@P0 = N'Doe'
    

    3:
    List<int> ids = new List<int>();
    ids.Add(1);
    ids.Add(2);
    ids.Add(3);
    string firstName = null;
    string lastName = null;
    

    ->

    where data.id in (@P0, @P1, @P2)
    ...@P0 = 1, @P1 = 2, @P2 = 3
    

    4:
    List<int> ids = new List<int>();
    for (int id = 0; id <= 100; id++)
    {
      ids.Add(id);
    }
    string firstName = null;
    string lastName = null;
    

    ->

    where data.id in (@P0, @P1, @P2, @P3, @P4 ... @P100)
     ... @P0 = 0, @P1 = 1, @P2 = 2, @P3 = 3, @P4 = 4, ... @P100 = 100

    In other words, compiled queries / prepared statements are really good if your parameters don't vary too much, so that a "one-fits-all" execution plan db-side is a good thing.

    If the [combination of] discriminators used varies a lot, then even if your query can be compiled/prepared it can have a negative impact on SQL performance. In those cases you may want to 'chunk it up' and either use multiple compiled queries or not use compiled queries at all.

    After all, L2S uses parameterized queries that it executes using sp_executesql, which means they will all be cached in the SQL Server stored procedure execution plan cache anyway and will be reused if the same query is executed multiple times. If the same query is executed 2000 times in a row, compile times shouldn't be a concern. Instead, I would use profiling and SQL Server's I/O+time statistics to see where the big difference is in the test mentioned in your blog. I think there is something more than SQL query compile times involved...
    Kristofer - Huagati Systems Co., Ltd. - www.huagati.com - Cool tools for Linq-to-SQL and Entity Framework: www.huagati.com/dbmltools
    Saturday, May 9, 2009 3:18 AM
    Answerer
  • Thanks KristoferA for the thoughts.

    You are right that I need to pass general parameters into a query.  I would have 100% agreed with you up until a week ago, when I started profiling compiled versus non-compiled queries.  My conclusion (based on performance hits of 35-50x for not compiling queries) is that in a busy production app, 90% of my queries have to be compiled.  I documented it on my blog here:  http://peterkellner.net/2009/05/06/linq-to-sql-slow-performance-compilequery-critical/

    That being said, my ORM layer looks like the following (roughly)

    List<MyData> list = MyORM.Get(firstName,lastName,approved,minAge,...)

    Because I don't know ahead of time which parameters will be null, my data layer needs to be smart about figuring that out.  Without compiling, LINQ to SQL worked great: I can simply check each parameter for null, then add it to the query if it isn't (e.g. if (query.MyAge != null) baseQuery = baseQuery.Where(a => a.MyAge == query.MyAge);).
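
    Spelled out, the composition looks roughly like this (a sketch; MyData and the property names are placeholders, not my real schema):

    // baseQuery starts as the whole table and is narrowed per non-null parameter
    IQueryable<MyData> baseQuery = meta.MyDatas;
    if (query.FirstName != null)
        baseQuery = baseQuery.Where(a => a.FirstName == query.FirstName);
    if (query.MyAge != null)
        baseQuery = baseQuery.Where(a => a.MyAge == query.MyAge);
    List<MyData> list = baseQuery.ToList();   // SQL is generated here, with only the non-null filters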

    Doing this creates query compile times of 50 to 100 milliseconds, which basically means that my server cannot process very many requests, and I can't afford to have hundreds of servers for my project to give reasonable performance.

    I codegen my ORM layer and sadly have done what you show above.  I need to figure out a way to take advantage of compiled LINQ to SQL while still maintaining some flexibility.  Our app is too complex to have a separate compiled LINQ statement for every combination of where clause statements.  I really hope it does not come down to that.

    I've been thinking that maybe somehow I can create a LINQ to SQL query based on the non-null parameters, then compile and save the compiled query because likely I will be calling it again.

    I'd appreciate any other thoughts you have.  I've reached out to several people on the teams at Microsoft trying to figure a way out of this, and none have gotten back to me.  I'm feeling like this horrible "compile" time for LINQ to SQL expressions is the dirty little secret no one talks about.




    Peter Kellner, MVP ASP.NET
    Saturday, May 9, 2009 10:21 PM
  • Thanks, this is very interesting.

    I downloaded and tried your test app both with and without data in the table. One interesting thing is that the additional time in the non-compiled query test is spent client-side; the time spent db-side is pretty much identical in all cases. This is good, since preserving resources db-side is often more important than preserving resources client-side or in middle tiers, where it is easier to scale out.

    However, it is also worth noting that with data in the table (250k records), and all three queries changed to use the "optional parameter" construct mentioned above (where searchVal == null || myData.FileName == searchVal), the outcome is very different, thanks to the L2S non-compiled query being optimized to generate efficient SQL client-side before it is sent to the db.

    The new timings after that change* on my notebook were:
    "sqlonly": 318773ms
    L2SNotCompiled: 48875ms
    L2SCompiled: 324904ms

    This shows how going for 100% 'static' compiled queries for complex queries can have a negative performance impact db-side. To overcome this, getting to a point where we use compiled queries _and_ take advantage of L2S's on-the-fly SQL optimization at the same time would be ideal.

    I'm thinking the best of non-compiled and compiled queries can be combined by writing an extension method on IQueryable that creates a compiled query, takes into account any parameters that can be eliminated, and maintains a cache of compiled queries. In other words: the input Linq query will be the same, but this extension could generate new compiled queries as necessary (or take them from its own internal cache of compiled queries) depending on the parameters, so each compiled query is optimized the way it would be when using L2S non-compiled queries...
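
    Roughly along these lines (just a sketch of the idea, untested; the string shape key and the fixed two-parameter signature are simplifying assumptions):

    using System;
    using System.Collections.Generic;
    using System.Data.Linq;
    using System.Linq;
    using System.Linq.Expressions;

    static class CompiledQueryCache
    {
        static readonly Dictionary<string, Delegate> cache = new Dictionary<string, Delegate>();

        // One compiled query per distinct "shape" (i.e. which optional
        // discriminators are non-null), compiled lazily and then reused.
        public static Func<DataClassesDataContext, string, IQueryable<LinqTest>> GetOrCompile(
            string shapeKey,
            Expression<Func<DataClassesDataContext, string, IQueryable<LinqTest>>> query)
        {
            lock (cache)
            {
                Delegate compiled;
                if (!cache.TryGetValue(shapeKey, out compiled))
                {
                    compiled = CompiledQuery.Compile(query);
                    cache[shapeKey] = compiled;
                }
                return (Func<DataClassesDataContext, string, IQueryable<LinqTest>>)compiled;
            }
        }
    }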

    ---

    * = FYI - I also made some tiny changes to your test app when I ran my tests on it. None of these changes affected the result in any noticeable way, but I just wanted to mention the changes I made:
    a) Moved datacontext initialization for the two L2S tests inside the loop to avoid any "smart" stuff that might be happening inside the datacontext
    b) Changed to use System.Environment.TickCount for measuring duration instead of System.DateTime.Now.
    c) Changed the datareader test to get and return the connection to the pool for every iteration (to be consistent with the other two tests - since the dc was initialized without a connection, the connection will be taken from and returned to the pool for every execution)
    d) Added a "dbcc freeproccache" between each test to avoid having results for one test affected by cached execution plans from a previous test
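
    For (b), the measurement pattern is simply the following (RunTest is a placeholder for one of the three test methods):

    // Environment.TickCount is a monotonically increasing millisecond counter
    // (based on system uptime), so unlike DateTime.Now it is not affected by
    // system clock adjustments between the two reads.
    int start = Environment.TickCount;
    RunTest();
    int elapsedMs = Environment.TickCount - start;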
    Kristofer - Huagati Systems Co., Ltd. - www.huagati.com - Cool tools for Linq-to-SQL and Entity Framework: www.huagati.com/dbmltools
    Sunday, May 10, 2009 3:05 PM
    Answerer
  • Hi KristoferA,

    If this were not so critical to my business right now, I'd be totally psyched to be figuring this all out.  I just confirmed your results (skipping the small changes you made below; upon the suggestion of Jason Mauer, I had already changed my code to use the Stopwatch class in the System.Diagnostics namespace).

    I'm trying to extend my example to write an extension method for the extra parameter.  I know it's important that the extension method get compiled also.  If you've done that as a test, please show me some code; otherwise, if I figure it out, I'll post it here along with the updated tests.  I've added half a dozen tests to my code, so I'll post the batch later (and I'm sure blog the results, along with a big thank you for putting the work into helping me here).

    More later...
    Peter Kellner, MVP ASP.NET
    Sunday, May 10, 2009 5:04 PM
  • Those benchmarks are very skewed in that they are only testing the parsing.  If you add some data to the database, change the SQL one to actually pull all the fields back into a new class and put them into a list, and move the start of the performance test outside the connection creation, the figures are much closer to what you will experience in an app and closer to each other. Interestingly, if you then also perform just one loop, the LINQ to SQL uncompiled is the fastest on my system.

    [)amien
    Sunday, May 10, 2009 5:13 PM
    Moderator
  • Hi Damien,

    Thanks for your ideas here.  I've changed the code as you suggest and here is my result (not much changed):

    SqlOnly: 0.108
    L2SNotCompiled: 3.783
    L2SCompiled: 0.228

    It is still 35x slower, and very significant to me because I'm repeating the same query often.  I'm pasting my code below in case I missed something you suggested.  As to your comment about performing one loop, I totally understand that.  If all I wanted to do was one LINQ to SQL result, compiling would not help, since that has to be done anyway.  (BTW, I'm now doing all my tests on 250,000 records as suggested by Kristofer.)

     

    private static double TestDataAccessSqlOnly(int iterations)
    {
        const string query = @"
            SELECT
                LinqTest.Id,
                LinqTest.FileName,
                LinqTest.FileSize,
                LinqTest.UploadDate,
                LinqTest.DownloadDate,
                LinqTest.ProcessDate
            FROM LinqTest
            WHERE LinqTest.FileName = @FileName";

        Stopwatch stopWatch = new Stopwatch();

        const string connectionStringName = "ConsoleApp.Properties.Settings.LinqPerfConnectionString";
        string str = ConfigurationManager.ConnectionStrings[connectionStringName].ConnectionString;

        using (var conn = new SqlConnection(str))
        {
            conn.Open();

            using (var command = new SqlCommand(query, conn))
            {
                SqlParameter paramFileName = command.Parameters.Add("@FileName", SqlDbType.NVarChar);
                stopWatch.Start();

                for (int i = 0; i < iterations; i++)
                {
                    paramFileName.Value = "abcde" + i;

                    using (SqlDataReader reader = command.ExecuteReader())
                    {
                        while (reader.Read())
                        {
                            var ret1 = reader.GetValue(0) as string;
                            var ret2 = reader.GetValue(0) as string;
                            var ret3 = reader.GetValue(0) as string;
                            var ret4 = reader.GetValue(0) as string;
                            var ret5 = reader.GetValue(0) as string;
                            var ret6 = reader.GetValue(0) as string;
                        }
                    }
                }
                stopWatch.Stop();
            }
        }

        return stopWatch.Elapsed.TotalSeconds;
    }




    Peter Kellner, MVP ASP.NET
    Sunday, May 10, 2009 5:37 PM
  • Hold on yet awhile. More ways of killing a cat than choking her with cream.

    In your blog article you mention that the real world scenario where this is causing you trouble is a "workflow process that is taking 6 hours to complete". So if we take a step back and look at what exactly that process does (what is the input, output, and processing in between - 'anonymized' as necessary but still covering in more detail what your real-world scenario is) there may be a different and more efficient approach to the overall problem.

    If most of the time taken to complete your processing is spent on translating LINQ expressions to SQL then maybe each unit/query is too small. Can it somehow be changed to work with larger sets of data? If it is a big batch processing thing then is there anything that can be done db-side (working with sets and the like) rather than pulling the data back to the client?

    What I am trying to say is: we have already noticed one major difference between your real-world scenario and the test (optional discriminators vs a relatively static query) that would lead to the opposite result if the test-app solution with compiled queries were applied to your real-world app flat out. So instead of optimizing the test app / test scenario, it might be a better idea to base any optimizations on your real-world scenario instead of a simplified test harness..?

    That said, a compiled query cache would still be a pretty interesting thing. I'll see if I can mock up a sample on that one...
    Kristofer - Huagati Systems Co., Ltd. - www.huagati.com - Cool tools for Linq-to-SQL and Entity Framework: www.huagati.com/dbmltools
    Monday, May 11, 2009 2:48 AM
    Answerer
  •     for (int i = 0; i < iterations; i++)
        {
            paramFileName.Value = "abcde" + i;
            var newList = new List<LinqTest>(5);

            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    newList.Add(new LinqTest()
                    {
                        Id = reader.GetInt32(0),
                        FileName = reader.GetString(1),
                        FileSize = reader.GetString(2),
                        UploadDate = reader.GetDateTime(3),
                        DownloadDate = reader.GetDateTime(4),
                        ProcessDate = reader.IsDBNull(5) ? (DateTime?)null : reader.GetDateTime(5)
                    });
                }
            }
        }


    Monday, May 11, 2009 6:40 AM
    Moderator
  • I can think of lots of ways to kill a cat (though I never actually would).

    I totally get what you are saying about optimizing for the scenario you are implementing.  Turns out, the workflow process is just a small part of how we use our data layer.  For me, it was a huge wake-up call on our application performance.  We are building a large scale social networking / transportation logistics system which will have both large workflow components and high-transaction data service calls.  We are building it as a single page JavaScript app (using ExtJS) and it will have lots of data interaction calls.  I've got a lot of experience building highly scalable systems (mostly with Oracle and Java) and not as much with SqlServer and ASP.NET on the high end side.

    The 6 hours really caught my attention because, for what it was doing, the time was just too long, which caused us to start the investigation.  In my career, this is the first time the data layer on the client was significant relative to the actual round trip to the database (even after optimizing the queries).

    I really appreciate you putting together a small example of using an extension method with LINQ to SQL, then compiling that code for later execution.  I've tried to figure out what you meant by that and just haven't been able to come up with the answer.  It does sound like it would help us a lot.  I found this article that has what looks like a nice example of actually caching the compiled query.  I don't really need all forms of the example since I always just pass in one parameter (which is a class with properties).

    At any rate, thanks for the help.  I spent all day yesterday doing experiments while watching the Stopwatch class do its job and Enterprise Manager's profiler utility.  I'm learning a lot about how Sql Server and LINQ work together (well, and sometimes not so well).


    Peter Kellner, MVP ASP.NET
    Monday, May 11, 2009 2:11 PM
  • Hi KristoferA,

    We are still struggling with this.  We decided to bite the bullet and create a Microsoft MSDN support issue out of it.  Basically, we think that if we can figure out how to compile what I have below in pseudo code, our solution is clear.  That is: hash the query object we pass in, build the query dynamically, then compile it and save it for future use (cached).  Below, I've listed the pseudo code we are hoping to find.  So far, Microsoft is going on 3 days and 2 escalations.  Hopefully we'll get a good answer and post it; otherwise, I'm not sure what I will do.

    Here is what we are trying to figure out:

                IQueryable<LinqTest> query = from myData in meta.LinqTests
                                             orderby myData.Id
                                             select myData;

                if (!String.IsNullOrEmpty(fileNameToSearch))
                {
                    query = query.Where(data => data.FileName.Equals(fileNameToSearch));
                }

                // We need to compile the Query HERE and store it in Cache
                // We know there is an expression tree associated with query.  We want to compile
                // that expression tree.
                // The below doesn't work, but that is what we want.
                Func<DataClassesDataContext, string, IQueryable<LinqTest>> compiledQuery =
                   CompiledQuery.Compile((DataClassesDataContext meta, string fileNameForSearch) =>
                                         ( query  ));
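
    One direction we're considering in the meantime (just a sketch, unverified; it sidesteps compiling the dynamically built tree rather than solving it): keep one compiled query per "shape" and pick the matching one at runtime:

        static readonly Func<DataClassesDataContext, IQueryable<LinqTest>> allOrdered =
            CompiledQuery.Compile((DataClassesDataContext dc) =>
                from myData in dc.LinqTests
                orderby myData.Id
                select myData);

        static readonly Func<DataClassesDataContext, string, IQueryable<LinqTest>> byFileName =
            CompiledQuery.Compile((DataClassesDataContext dc, string fileName) =>
                from myData in dc.LinqTests
                where myData.FileName.Equals(fileName)
                orderby myData.Id
                select myData);

        // select the compiled query whose shape matches the runtime parameters:
        var result = String.IsNullOrEmpty(fileNameToSearch)
            ? allOrdered(meta)
            : byFileName(meta, fileNameToSearch);

    But as noted earlier in the thread, that explodes combinatorially with the number of optional parameters, which is exactly what we are trying to avoid.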
    Peter Kellner, MVP ASP.NET
    Wednesday, May 13, 2009 12:51 AM