none
"Except" so slow ? RRS feed

  • Question

  • Hello Experts

    I am trying a query like this :

     

    var

     

    diff = (compNicQuery).Select(p => p.Name).Except(nonCompQuery.Select(q => q.Name)).Count();

    this works fine for small set of data but takes long time for bigger set of data . Is there any other clause that I can use instead of "EXCEPT"...?

     

    Thanks

    Varun

    Monday, June 21, 2010 1:37 AM

Answers

  • I'm assuming this is a Linq-to-SQL query, so the following are some L2S aspects of it. If it is a linq-to-objects query, see Allan's reply above...

    ---

    A query like the one provided will be translated to something along the lines of "select count(*) from (select distinct t0.name from [compnicquery] as t0 where not exists (select ... from [noncompquery] as t1 where t0.name = t1.name and [noncompquery where clause]) where [compnicquery where clause]"

    You could try with a left join as an alternative:

    var query = (
        from q1 in compNicQuery
        from q2 in (
            from p2 in nonCompQuery
            where p2.Name == q1.Name
            select p2
            ).DefaultIfEmpty()
        where q2 == null
        select q1).Count();

    That should give you a sql query along the lines:

    select count(*)
    from [compnicquery] as t0
    left outer join (
      select 1 as test, t1.name, t1.otherfiltercolumns
      from [noncompquery] as t1
    ) as t2 on t2.name = t0.name and [noncompquery where clause]
    where t2.test is null and [compnicquery where clause]

    Whether that will make any difference on the execution plan or not is impossible to say without knowing more about the two queries involved, and the underlying db schema.

    Is the column behind "compNicQuery.Name" indexed in the database? If not, you may want to look into adding an index to it.

    That said, without knowing the underlying schema or even the two queries involved, it is very difficult to guess what might cause performance problems. Take a look at the generated SQL query, check out the execution plan for it to see what is costly.

     


    Kristofer - Huagati Systems Co., Ltd.
    Cool tools for Linq-to-SQL and Entity Framework:
    huagati.com/dbmltools (add-in with new features for the L2S and EF designers in VS2008 and VS2010)
    huagati.com/L2SProfiler (Query profiler for Linq-to-SQL and LLBLGen Pro)
    • Marked as answer by Varun_Dev Monday, June 21, 2010 6:22 AM
    Monday, June 21, 2010 5:17 AM
    Answerer

All replies

  • foreach compNicQuery row item you have to check the entire set of nonCompQuery. This has a complexity of n^2 so you cannot avoid the increase in delay as more checking is required.

    of course we can try to optimize this, perhaps try the following and see if it offers any improvement over Except():

    var nonComp = nonCompQuery.ToList();

    var compNic = compNicQuery.ToList();

    var comp = compNicQuery.Where(p=>nonCompQuery.Any(c=>c.Name == p.Name)).Count;

     

    If not you may need to thread your query .NET 4 has a set of ParellelFunctions which should be nice to improve speed with this type of problem but that depends if your using .NET 4 at the moment.

     

    Cheers,

    Allan

    Monday, June 21, 2010 2:11 AM
  • I'm assuming this is a Linq-to-SQL query, so the following are some L2S aspects of it. If it is a linq-to-objects query, see Allan's reply above...

    ---

    A query like the one provided will be translated to something along the lines of "select count(*) from (select distinct t0.name from [compnicquery] as t0 where not exists (select ... from [noncompquery] as t1 where t0.name = t1.name and [noncompquery where clause]) where [compnicquery where clause]"

    You could try with a left join as an alternative:

    var query = (
        from q1 in compNicQuery
        from q2 in (
            from p2 in nonCompQuery
            where p2.Name == q1.Name
            select p2
            ).DefaultIfEmpty()
        where q2 == null
        select q1).Count();

    That should give you a sql query along the lines:

    select count(*)
    from [compnicquery] as t0
    left outer join (
      select 1 as test, t1.name, t1.otherfiltercolumns
      from [noncompquery] as t1
    ) as t2 on t2.name = t0.name and [noncompquery where clause]
    where t2.test is null and [compnicquery where clause]

    Whether that will make any difference on the execution plan or not is impossible to say without knowing more about the two queries involved, and the underlying db schema.

    Is the column behind "compNicQuery.Name" indexed in the database? If not, you may want to look into adding an index to it.

    That said, without knowing the underlying schema or even the two queries involved, it is very difficult to guess what might cause performance problems. Take a look at the generated SQL query, check out the execution plan for it to see what is costly.

     


    Kristofer - Huagati Systems Co., Ltd.
    Cool tools for Linq-to-SQL and Entity Framework:
    huagati.com/dbmltools (add-in with new features for the L2S and EF designers in VS2008 and VS2010)
    huagati.com/L2SProfiler (Query profiler for Linq-to-SQL and LLBLGen Pro)
    • Marked as answer by Varun_Dev Monday, June 21, 2010 6:22 AM
    Monday, June 21, 2010 5:17 AM
    Answerer
  • Thanks for answering my question.

     

    I think I needed what Kristofer suggested. Left outer join. Basically , the requirement was to get everything from table 1 which do not exists in table 2 .

    And i think the syntax of LINQ that Kristofer provided does exactly that.

    Thanks

    Varun

    Monday, June 21, 2010 6:22 AM