none
How to do nested group by efficiently? RRS feed

  • Question

  • Hi,

    I have a problem trying to figure out how to make grouping work without it sending a tonne of inidividual queries to the database server as it currently is.

    I have a schema like: site <-> lab <-> computer, each association is 1-M. And I want to produce a grouped result set such that it is grouped by site name, then lab name, giving a tree structure.

    E.g.
    Site 1
    - Lab 1_1
    - - Computer 1_1_1
    - - Computer 1_1_2
    - Lab 1_2
    - - Computer 1_2_1
    - - Computer 1_2_2
    Site 2
    - Lab 2_1
    - - Computer 2_1_1
    - - Computer 2_1_2

    But LINQ to SQL only fetches the site names in the first query. Then it fetches the labs for each site in separate queries. Then fetches the computers for each lab as further queries. All of which is obviously very inefficient...

    So, how to I write a query that would do this properly?

    Thanks,

    Jessica
    Monday, June 15, 2009 1:55 AM

Answers

  • Hi Jessica

    There's no ideal solution here. The root cause of the problem is that LINQ to SQL works by converting LINQ to SQL, and there's no way to express such a query efficiently in SQL. So it's always going to be a compromise. LINQ to SQL compromises by round-tripping; Entity Framework compromises by fetching more data than is needed (using some big outer joins).

    With LINQ to SQL, you can reduce the amount of round-tripping to just one level as follows:

    var options = new DataLoadOptions();
    options.LoadWith<Site>(s => s.Labs)
    options.LoadWith<Lab>(l => l.Computers);
    this.LoadOptions = options;

    This tells LINQ to SQL that you want to fetch labs with sites, and computers with labs. It will do the best it can, which is to round-trip less (but it will still round-trip).

    The other option is to fetch data with just one level of nesting (which LINQ to SQL can translate into a single SQL statement) and then join/group back on the client.

    Joe



    Write LINQ queries interactively - www.linqpad.net
    • Marked as answer by jessic4h Thursday, June 18, 2009 10:04 PM
    Monday, June 15, 2009 11:19 AM
    Answerer

All replies

  • Hi Jessica

    There's no ideal solution here. The root cause of the problem is that LINQ to SQL works by converting LINQ to SQL, and there's no way to express such a query efficiently in SQL. So it's always going to be a compromise. LINQ to SQL compromises by round-tripping; Entity Framework compromises by fetching more data than is needed (using some big outer joins).

    With LINQ to SQL, you can reduce the amount of round-tripping to just one level as follows:

    var options = new DataLoadOptions();
    options.LoadWith<Site>(s => s.Labs)
    options.LoadWith<Lab>(l => l.Computers);
    this.LoadOptions = options;

    This tells LINQ to SQL that you want to fetch labs with sites, and computers with labs. It will do the best it can, which is to round-trip less (but it will still round-trip).

    The other option is to fetch data with just one level of nesting (which LINQ to SQL can translate into a single SQL statement) and then join/group back on the client.

    Joe



    Write LINQ queries interactively - www.linqpad.net
    • Marked as answer by jessic4h Thursday, June 18, 2009 10:04 PM
    Monday, June 15, 2009 11:19 AM
    Answerer
  • Thanks, doing a join of the labs and sites to the computers and the dataoptions thing, then doing groupby on the client side does work. Just the one query now, which is certainly a lot better than a hundred queries previously going out over the wire...
    Thursday, June 18, 2009 10:04 PM