none
Calculating max (date) value across multiple columns -> SQL Query works -> How to convert it? RRS feed

  • Question

  • I have several tables with a "changed timestamp". In my query I join these tables and want to show just one"change timestamp". The shown value should be the maximum over the different columns in the joined tables.

    In SQL I know there are at least two ways of doing that:
    1. Using the CASE operator
    2. Using a UNION subselect construct in the select part
    Option 1 gets too complicated when there are more than 3 columns to compare. So I end up with this properly working SQL statement (simplified scenario):

    1select  
    2    a.id, 
    3    ( 
    4        select max(dateColsInDifferentTables.dateval) from  
    5        ( 
    6            Select a.dateA as dateval from a 
    7            union all 
    8            Select b.dateB as dateval from b 
    9            union all 
    10            Select c.dateC as dateval from c 
    11            union all 
    12            Select d.dateD as dateval from d 
    13        ) as dateColsInDifferentTables 
    14    )as maxdate 
    15from  
    16    a 
    17    inner join b on a.id = b.a_id 
    18    inner join c on a.id = c.a_id 
    19    inner join d on a.id = d.a_id 

    But until no I found no way to let LINQ generate this query for me. Any suggestions?

    BTW: This is the simplified scenario's DDL:

    CREATE TABLE A (  
        Id int NOT NULL
        DateA datetime NULL 
    ); 
    CREATE TABLE B (  
        A_Id int NOT NULL
        DateB datetime NULL 
    ); 
    CREATE TABLE C (  
        A_Id int NULL
        DateC datetime NULL 
    ); 
    CREATE TABLE D (  
        A_Id int NOT NULL
        DateD datetime NULL 
    ); 
     

    Monday, February 9, 2009 8:02 PM

Answers

  • First of all: Shame on me, because my SQL Query didn't really do what I wanted. I didn't paste the correct version. The correlations to the outer query in the union subselects were missing. So did they in Daniels proposals. Sorry for that!

    Here is one solution (LINQ to SQL and SQL) to that problem:

    Plain SQL

        select   
             a.id 
            ,b.a_id 
            ,c.a_id 
            ,d.a_id 
            ,a.dateA 
            ,b.dateB 
            ,c.dateC 
            ,d.dateD  
            ,(  
                select max(dateColsInDifferentTables.dateval) from   
                (  
                    Select a2.dateA as dateval from a a2 where a2.Id = a.Id 
                    union all  
                    Select b2.dateB as dateval from b b2 where b2.A_Id = a.Id 
                    union all  
                    Select c2.dateC as dateval from c c2 where c2.A_Id = a.Id 
                    union all  
                    Select d2.dateD as dateval from d d2 where d2.A_Id = a.Id 
                ) as dateColsInDifferentTables  
            )as maxdate 
        from   
            a  
            inner join b on a.id = b.a_id  
            inner join c on a.id = c.a_id  
            inner join d on a.id = d.a_id  
     

    LINQ

                var query = 
                        from a in dc.As 
                        join b in dc.Bs on a.Id equals b.A_Id 
                        join c in dc.Cs on a.Id equals c.A_Id 
                        join d in dc.Ds on a.Id equals d.A_Id 
                        select new 
                        { 
                            A_Id = a.Id, 
                            A_Date = a.DateA, 
                            B_Date = b.DateB, 
                            C_Date = c.DateC, 
                            D_Date = d.DateD, 
                            Maxdate = (from a2 in dc.As where a2.Id == a.Id select a2.DateA) 
                                        .Concat(from b2 in dc.Bs where b2.A_Id == a.Id select b2.DateB) 
                                        .Concat(from c2 in dc.Cs where c2.A_Id == a.Id select c2.DateC) 
                                        .Concat(from d2 in dc.Ds where d2.A_Id == a.Id select d2.DateD).Max() 
                        }; 
     

    LINQ's generated SQL

    SELECT [t0].[Id] AS [A_Id], [t0].[DateA] AS [A_Date], [t1].[DateB] AS [B_Date], [t2].[DateC] AS [C_Date], [t3].[DateD] AS [D_Date], ( 
        SELECT MAX([t11].[DateA]) 
        FROM ( 
            SELECT [t10].[DateA] 
            FROM ( 
                SELECT [t8].[DateA] 
                FROM ( 
                    SELECT [t6].[DateA] 
                    FROM ( 
                        SELECT [t4].[DateA] 
                        FROM [dbo].[A] AS [t4] 
                        WHERE [t4].[Id] = [t0].[Id] 
                        UNION ALL 
                        SELECT [t5].[DateB] 
                        FROM [dbo].[B] AS [t5] 
                        WHERE [t5].[A_Id] = [t0].[Id] 
                        ) AS [t6] 
                    UNION ALL 
                    SELECT [t7].[DateC] 
                    FROM [dbo].[C] AS [t7] 
                    WHERE [t7].[A_Id] = ([t0].[Id]) 
                    ) AS [t8] 
                UNION ALL 
                SELECT [t9].[DateD] 
                FROM [dbo].[D] AS [t9] 
                WHERE [t9].[A_Id] = [t0].[Id] 
                ) AS [t10] 
            ) AS [t11] 
        ) AS [Maxdate] 
    FROM [dbo].[A] AS [t0] 
    INNER JOIN [dbo].[B] AS [t1] ON [t0].[Id] = [t1].[A_Id] 
    INNER JOIN [dbo].[C] AS [t2] ON ([t0].[Id]) = [t2].[A_Id] 
    INNER JOIN [dbo].[D] AS [t3] ON [t0].[Id] = [t3].[A_Id] 

    At first glance the generated query looks ok to me and works well woth a small amount of data but I didn't check any exec plans so far...

    Thanks for your help Daniel!


    • Marked as answer by fcavelti Wednesday, February 11, 2009 5:51 PM
    Wednesday, February 11, 2009 5:50 PM

All replies

  • One thing to note is that LINQ to SQL will treat timestamp as a Binary, so if you want to do something like the following query, you'll have to change the column to be a datetime instead, but you should be able to do something like this:
    using(var dc = new SomeDataContext()) 
        var idAndMaxDate = from a in dc.As 
                           join b in dc.Bs 
                           on b.a_id equals a.id 
                           join c in dc.Cs 
                           on c.a_id equals a.id 
                           join d in dc.Ds 
                           on d.a_id equals a.id 
                           select new {Id = a.id, MaxDate = dc.As.Select<A, DateTime>(tempA => tempA.dateA).Union(dc.Bs.Select<B, DateTime>(tempB => tempB.dateB)).Union(dc.Cs.Select<C, DateTime>(tempC => tempC.dateC)).Union(dc.Ds.Select<D, DateTime>(tempD => tempD.dateD)).Max() }; 


    Daniel - http://webs.neumont.edu/dstafford
    Tuesday, February 10, 2009 1:47 AM
  • Thanks for your quick help Daniel, but it's not the solution yet.

    I used your approach like this with two tables only:

                var query = 
                        from a in dc.As 
                        join b in dc.Bs on a.Id equals b.A_Id 
                        where a.Id == 1 
                        select new 
                        { 
                            A_Id = a.Id, 
                            A_Date = a.DateA, 
                            B_Date = b.DateB, 
                            Maxdate = dc.As.Select<A, DateTime>(tempA => tempA.DateA) 
                                    .Union(dc.Bs.Select<B, DateTime>(tempB => tempB.DateB)) 
                                    .Max() 
                        }; 
     

    This is the generated query:

    SELECT [t0].[Id] AS [A_Id], [t0].[DateA] AS [A_Date], [t1].[DateB] AS [B_Date], ( 
        SELECT MAX([t5].[DateA]) 
        FROM ( 
            SELECT [t4].[DateA] 
            FROM ( 
                SELECT [t2].[DateA] 
                FROM [dbo].[A] AS [t2] 
                UNION 
                SELECT [t3].[DateB] 
                FROM [dbo].[B] AS [t3] 
                ) AS [t4] 
            ) AS [t5] 
        ) AS [Maxdate] 
    FROM [dbo].[A] AS [t0] 
    INNER JOIN [dbo].[B] AS [t1] ON [t0].[Id] = [t1].[A_Id] 
    WHERE [t0].[Id] = @p0 
    -- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [1] 

    This is obviously not the desired result.


    Tuesday, February 10, 2009 9:29 PM
  • What I tried so far - and how I failed so far:

    (...) 
        Maxdate = ((from a2 in a select a2.DateA) union (from b2 in b select b2.DateB)).Max() 
    (...) 


    This version doesn't compile, compiler error message:
    Could not find an implementation of the query pattern for source type '(...).A'.  'Select' not found.

    Another desperate attempt based on the idea to tell LINQ that it should (tell SQL Server) to build the desired ad hoc set:

    var query = 
                        from a in dc.As 
                        join b in dc.Bs on a.Id equals b.A_Id 
                        where a.Id == 1 
                        select new 
                        { 
                            A_Id = a.Id, 
                            A_Date = a.DateA, 
                            B_Date = b.DateB, 
                            Maxdate = (new List<DateTime>() { a.DateA, b.DateB }).Max() 
                        }; 


    That results in an Exception at runtime:
    System.InvalidOperationException: "Unrecognized expression node: ListInit"
    Any further ideas?

    Tuesday, February 10, 2009 9:59 PM
  • The query may look ugly, but it seems to produce the right results for me.  I tried just adding a row in A with an id of 1 and a datetime of 2009-01-01 and a row in B with an a_id of 1 and a datetime of GETDATE() and ran that query against the db, and it returns the date from B.  Adding a third or fourth table still chooses the greatest of the dates as well as a's id.
    Daniel - http://webs.neumont.edu/dstafford
    Wednesday, February 11, 2009 1:45 AM
  • First of all: Shame on me, because my SQL Query didn't really do what I wanted. I didn't paste the correct version. The correlations to the outer query in the union subselects were missing. So did they in Daniels proposals. Sorry for that!

    Here is one solution (LINQ to SQL and SQL) to that problem:

    Plain SQL

        select   
             a.id 
            ,b.a_id 
            ,c.a_id 
            ,d.a_id 
            ,a.dateA 
            ,b.dateB 
            ,c.dateC 
            ,d.dateD  
            ,(  
                select max(dateColsInDifferentTables.dateval) from   
                (  
                    Select a2.dateA as dateval from a a2 where a2.Id = a.Id 
                    union all  
                    Select b2.dateB as dateval from b b2 where b2.A_Id = a.Id 
                    union all  
                    Select c2.dateC as dateval from c c2 where c2.A_Id = a.Id 
                    union all  
                    Select d2.dateD as dateval from d d2 where d2.A_Id = a.Id 
                ) as dateColsInDifferentTables  
            )as maxdate 
        from   
            a  
            inner join b on a.id = b.a_id  
            inner join c on a.id = c.a_id  
            inner join d on a.id = d.a_id  
     

    LINQ

                var query = 
                        from a in dc.As 
                        join b in dc.Bs on a.Id equals b.A_Id 
                        join c in dc.Cs on a.Id equals c.A_Id 
                        join d in dc.Ds on a.Id equals d.A_Id 
                        select new 
                        { 
                            A_Id = a.Id, 
                            A_Date = a.DateA, 
                            B_Date = b.DateB, 
                            C_Date = c.DateC, 
                            D_Date = d.DateD, 
                            Maxdate = (from a2 in dc.As where a2.Id == a.Id select a2.DateA) 
                                        .Concat(from b2 in dc.Bs where b2.A_Id == a.Id select b2.DateB) 
                                        .Concat(from c2 in dc.Cs where c2.A_Id == a.Id select c2.DateC) 
                                        .Concat(from d2 in dc.Ds where d2.A_Id == a.Id select d2.DateD).Max() 
                        }; 
     

    LINQ's generated SQL

    SELECT [t0].[Id] AS [A_Id], [t0].[DateA] AS [A_Date], [t1].[DateB] AS [B_Date], [t2].[DateC] AS [C_Date], [t3].[DateD] AS [D_Date], ( 
        SELECT MAX([t11].[DateA]) 
        FROM ( 
            SELECT [t10].[DateA] 
            FROM ( 
                SELECT [t8].[DateA] 
                FROM ( 
                    SELECT [t6].[DateA] 
                    FROM ( 
                        SELECT [t4].[DateA] 
                        FROM [dbo].[A] AS [t4] 
                        WHERE [t4].[Id] = [t0].[Id] 
                        UNION ALL 
                        SELECT [t5].[DateB] 
                        FROM [dbo].[B] AS [t5] 
                        WHERE [t5].[A_Id] = [t0].[Id] 
                        ) AS [t6] 
                    UNION ALL 
                    SELECT [t7].[DateC] 
                    FROM [dbo].[C] AS [t7] 
                    WHERE [t7].[A_Id] = ([t0].[Id]) 
                    ) AS [t8] 
                UNION ALL 
                SELECT [t9].[DateD] 
                FROM [dbo].[D] AS [t9] 
                WHERE [t9].[A_Id] = [t0].[Id] 
                ) AS [t10] 
            ) AS [t11] 
        ) AS [Maxdate] 
    FROM [dbo].[A] AS [t0] 
    INNER JOIN [dbo].[B] AS [t1] ON [t0].[Id] = [t1].[A_Id] 
    INNER JOIN [dbo].[C] AS [t2] ON ([t0].[Id]) = [t2].[A_Id] 
    INNER JOIN [dbo].[D] AS [t3] ON [t0].[Id] = [t3].[A_Id] 

    At first glance the generated query looks ok to me and works well woth a small amount of data but I didn't check any exec plans so far...

    Thanks for your help Daniel!


    • Marked as answer by fcavelti Wednesday, February 11, 2009 5:51 PM
    Wednesday, February 11, 2009 5:50 PM
  • Just as a side note, I wouldn't use LINQ in this instance anyways, personally I would make a stored procedure and call that, it's much easier to control with a complex and possibly incredibly time consuming query such as this.
    Daniel - http://webs.neumont.edu/dstafford
    Wednesday, February 11, 2009 6:32 PM
  • I am not in the development stage where I decide what to put into stored procs yet. I'm working on an explorative prototype, particularly because I believe that there is still a lot of potential to improve that query.

    Look at this, this is what I initially was looking for as generated SQL:

    1    select   
    2         a.id 
    3        ,b.a_id 
    4        ,c.a_id 
    5        ,d.a_id 
    6        ,a.dateA 
    7        ,b.dateB 
    8        ,c.dateC 
    9        ,d.dateD  
    10        ,(  
    11            select max(dummy) from   
    12            (  
    13                Select dummy = a.DateA 
    14                union all  
    15                Select dummy = b.DateB 
    16                union all  
    17                Select dummy = c.DateC 
    18                union all  
    19                Select dummy = d.DateD 
    20            ) as dateColsInDifferentTables  
    21        )as maxdate 
    22    from   
    23        a  
    24        inner join b on a.id = b.a_id  
    25        inner join c on a.id = c.a_id  
    26        inner join d on a.id = d.a_id  
    27 


    I assume that this query performs a lot better because the calculation of maxdate is based on no additional joins / index- or tablescans. The only thing the Query Engine has to do is to build an ad hoc set per record (union all) and calc max of it. This is comparable to the usage of the CASE operator in the select list.

    So my actual question is: How can I tell LINQ to assemble the lines 13, 17, 15 and 19...?

    Wednesday, February 11, 2009 7:28 PM
  • var query = 
                        from a in dc.As 
                        join b in dc.Bs on a.Id equals b.A_Id 
                        join c in dc.Cs on a.Id equals c.A_Id 
                        join d in dc.Ds on a.Id equals d.A_Id 
                        select new 
                        { 
                            A_Id = a.Id, 
                            A_Date = a.DateA, 
                            B_Date = b.DateB, 
                            C_Date = c.DateC, 
                            D_Date = d.DateD, 
                            Maxdate = new List<DateTime>() 
                           { 
                               a.DateA, 
                               b.DateB, 
                               c.DateC, 
                               d.DateD 
                           }.Max() 
                        };  


    Daniel - http://webs.neumont.edu/dstafford
    Wednesday, February 11, 2009 8:12 PM
  • That was my first attempt too: Result: InvalidOperationException: Unrecognized expression node: ListInit. See my post above.
    Wednesday, February 11, 2009 8:51 PM
  • Next attempt:

                var query = 
                        from a in dc.As 
                        join b in dc.Bs on a.Id equals b.A_Id 
                        join c in dc.Cs on a.Id equals c.A_Id 
                        join d in dc.Ds on a.Id equals d.A_Id 
                        select new 
                        { 
                            A_Id = a.Id, 
                            A_Date = a.DateA, 
                            B_Date = b.DateB, 
                            C_Date = c.DateC, 
                            D_Date = d.DateD, 
     
                            Maxdate = (from a1 in dc.As select new { dummyField = a.DateA }).Distinct() 
                                         .Concat((from a2 in dc.As /*unfortunatly we have to select *from* somewhere - in SQL this is not necessary*/  
                                                    select new { dummyField = b.DateB }).Distinct()) 
                                         .Concat((from a3 in dc.As select new { dummyField = c.DateC }).Distinct()) 
                                         .Concat((from a4 in dc.As select new { dummyField = d.DateD }).Distinct()) 
                                         .Max() 
     
                        }; 
     

    Now I'm getting a very interesting NotSupportedException: Parameterless aggregate operator 'Max' is not supported over projections.
    I think LINQ is telling me, that what I want to express is not supported in LINQ (yet?). Because to use max over the select results (the projection) is really my intent.

    Is there any LINQ guru who can confirm this conclusion?


    Wednesday, February 11, 2009 9:07 PM
  • Sorry, I'm running these against Lists since this machine doesn't have SQL on it, but you could just select the different dates and calculate the maximum date outside of SQL, like this:
    var query = 
        from a in db.As 
        join b in db.Bs on a.Id equals b.AId 
        join c in db.Cs on a.Id equals c.AId 
        join d in db.Ds on a.Id equals d.AId 
        select new 
        { 
            AId = a.Id, 
            A_Date = a.DateA, 
            B_Date = b.DateB, 
            C_Date = c.DateC, 
            D_Date = d.DateD, 
        }; 
    foreach (var result in query) 
        DateTime maxDate = new List<DateTime>() 
        { 
            result.A_Date, 
            result.B_Date, 
            result.C_Date, 
            result.D_Date 
        }.Max(); 
    I realize it's not exactly what you wanted, but at least then you can still calculate it, and the query length and complexity will reduce greatly then too.

    Daniel - http://webs.neumont.edu/dstafford
    Thursday, February 12, 2009 12:49 AM
  • You are right I could do that but due to the following reasons in the non simplified case that is no real option:

    The query is a record provider for a paged AJAX table / grid control wrapped in a web service method, so for performance reasons I need exactly one statement to:
    • get the selection / projection (particularly this Maxdate)
    • sort the records by any of this attributes shown as columns in the control (order by)
    • page the results (with linq: skip() and take())
    So if LINQ to SQL is offering the power to express the above query or I will have switch to plain T-SQL (inline statement or stored proc).

    I'm quite certain the latter case is true.

    Thursday, February 12, 2009 6:44 AM
  • I would switch to a proc or a udf, I don't believe there is any other very viable option for you, especially with those requirements.
    Daniel - http://webs.neumont.edu/dstafford
    Friday, February 13, 2009 12:49 AM