none
Help in script-Latest record RRS feed

  • Question

  • Hi, I am trying to get one latest record of every person even EFFDATE are same. How can I get desired results. Help please.
    Select e_id, status,max(status_effdate) Effdate from Acc1 where e_id in ('14754','1991','29636')
    group by e_id, status
    order by e_id,Effdate
    --Output
    E_id       status     Effdate
    14754    A            2009-05-23 00:00:00.000
    14754    LB           2009-05-23 00:00:00.000
    1991      L             2008-06-17 00:00:00.000
    1991      S            2008-08-02 00:00:00.000
    1991      LB           2008-09-13 00:00:00.000
    29636     A           2009-08-21 00:00:00.000
    29636     L           2009-10-24 00:00:00.000

    --Desired Results
    E_id      status     Effdate
    14754     LB        2009-05-23 00:00:00.000
    1991       LB        2008-09-13 00:00:00.000
    29636      L         2009-10-24 00:00:00.000

    Thanks.

    Tuesday, October 20, 2009 10:59 AM

Answers

  • You can follow Adam's solution with ROW_NUMBER windowing function or do a SELECT on SELECT with a 2nd GROUP BY and the MAX (or  MIN) aggregate function to select the row you need:

    SELECT E_id, Status=Max(Status), Status_effdate
    FROM (SELECT   A.* 
          FROM     Acc1 A 
             INNER JOIN (SELECT   E_id, 
                                  Max(Status_effdate) Effdate 
                         FROM     Acc1 
                         WHERE    E_id IN ('14754','1991','29636') 
                         GROUP BY E_id) X 
               ON A.E_id = X.E_id 
                  AND A.Status_effdate = X.Effdate ) Y
    GROUP BY E_id, Status_effdate
    ORDER BY E_id 

    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    • Marked as answer by Khan_K Tuesday, October 20, 2009 7:34 PM
    Tuesday, October 20, 2009 2:30 PM
    Moderator

All replies

  • You can try INNER JOIN with GROUP BY:

    SELECT   A.* 
    FROM     Acc1 A 
             INNER JOIN (SELECT   E_id, 
                                  Max(Status_effdate) Effdate 
                         FROM     Acc1 
                         WHERE    E_id IN ('14754','1991','29636') 
                         GROUP BY E_id) X 
               ON A.E_id = X.E_id 
                  AND A.Status_effdate = X.Effdate 
    ORDER BY A.E_id 

    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Tuesday, October 20, 2009 11:19 AM
    Moderator
  • This is made very easy with the new windowing functions, especially if you have multiple criteria for the selection:

    WITH cte AS
    (
      SELECT *
            , row_num = ROW_NUMBER() OVER (PARTITION BY E_id
                                           ORDER BY     status_effdate DESC)
      FROM    Acc1
      WHERE   E_id IN ('14754','1991','29636')
    )
    SELECT    E_id
            , status
            , status_effdate
    FROM      cte
    WHERE     row_num = 1



    Adam Tappis. (MCSD VB6 & .NET, MCDBA SQL 2000)
    • Proposed as answer by K H Tan Tuesday, October 20, 2009 12:05 PM
    Tuesday, October 20, 2009 11:37 AM
    Moderator
  • I used the following query but still the problem of same EFFDATE, showing two record of e_id-14754.
    I need the last one record after sorting if EFFDATE are same of any record.
    The e_id has same effdate thats why getting two record of e_id-14754.

    I am using SQL-Server 2000.

    SELECT   A.*
    FROM     Acc1 A
             INNER JOIN (SELECT   E_id,
                                  Max(Status_effdate) Effdate
                         FROM     Acc1
                         WHERE    E_id IN ('14754','1991','29636')
                         GROUP BY E_id) X
               ON A.E_id = X.E_id
                  AND A.Status_effdate = X.Effdate
    ORDER BY A.E_id, A.EFFDATE

    --Output from the above query
    E_id       status     Effdate
    14754    A            2009-05-23 00:00:00.000
    14754    LB           2009-05-23 00:00:00.000
    1991      LB           2008-09-13 00:00:00.000
    29636     L           2009-10-24 00:00:00.000

    --Desired output
    E_id       status     Effdate
    14754    LB           2009-05-23 00:00:00.000
    1991      LB           2008-09-13 00:00:00.000
    29636     L           2009-10-24 00:00:00.000

    Tuesday, October 20, 2009 1:01 PM
  • When the sort criteria is not unique, then there is no "last" row. Sets are by their definition order-less. If you were to rank the rows on the same criteria they would have the same rank i.e. be tied. Although my solution using ROW_NUMBER() will give you what you want, the actual row returned in case of ties is actually quite random. What you need to do is add additional columns to your ORDER BY clause to deal with ties e.g. if the effdate is the same then decide based on status as a secondary sort column.

    WITH cte AS
    (
      SELECT *
            , row_num = ROW_NUMBER() OVER (PARTITION BY E_id
                                           ORDER BY     status_effdate DESC
                                                             , status ASC)
      FROM    Acc1
      WHERE   E_id IN ('14754','1991','29636')
    )
    SELECT    E_id
            , status
            , status_effdate
    FROM      cte
    WHERE     row_num = 1


    If you have some other identity column that identifies the order in which the rows were inserted into the table then you could use that as your secondary order criteria and this is where the ROW_NUMBER() solution is a lot more flexible than the sub-query with GROUP BY.


    Adam Tappis. (MCSD VB6 & .NET, MCDBA SQL 2000)
    Tuesday, October 20, 2009 1:25 PM
    Moderator
  • You can follow Adam's solution with ROW_NUMBER windowing function or do a SELECT on SELECT with a 2nd GROUP BY and the MAX (or  MIN) aggregate function to select the row you need:

    SELECT E_id, Status=Max(Status), Status_effdate
    FROM (SELECT   A.* 
          FROM     Acc1 A 
             INNER JOIN (SELECT   E_id, 
                                  Max(Status_effdate) Effdate 
                         FROM     Acc1 
                         WHERE    E_id IN ('14754','1991','29636') 
                         GROUP BY E_id) X 
               ON A.E_id = X.E_id 
                  AND A.Status_effdate = X.Effdate ) Y
    GROUP BY E_id, Status_effdate
    ORDER BY E_id 

    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    • Marked as answer by Khan_K Tuesday, October 20, 2009 7:34 PM
    Tuesday, October 20, 2009 2:30 PM
    Moderator
  • Thanks.
    Tuesday, October 20, 2009 7:34 PM
  • See alternative solutions (especially the last one and the ID = (select top 1..)) from http://blogs.lessthandot.com/index.php/DataMgmt/DBProgramming/MSSQLServer/including-an-aggregated-column-s-related
    Premature optimization is the root of all evil in programming.

    Donald Knuth, repeating C. A. R. Hoare

    My blog
  • Using subquery you eliminate the need of double derived tables solution, e.g.

    SELECT   A.*
    FROM     Acc1 A where A.IDField = (select top 1 Acc1.IDField from Acc1 where E_id IN ('14754','1991','29636') and A.E_ID = Acc1.E_ID order by Acc1.Status_EffDate DESC)
    ORDER BY A.E_id

    Premature optimization is the root of all evil in programming.

    Donald Knuth, repeating C. A. R. Hoare

    My blog
  • Listen, I don't mean to be precious about my solution but I feel the need to make one final comment.

    Using subquery you eliminate the need of double derived tables solution
    Naom, this not necessarily a good thing. Your solution will execute the subquery for every row in the outer query and is more akin to a loop than a set based operation and hence gives the worst performsnce.

    I suggest you take the above 3 queries and execute them in a batch to compare their performance. I got the follwoing results on SQL Express 2005

    Query 1 - SQLUSA - double sub query with GROUP BY - 30% of batch
    Query 2 - Adam - ROW_NUMBER() based solution - 12% of batch
    Query 3 - Naom - sub-select in where clause - 58% of batch

    so on the small sample of data provided mine is by far the most performant. I would encourage you to test this on your own data. Here's the script for your convenience:
     
    --drop table #acct
    --
    --CREATE TABLE #acct
    --( IDField INT IDENTITY(1,1) NOT NULL PRIMARY KEY
    --, E_id INT NOT NULL
    --, status VARCHAR(2) NOT NULL
    --, Effdate DATETIME NOT NULL
    --)
    --
    --INSERT #acct
    --SELECT 14754, 'A', '2009-05-23 00:00:00.000' UNION ALL
    --SELECT 14754, 'LB', '2009-05-23 00:00:00.000' UNION ALL
    --SELECT 1991, 'L', '2008-06-17 00:00:00.000' UNION ALL
    --SELECT 1991, 'S', '2008-08-02 00:00:00.000' UNION ALL
    --SELECT 1991, 'LB', '2008-09-13 00:00:00.000' UNION ALL
    --SELECT 29636, 'A', '2009-08-21 00:00:00.000' UNION ALL
    --SELECT 29636, 'L', '2009-10-24 00:00:00.000' UNION ALL
    --SELECT 12345, 'L', '2009-10-24 00:00:00.000'
    
    SELECT E_id, Status=Max(Status), effdate
    FROM (SELECT   A.* 
          FROM     #Acct A 
             INNER JOIN (SELECT   E_id, 
                                  Max(effdate) Effdate 
                         FROM     #Acct 
                         WHERE    E_id IN (14754,1991,29636) 
                         GROUP BY E_id) X 
               ON A.E_id = X.E_id 
                  AND A.effdate = X.Effdate ) Y
    GROUP BY E_id, effdate
    ORDER BY E_id; 
    
    WITH cte AS
    (
      SELECT *
            , row_num = ROW_NUMBER() OVER (PARTITION BY E_id
                                           ORDER BY     effdate DESC)
      FROM    #Accta
      WHERE   E_id IN (14754,1991,29636)
    )
    SELECT    E_id
            , status
            , effdate
    FROM      cte
    WHERE     row_num = 1;
    
    SELECT   A.E_id
    , A.status
    , A.effdate
    FROM     #Acct A where A.IDField = (select top 1 IDField from #Acct where E_id IN (14754,1991,29636) and A.E_ID = E_ID order by EffDate DESC)
    ORDER BY A.E_id 




    Adam Tappis. (MCSD VB6 & .NET, MCDBA SQL 2000)
    Thursday, October 22, 2009 12:07 AM
    Moderator
  • Yes, I got the similar results in SQL Server 2008 Express.

    I have 30/12/58% of the batch my solution being the slowest for this case. Not sure why in my tests in August for a different problem I got completely opposite results.

    ----------------------------------------
    CREATE TABLE Items(
    	ItemId int NOT NULL,
    	ItemName varchar(20) NOT NULL,
    	ItemDesciption varchar(20) NULL,
    	Supplier varchar(20) NULL,
    	ItemValue numeric(10, 2) NULL,
     CONSTRAINT PK_Items      PRIMARY KEY CLUSTERED (ItemId ASC),
     CONSTRAINT UQ_Items_Name UNIQUE   NONCLUSTERED (ItemName ASC)
    )
    
    GO
    
    CREATE TABLE Bid(
    	BidId int IDENTITY(1,1) NOT NULL,
    	ItemId int NOT NULL,
    	BidAmount decimal(10, 2) NOT NULL,
    	BidDateTime datetime NOT NULL,
     CONSTRAINT PK_Bid PRIMARY KEY CLUSTERED (BidId ASC) 
    )
    
    GO
    
    CREATE NONCLUSTERED INDEX IX_Bid 
        ON dbo.Bid  (ItemId ASC, BidAmount DESC)
    
    ALTER TABLE dbo.Bid  
      ADD CONSTRAINT FK_Bid_Items1 
      FOREIGN KEY(ItemId)
      REFERENCES dbo.Items (ItemId)
    GO
    
    ALTER TABLE dbo.Bid CHECK CONSTRAINT FK_Bid_Items1
    GO
    
    SET NOCOUNT ON
    Declare @i int
    Declare @j int
    Declare @r int
    
    Set @i=2000
    While @i<=100000 Begin
       Insert Into Items( ItemId, ItemName, ItemDesciption, ItemValue )
       Select @i,
              'Item' + Right('0000'+ Cast(@i as varchar(5)),5),
              'Desc' + Right('0000'+ Cast(@i as varchar(5)),5),
              cast(Rand()*1000. as numeric(10,2))
       Select @r = Rand()*10, @j=1
       While @j<=@r Begin
           Insert Into Bid(ItemId, BidAmount, BidDateTime)
           Select @i, 
                  cast(Rand()*1000. as numeric(10,2)),
                  GetDate()
          Set @j=@j+1
       end
                  
       Set @i=@i+1
    end
    GO
    
    set nocount on
    declare @ItemsCount int, @BidsCount int
    
    select @ItemsCount = COUNT(*) from Items
    select @BidsCount = COUNT(*) from Bid
    
    print 'Test case - Number of Items - ' + cast(@ItemsCount as varchar(10)) + ' Bids count - ' + cast(@BidsCount as varchar(10))
    
    
    print replicate('-',50) + char(13) + 'Nikola''s solution - TOP clause ' 
    set statistics time on
    SELECT Bid.BidId, 
           Bid.ItemId, 
           Bid.BidAmount, 
           Bid.BidDateTime, 
           Items.ItemId AS Expr1, 
           Items.ItemName, 
           Items.ItemDesciption, 
           Items.Supplier,
           Items.ItemValue
      From Items
      Left Join Bid 
             on Bid.ItemId = Items.ItemId 
            and Bid.BidId in (Select Top 1 BidId
                                From Bid b
                               Where b.ItemId=Items.ItemId
                               Order By b.BidAmount desc) order by Items.ItemId
    
    
    set statistics time off
    print replicate('-',50) + char(13) + 'Derived table solution - can have duplicates '
    set statistics time on
    SELECT X.BidId, 
           X.ItemId, 
           X.BidAmount, 
           X.BidDateTime, 
           Items.ItemId AS Expr1, 
           Items.ItemName, 
           Items.ItemDesciption, 
           Items.Supplier,
           Items.ItemValue
      From Items
      Left Join (select Bid.* from Bid inner join ( 
             Select ItemId, MAX(BidAmount) as BidAmount from Bid group by ItemId ) B on
                               b.ItemId=Bid.ItemId and Bid.BidAmount = B.BidAmount) X on Items.ItemId = X.ItemId order by Items.ItemId
    
    
    set statistics time off                           
    print replicate('-',50) + char(13) + 'Correlated subquery solution ' 
    set statistics time on
    SELECT Bid.BidId, 
           Bid.ItemId, 
           Bid.BidAmount, 
           Bid.BidDateTime, 
           Items.ItemId AS Expr1, 
           Items.ItemName, 
           Items.ItemDesciption, 
           Items.Supplier,
           Items.ItemValue
      From Items
      Left Join Bid on Bid.ItemId = Items.ItemId where Bid.ItemId IS NULL OR 
      Bid.BidAmount = (select MAX(BidAmount) from Bid b where b.ItemId = Items.ItemId) order by Items.ItemId  
      
    set statistics time off
    print replicate('-',50) + char(13) + 'ROW_NUMBER() solution '
    set statistics time on
    select * from (SELECT Bid.BidId, 
           Bid.ItemId, 
           Bid.BidAmount, 
           Bid.BidDateTime, 
           Items.ItemId AS Expr1, 
           Items.ItemName, 
           Items.ItemDesciption, 
           Items.Supplier,
           Items.ItemValue, ROW_NUMBER() over (PARTITION by Items.ItemID order by Bid.BidAmount Desc) as rn 
      From Items
      Left Join Bid on Bid.ItemId = Items.ItemId) X where rn = 1 order by X.ItemId                          
    
    set statistics time off
    print replicate('-',50) + char(13) + 'Row_number() - slight variation '
    set statistics time on
    SELECT top 1 with ties Bid.BidId, 
           Bid.ItemId, 
           Bid.BidAmount, 
           Bid.BidDateTime, 
           Items.ItemId AS Expr1, 
           Items.ItemName, 
           Items.ItemDesciption, 
           Items.Supplier,
           Items.ItemValue  
      From Items
      Left Join Bid on Bid.ItemId = Items.ItemId 
      order by ROW_NUMBER() over (PARTITION by Items.ItemID order by Bid.BidAmount Desc) 
    
    set statistics time off
      print replicate('-',50) + char(13) + 'Finally - compound key solution ' 
      set statistics time on
      select Bid.ItemId, 
           Bid.BidAmount, 
           cast(right(Bid.CompKey,30) as datetime) as BidDateTime, 
           Items.ItemId AS Expr1, 
           Items.ItemName, 
           Items.ItemDesciption, 
           Items.Supplier,
           Items.ItemValue  
      From Items
      Left Join (select ItemID, Max(BidAmount) as BidAmount, 
      MAX(cast(BidAmount as char(10)) + convert(char(30),BidDateTime,126)) as CompKey from Bid group by ItemId) Bid
      on Items.ItemId = Bid.ItemId order by Items.ItemId
      
      set statistics time off
      

    Premature optimization is the root of all evil in programming.

    Donald Knuth, repeating C. A. R. Hoare

    My blog
  • Obviously there are many factors that could impact the performance of these queries:
    - The table structure
    - the data types
    - The indexes
    - the statistics
    - the data volumes

    all aspects which will influence the query plan chosen.

    In addition, your tests include 2 tables that are JOINed which further complicates the situation.

    Hence I suggest the OP test all 3 solutions side by side on his/her data to see which works best in their case.


    Adam Tappis. (MCSD VB6 & .NET, MCDBA SQL 2000)
    Thursday, October 22, 2009 2:08 AM
    Moderator
  • No doubt about it.

    I ran my tests multiple times and got varying results... I also tried to eliminate display factor and make it a clean test using tips from Brad from another message, however, if I discard results, then I don't get messages either - so I'm not sure how to capture times.

    Anyway, now my tests are not conclusive - they show different results every time I'm running them.

    Premature optimization is the root of all evil in programming.

    Donald Knuth, repeating C. A. R. Hoare

    My blog
  • It continued to bother me why the OP would select a seemingly inferior solution as the answer to his question and the only thing I could think of was:
    - the OP's experience and understanding of T-SQL
    - The version of SQL Server he's using

    Looking through some of his other threads confirms that he's using SQL 2000. Hence of the supported solutions presented (SQLUSA and Naom) SQLUSA's is indeed the best solution for SQL 2000.

    Sultan_K, in future could you please specify the platform you are using in your original question to avoid any confusion for the answerers and other forum users?

    Thanks,
    Adam.
    Adam Tappis. (MCSD VB6 & .NET, MCDBA SQL 2000)
    Tuesday, October 27, 2009 6:57 PM
    Moderator