LAG & LEAD functions... Any Way to Retrieve the 1st non-NULL Values?

  • Question

  • My question is this... Has anyone found an elegant way of getting the LAG & LEAD functions to move to the 1st NON-NULL value within the partition, rather than simply using a hard-coded offset value?

    Here's some test data...

    IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp
    CREATE TABLE #temp (
    	BranchID INT NOT NULL,
    	RandomValue INT NULL,
    	TransactionDate DATETIME,
    	PRIMARY KEY (BranchID, TransactionDate)
    	)
    INSERT #temp (BranchID,RandomValue,TransactionDate) VALUES
    (339,6, '20060111 00:55:55'),
    (339,NULL, '20070926 23:32:00'),
    (339,NULL, '20101222 10:51:35'),
    (339,NULL, '20101222 10:51:37'),
    (339,1, '20101222 10:52:00'),
    (339,1, '20120816 12:02:00'),
    (339,1, '20121010 10:36:00'),
    (339,NULL, '20121023 10:47:53'),
    (339,NULL, '20121023 10:48:08'),
    (339,1, '20121023 10:49:00'),
    (350,1, '20060111 00:55:55'),
    (350,NULL, '20070926 23:31:06'),
    (350,NULL, '20080401 16:34:54'),
    (350,NULL, '20080528 15:06:39'),
    (350,NULL, '20100419 11:05:49'),
    (350,NULL, '20120315 08:51:00'),
    (350,NULL, '20120720 11:48:35'),
    (350,1, '20120720 14:48:00'),
    (350,NULL, '20121207 08:10:14')

    What I'm trying to accomplish... In this instance, I'm trying to populate the NULL values with the 1st non-null preceding value. 

    The LAG function works well when there's only a single NULL value in a sequence, but it doesn't do the job when there's more than a single NULL in the sequence.

    For example ...

    SELECT
    	t.BranchID,
    	t.RandomValue,
    	t.TransactionDate,
    	COALESCE(t.RandomValue, LAG(t.RandomValue, 1) OVER (PARTITION BY t.BranchID ORDER BY t.TransactionDate)) AS LagValue
    FROM
    	#temp t

    Please note that I am aware of several methods of accomplishing this particular task, including self joins, CTEs and smearing with variables.

    So, I'm not looking for an alternative way of accomplishing the task... I want to know if it's possible to do this with the LAG function.

    Thanks in advance,

    Jason


    Jason Long


    • Edited by Jason A Long Wednesday, February 25, 2015 4:22 PM
    Wednesday, February 25, 2015 4:20 PM

All replies

  • Yes, but it involves a subquery, and might not meet your criteria on that front. Basically, you create a sub query to find the number of rows that the last non-null value was:

    SELECT t.BranchID, t.RandomValue, t.TransactionDate,  COALESCE(LAG(t.RandomValue, 
    NULLIF((SELECT COUNT(*) FROM #temp WHERE transactionDate BETWEEN (SELECT MAX(transactionDate) FROM #temp WHERE randomValue IS NOT NULL AND transactionDate < t.TransactionDate AND branchID = t.branchID) AND t.TransactionDate AND branchID = t.branchID),0)-1
    ) OVER (PARTITION BY t.BranchID ORDER BY t.TransactionDate),randomValue) AS LagValue
      FROM #temp t

    and pass that to the LAG function.
    • Edited by Patrick Hurst Wednesday, February 25, 2015 5:01 PM
    • Marked as answer by Jason A Long Wednesday, February 25, 2015 10:02 PM
    Wednesday, February 25, 2015 5:01 PM
  • Patrick,

    Thanks for the reply. Your solution, using the correlated sub-query in the select list, does in fact work, but the inefficiency defeats the purpose (a single pass at the table) of using a windowed function.

    I was just hoping there was a syntax similar to ROWS or RANGE that I just wasn't aware of, that could be used in place of the offset value.

    Points for coming up with an innovative solution though. :)

    Thanks,

    Jason


    Jason Long

    Wednesday, February 25, 2015 6:27 PM
  • I've seen and worked with this question before.  And I don't know of any way to do this in the general case without some sort of join back to the original table. I'm pretty sure such a solution does not exist.

    If there is some value N where you know that there will never be more than N NULL values in a row, then there is a solution with only one pass through the table. For example, if N = 10, then

    SELECT t.BranchID, t.RandomValue, t.TransactionDate,
       COALESCE(t.RandomValue, 
       LAG(t.RandomValue, 1) OVER (PARTITION BY t.BranchID ORDER BY t.TransactionDate), 
       LAG(t.RandomValue, 2) OVER (PARTITION BY t.BranchID ORDER BY t.TransactionDate), 
       LAG(t.RandomValue, 3) OVER (PARTITION BY t.BranchID ORDER BY t.TransactionDate), 
       LAG(t.RandomValue, 4) OVER (PARTITION BY t.BranchID ORDER BY t.TransactionDate), 
       LAG(t.RandomValue, 5) OVER (PARTITION BY t.BranchID ORDER BY t.TransactionDate),
       LAG(t.RandomValue, 6) OVER (PARTITION BY t.BranchID ORDER BY t.TransactionDate), 
       LAG(t.RandomValue, 7) OVER (PARTITION BY t.BranchID ORDER BY t.TransactionDate), 
       LAG(t.RandomValue, 8) OVER (PARTITION BY t.BranchID ORDER BY t.TransactionDate), 
       LAG(t.RandomValue, 9) OVER (PARTITION BY t.BranchID ORDER BY t.TransactionDate), 
       LAG(t.RandomValue, 10) OVER (PARTITION BY t.BranchID ORDER BY t.TransactionDate)) As LagValue
    From #temp t 
    Tom
    • Marked as answer by Jason A Long Wednesday, February 25, 2015 10:02 PM
    Wednesday, February 25, 2015 7:00 PM
  • Tom,

    Thanks for the reply. Definitely a usable solution. I ended up just doing a data smear using variables.

    The primary purpose of the question was simply to make sure I wasn't missing an available syntax option. I'm still not 100% up to speed when it comes to some of the newer (yeah, I know, 2012 isn't new anymore) window functions. I think I have my answer, even if it wasn't the one I wanted...

    I also found where Celko had a fairly clean method for dealing with the problem using the LAG function with a CTE. See the Filling in Gaps section of the following article.

    https://www.simple-talk.com/sql/t-sql-programming/window-functions-in-sql/


    Jason Long


    • Edited by Jason A Long Wednesday, February 25, 2015 7:40 PM
    Wednesday, February 25, 2015 7:39 PM
  • IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp
    CREATE TABLE #temp (
    	BranchID INT NOT NULL,
    	RandomValue INT NULL,
    	TransactionDate DATETIME,
    	PRIMARY KEY (BranchID, TransactionDate)
    	)
    INSERT #temp (BranchID,RandomValue,TransactionDate) VALUES
    (339,6, '20060111 00:55:55'),
    (339,NULL, '20070926 23:32:00'),
    (339,NULL, '20101222 10:51:35'),
    (339,NULL, '20101222 10:51:37'),
    (339,1, '20101222 10:52:00'),
    (339,1, '20120816 12:02:00'),
    (339,1, '20121010 10:36:00'),
    (339,NULL, '20121023 10:47:53'),
    (339,NULL, '20121023 10:48:08'),
    (339,1, '20121023 10:49:00'),
    (350,1, '20060111 00:55:55'),
    (350,NULL, '20070926 23:31:06'),
    (350,NULL, '20080401 16:34:54'),
    (350,NULL, '20080528 15:06:39'),
    (350,NULL, '20100419 11:05:49'),
    (350,NULL, '20120315 08:51:00'),
    (350,NULL, '20120720 11:48:35'),
    (350,1, '20120720 14:48:00'),
    (350,NULL, '20121207 08:10:14')
    
    ;with mycte as (
    SELECT BranchID, RandomValue,TransactionDate,
      CAST(SUBSTRING(MAX( CAST(BranchID AS BINARY(4)) + CAST(RandomValue AS BINARY(4)) )
       OVER( ORDER BY TransactionDate ROWS UNBOUNDED PRECEDING ), 5, 4) AS int) AS lastNonNullval
    FROM #temp)
    
    Select BranchID, RandomValue,TransactionDate,
     COALESCE(RandomValue,LAG(lastNonNullval, 1) OVER (PARTITION BY BranchID ORDER BY TransactionDate)) AS LagValue 
     
    FROM  mycte
     
     
    drop table #temp
    
    --Refer to: Itzik Ben-Gan
    --http://sqlmag.com/t-sql/last-non-null-puzzle
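    The essence of the trick: concatenate the ordering key and the value into one binary string, take a running MAX (the key occupies the leading bytes, so the maximum always belongs to the latest non-NULL row), then SUBSTRING the value back out. Here is a rough Python analogue of that idea, purely as an illustration, assuming non-negative integer keys and values so that big-endian byte order matches numeric order:

```python
import struct

def last_non_null(rows):
    """rows: (order_key, value) pairs sorted by key; value may be None.
    Mimics MAX(CAST(key AS BINARY(4)) + CAST(val AS BINARY(4))) OVER (...)."""
    best = None  # running MAX over the packed key+value byte strings
    out = []
    for key, val in rows:
        if val is not None:
            packed = struct.pack('>I', key) + struct.pack('>I', val)
            if best is None or packed > best:
                best = packed
        # SUBSTRING(..., 5, 4): bytes 5-8 hold the value from the latest non-NULL row
        out.append(struct.unpack('>I', best[4:8])[0] if best is not None else None)
    return out

print(last_non_null([(1, 6), (2, None), (3, None), (4, 1), (5, None)]))
# [6, 6, 6, 1, 1]
```

    Because the ordering key forms the prefix of the packed string, the value's own magnitude never affects which row wins the MAX.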

    • Marked as answer by Jason A Long Thursday, February 26, 2015 5:20 AM
    Wednesday, February 25, 2015 10:04 PM
  • Yes, this is possible to do with a set-based solution with a single pass over the data.

    But, no, it is not trivial; it takes a true T-SQL master like SQL Server MVP Itzik Ben-Gan to come up with it. I still have not been able to understand it myself. :-)

    See this article from Itzik to learn about the solution.
    http://sqlmag.com/t-sql/last-non-null-puzzle


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Marked as answer by Jason A Long Thursday, February 26, 2015 5:05 AM
    Wednesday, February 25, 2015 10:59 PM
  • Thanks, Jingyang and Erland, for pointing this solution out.  And, of course, many thanks to Itzik for developing it.

    Tom

    Thursday, February 26, 2015 4:08 AM
  • Now this is exactly what I was looking for! Thanks Jingyang Li for finding and posting this solution. 

    I'll definitely spend some time and see if I can wrap my head around how this is working.

    Erland - I couldn't agree more with your assessment of Itzik Ben-Gan. Hands down my favorite SQL author.

    That said, I've swiped more than a few of your solutions as well. "OPTION(RECOMPILE)" is the 1st thing that comes to mind...

    Huge thank you to everyone who added to this thread. I'm a happy man. :)

     


    Jason Long

    Thursday, February 26, 2015 5:20 AM
    I just wanted to provide a little follow-up now that I've had a little time to read and digest Itzik's article and test the code posted by Jingyang.

    Turns out the code posted by Jingyang didn't actually produce the desired results, but it did get me pointed in the right direction (partially my fault for crappy test data that didn't lend itself to easy verification). That said, I did want to post the version of the code that does produce the correct results.

    IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp
    CREATE TABLE #temp (
    	BranchID INT NOT NULL,
    	RandomValue INT NULL,
    	TransactionDate DATETIME,
    	PRIMARY KEY (BranchID, TransactionDate)
    	)
    INSERT #temp (BranchID,RandomValue,TransactionDate) VALUES
    (339,6, '20060111 00:55:55'), (339,NULL, '20070926 23:32:00'), (339,NULL, '20101222 10:51:35'), (339,5, '20101222 10:51:37'),
    (339,2, '20101222 10:52:00'), (339,2, '20120816 12:02:00'), (339,2, '20121010 10:36:00'), (339,NULL, '20121023 10:47:53'),
    (339,NULL, '20121023 10:48:08'), (339,1, '20121023 10:49:00'), (350,3, '20060111 00:55:55'), (350,NULL, '20070926 23:31:06'),
    (350,NULL, '20080401 16:34:54'), (350,NULL, '20080528 15:06:39'), (350,NULL, '20100419 11:05:49'), (350,NULL, '20120315 08:51:00'),
    (350,NULL, '20120720 11:48:35'), (350,4, '20120720 14:48:00'), (350,2, '20121207 08:10:14')
    
    SELECT
    	t.BranchID,
    	t.RandomValue,
    	t.TransactionDate,
    	COALESCE(t.RandomValue,
    		CAST(
    			SUBSTRING(
    				MAX(CAST(t.TransactionDate AS BINARY(4)) + CAST(t.RandomValue AS BINARY(4))) OVER (PARTITION BY t.BranchID ORDER BY t.TransactionDate ROWS UNBOUNDED PRECEDING)
    			,5,4) 
    		AS INT)
    	) AS RandomValueNew
    FROM
    	#temp AS t

    In reality, this isn’t exactly a true answer to the original question regarding the LAG & LEAD functions, being that it uses the MAX function instead, but who cares? It still uses a windowed function to solve the problem with a single pass at the data.

    I also did a little additional testing to see if casting to BINARY(4) worked across the board with a variety of data types, or if the number needed to be adjusted based on the data… Here's one of my test scripts…

    IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp
    CREATE TABLE #Temp (
    	ID INT,
    	Num BIGINT,
    	String VARCHAR(25),
    	[Date] DATETIME,
    	Series INT 
    	)
    INSERT #temp (ID,Num,String,Date,Series) VALUES 
    (1, 2, 'X', '19000101', 1), ( 2, 3, 'XX', '19000108', 1),
    (3, 4, 'XXX', '19000115', 1), ( 4, 6, 'XXXX', '19000122', 1),
    (5, 9, 'XXXXX', '19000129', 1), ( 6, 13, 'XXXXXX', '19000205', 2),
    (7, NULL, 'XXXXXXX', '19000212', 2),
    (8, NULL, 'XXXXXXXX', '19000219', 2),
    (9, NULL, 'XXXXXXXXX', '19000226', 2),
    (10, NULL, 'XXXXXXXXXX', '19000305', 2),
    (11, NULL, NULL, '19000312', 3), ( 12, 141, NULL, '19000319', 3),
    (13, 211, NULL, '19000326', 3), ( 14, 316, NULL, '19000402', 3),
    (15, 474, 'XXXXXXXXXXXXXXX', '19000409', 3),
    (16, 711, 'XXXXXXXXXXXXXXXX', '19000416', 4),
    (17, NULL, NULL, '19000423', 4), ( 18, NULL, NULL, '19000430', 4),
    (19, NULL, 'XXXXXXXXXXXXXXXXXXXX', '19000507', 4), ( 20, NULL, NULL, '19000514', 4),
    (21, 5395, NULL, '19000521', 5),
    (22, NULL, NULL, '19000528', 5),
    (23, 12138, 'XXXXXXXXXXXXXXXXXXXXXXX', '19000604', 5),
    (24, 2147483647, 'XXXXXXXXXXXXXXXXXXXXXXXX', '19000611', 5),
    (25, NULL, 'XXXXXXXXXXXXXXXXXXXXXXXXX', '19000618', 5),
    (26, 27310, 'XXXXXXXXXXXXXXXXXXXXXXXXX', '19000618', 6),
    (27, 9223372036854775807, 'XXXXXXXXXXXXXXXXXXXXXXXXX', '19000618', 6),
    (28, NULL, NULL, '19000618', 6),
    (29, NULL, 'XXXXXXXXXXXXXXXXXXXXXXXXX', '19000618', 6),
    (30, 27310, NULL, '19000618', 6)
    
    SELECT 
    	ID,
    	Num,
    	String,
    	[Date],
    	Series,
    	CAST(SUBSTRING(MAX(CAST(t.[Date] AS BINARY(4)) + CAST(t.Num AS BINARY(4))) OVER (ORDER BY t.[Date] ROWS UNBOUNDED PRECEDING), 5,4) AS BIGINT) AS NumFill,
    	CAST(SUBSTRING(MAX(CAST(t.[Date] AS BINARY(4)) + CAST(t.Num AS BINARY(4))) OVER (PARTITION BY t.Series ORDER BY t.[Date] ROWS UNBOUNDED PRECEDING), 5,4) AS BIGINT) AS NumFillWithPartition,
    	CAST(SUBSTRING(MAX(CAST(t.[Date] AS BINARY(4)) + CAST(t.Num AS BINARY(8))) OVER (ORDER BY t.[Date] ROWS UNBOUNDED PRECEDING), 5,8) AS BIGINT) AS BigNumFill,
    	CAST(SUBSTRING(MAX(CAST(t.[Date] AS BINARY(4)) + CAST(t.Num AS BINARY(8))) OVER (PARTITION BY t.Series ORDER BY t.[Date] ROWS UNBOUNDED PRECEDING), 5,8) AS BIGINT) AS BIGNumFillWithPartition,
    	CAST(SUBSTRING(MAX(CAST(t.ID AS BINARY(4)) + CAST(t.String AS BINARY(255))) OVER (ORDER BY t.ID ROWS UNBOUNDED PRECEDING), 5,255) AS VARCHAR(25)) AS StringFill,
    	CAST(SUBSTRING(MAX(CAST(t.ID AS BINARY(4)) + CAST(t.String AS BINARY(25))) OVER (PARTITION BY t.Series ORDER BY t.ID ROWS UNBOUNDED PRECEDING), 5,25) AS VARCHAR(25)) AS StringFillWithPartition
    FROM #Temp AS t

    Looks like BINARY(4) is just fine for any INT or DATE/DATETIME values. Bumping it up to 8 was needed to capture the largest BIGINT value. For text strings, the number simply needs to be set to the column size. I tested up to 255 characters without a problem.
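    As a quick sanity check on those sizes (a Python aside, not T-SQL): INT is a signed 32-bit type and BIGINT a signed 64-bit type, so 4 and 8 bytes always suffice, regardless of the value stored:

```python
import struct

# SQL Server INT = signed 32-bit, BIGINT = signed 64-bit;
# even the maximum values pack into exactly 4 and 8 bytes.
print(len(struct.pack('>i', 2147483647)))            # 4
print(len(struct.pack('>q', 9223372036854775807)))   # 8
```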

    It’s not included here, but I did notice that the NUMERIC data type doesn’t work at all. From what I can tell, SS doesn't like casting the binary value back to NUMERIC (I didn't test DECIMAL).

    Thanks again,

    Jason


    Jason Long

    Friday, February 27, 2015 6:29 AM
  • ...

    It’s not included here, but I did notice that the NUMERIC data type doesn’t work at all. From what I can tell, SS doesn't like casting the binary value back to NUMERIC (I didn't test DECIMAL).


    --Original http://sqlmag.com/t-sql/last-non-null-puzzle

    The solution works with decimal and numeric with precision and scale (8,2). Here is a sample:
     If (Object_Id('dbo.T1','U') is not null)
     Drop table dbo.T1;
    
    CREATE TABLE T1
    (
      id INT NOT NULL CONSTRAINT PK_T1 PRIMARY KEY,
      col1 INT NULL,
      col_Numeric Numeric(8,2)  NULL
      ,col_Decimal Decimal(8,2)  NULL
    );
    
     
    
    INSERT INTO dbo.T1(id, col1, col_Numeric,col_Decimal) VALUES
      ( 2, NULL,NULL,NUll),
      ( 3,   10,8.45,3.33),
      ( 5,   -1,9.87,NULL),
      ( 7, NULL, NULL,NULL),
      (11, NULL, 5.55,NULL),
      (13,  -12, NULL,7.77),
      (17, NULL, NULL,NULL),
      (19, NULL, NULL,NULL),
      (23, 1759,1.23,0);
    
      SELECT id, col1,
      CAST(
        SUBSTRING(
          MAX( CAST(id AS BINARY(4)) + CAST(col1 AS BINARY(4)) )
            OVER( ORDER BY id
                  ROWS UNBOUNDED PRECEDING ),
          5, 4)
        AS INT) AS lastval
    
    	, col_Numeric, CAST(
        SUBSTRING(
          MAX( CAST(id AS BINARY(4)) + CAST(col_Numeric AS BINARY(8)) )
            OVER( ORDER BY id
                  ROWS UNBOUNDED PRECEDING ),
          5, 8)
        AS   Numeric(8,2)) AS lastvalNumeric
    
    
    		, col_Decimal, CAST(
        SUBSTRING(
          MAX( CAST(id AS BINARY(4)) + CAST(col_Decimal AS BINARY(8)) )
            OVER( ORDER BY id
                  ROWS UNBOUNDED PRECEDING ),
          5, 8)
        AS   Numeric(8,2)) AS lastvalDecimal
    
    
    
    FROM dbo.T1;


    Friday, February 27, 2015 2:10 PM
  • The solution works with decimal and numeric with the precision and scale (8,2).

    Yes, the solution works for decimal and numeric as long as the binary length you convert to holds at least as many bytes as the precision of the decimal or numeric value requires. A decimal or numeric takes 5, 9, 13, or 17 bytes, depending on the precision (see the storage sizes documented here):

    https://msdn.microsoft.com/en-us/library/ms187746.aspx

    So this solution will work for any precision or scale as long as you use 17 as the length of the binary. For example

     If (Object_Id('dbo.T1','U') is not null)
     Drop table dbo.T1;
     go
    CREATE TABLE T1
    (
      id INT NOT NULL CONSTRAINT PK_T1 PRIMARY KEY,
      col1 INT NULL,
      col_Numeric numeric(38,17)  NULL
      ,col_Decimal Decimal(8,2)  NULL
    );
    
     
    
    INSERT INTO dbo.T1(id, col1, col_Numeric,col_Decimal) VALUES
      ( 2, NULL,NULL,NUll),
      ( 3,   10,312345678901234567.3398701234567,3.33),
      ( 5,   -1,9.87,NULL),
      ( 7, NULL, NULL,NULL),
      (11, NULL, 5.55,NULL),
      (13,  -12, NULL,7.77),
      (17, NULL, NULL,NULL),
      (19, NULL, NULL,NULL),
      (23, 1759,1.23,0);
    
      SELECT id, col1,
      CAST(
        SUBSTRING(
          MAX( CAST(id AS BINARY(4)) + CAST(col1 AS BINARY(4)) )
            OVER( ORDER BY id
                  ROWS UNBOUNDED PRECEDING ),
          5, 4)
        AS INT) AS lastval
    
    	, col_Numeric, CAST(
        SUBSTRING(
          MAX( CAST(id AS BINARY(4)) + CAST(col_Numeric AS BINARY(17)) )
            OVER( ORDER BY id
                  ROWS UNBOUNDED PRECEDING ),
          5, 17)
        AS   numeric(38,17)) AS lastvalNumeric
    
    
    		, col_Decimal, CAST(
        SUBSTRING(
          MAX( CAST(id AS BINARY(4)) + CAST(col_Decimal AS BINARY(17)) )
            OVER( ORDER BY id
                  ROWS UNBOUNDED PRECEDING ),
          5, 17)
        AS   Numeric(8,2)) AS lastvalDecimal
    
    
    
    FROM dbo.T1;

    However, Itzik's solution cannot be used for float or real because you cannot use CAST (or CONVERT) to convert varbinary to either float or real.

    Tom

    Friday, February 27, 2015 4:01 PM
  • Nice catch! I stand corrected...

    In my initial tests, I was trying to use the storage bytes, as opposed to the precision, for the length of the binary.

    The 2nd round of testing, using BINARY(precision), went much better...

    IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp
    CREATE TABLE #Temp (
    	ID INT,
    	Numeric84 NUMERIC(8,4),
    	Numeric194 NUMERIC(19,4)
    	)
    INSERT #Temp (ID,numeric84,Numeric194) 
    
    SELECT TOP 300
    	ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS ID,
    	ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * (1 + ((ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 1.0) / 11)) AS Numeric84,
    	ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * (100000000000 + ((ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 100000000000.0) / 11)) AS Numeric194
    FROM 
    	sys.all_objects
    
    UPDATE t SET Numeric84 = NULL, Numeric194 = NULL
    FROM #Temp t WHERE t.ID % 4 IN (2,3)
    
    SELECT
    	t.ID,
    
    	t.Numeric84,
    	CAST(SUBSTRING(MAX(CAST(t.ID AS BINARY(4)) + CAST(t.Numeric84 AS BINARY(8))) OVER (ORDER BY t.ID ROWS UNBOUNDED PRECEDING), 5, 8) AS NUMERIC(8,4)) AS Numeric84_Fill,
    
    	t.Numeric194,
    	CAST(SUBSTRING(MAX(CAST(t.ID AS BINARY(4)) + CAST(t.Numeric194 AS BINARY(19))) OVER (ORDER BY t.ID ROWS UNBOUNDED PRECEDING), 5, 19) AS NUMERIC(19,4)) AS Numeric194_Fill
    FROM
    	#Temp t
    Thank you once again Jingyang. :)


    Jason Long


    Friday, February 27, 2015 4:12 PM
  • Great info Tom!

    I do, however, believe that the following isn't 100% correct...

    "This solution will work for any precision or scale as long as you use 17 as the length of the varbinary."

    My testing showed that using the precision for the length of the binary works every time, but using the storage bytes can lead to truncation of the number and incorrect results. Take a look at the last 2 columns...

    IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp
    CREATE TABLE #Temp (
    	ID INT,
    	Numeric84 NUMERIC(8,4),
    	Numeric194 NUMERIC(19,4),
    	Numeric384 NUMERIC(38,4)
    	)
    INSERT #Temp (ID,numeric84,Numeric194,Numeric384) 
    
    SELECT TOP 300
    	ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS ID,
    	ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * (1 + ((ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 1.0) / 11)) AS Numeric84,
    	ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * (100000000000 + ((ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 100000000000.0) / 11)) AS Numeric194,
    	ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * (10000000000000000000000000000 + ((ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 10000000000000000000000000000.0) / 11)) AS Numeric384
    FROM 
    	sys.all_objects
    
    UPDATE t SET Numeric84 = NULL, Numeric194 = NULL, Numeric384 = NULL
    FROM #Temp t WHERE t.ID % 4 IN (2,3)
    
    INSERT #Temp (ID,numeric84,Numeric194, Numeric384) VALUES (301, NULL, 999999999999999.9999, 999999999999999999999999999999999.9999)
    
    SELECT
    	t.ID,
    
    	t.Numeric84,
    	CAST(SUBSTRING(MAX(CAST(t.ID AS BINARY(4)) + CAST(t.Numeric84 AS BINARY(8))) OVER (ORDER BY t.ID ROWS UNBOUNDED PRECEDING), 5, 8) AS NUMERIC(8,4)) AS Numeric84_Fill,
    
    	t.Numeric194,
    	CAST(SUBSTRING(MAX(CAST(t.ID AS BINARY(4)) + CAST(t.Numeric194 AS BINARY(17))) OVER (ORDER BY t.ID ROWS UNBOUNDED PRECEDING), 5, 17) AS NUMERIC(19,4)) AS Numeric194_Fill,
    
    	t.Numeric384,
    	CAST(SUBSTRING(MAX(CAST(t.ID AS BINARY(4)) + CAST(t.Numeric384 AS BINARY(17))) OVER (ORDER BY t.ID ROWS UNBOUNDED PRECEDING), 5, 17) AS NUMERIC(38,4)) AS Numeric384_17_Fill,
    	CAST(SUBSTRING(MAX(CAST(t.ID AS BINARY(4)) + CAST(t.Numeric384 AS BINARY(38))) OVER (ORDER BY t.ID ROWS UNBOUNDED PRECEDING), 5, 38) AS NUMERIC(38,4)) AS Numeric384_38_Fill
    FROM
    	#Temp t
    

    That said, while I did notice that using the storage bytes could cause truncation, it wasn't necessary to use the full precision either (when the precision is greater than 9).

    Basically, I'm not sure how to correctly identify the absolute minimum safe BINARY(n) for a given NUMERIC/DECIMAL, but considering the low cost and the infrequency of actual use, I think using the full precision (n) is a good, all-around safe rule of thumb.

    Thanks,

    Jason


    Jason Long

    Friday, February 27, 2015 4:54 PM
    Like Tom pointed out, we cannot convert directly from float/real to binary. I don't know whether using a conversion hack will get an accurate result or not.

    I gave it a try to get something:

     If (Object_Id('dbo.T1','U') is not null)
     Drop table dbo.T1;
    
    CREATE TABLE T1
    (
      id INT NOT NULL CONSTRAINT PK_T1 PRIMARY KEY,
      col1 INT NULL,
      col_Float float  NULL
    
    );
    
     
    
    INSERT INTO dbo.T1(id, col1, col_Float) VALUES
      ( 2, NULL,NULL),
      ( 3,   10,8.45),
      ( 5,   -1,9.87),
      ( 7, NULL, NULL),
      (11, NULL, 5.5512345),
      (13,  -12, NULL),
      (17, NULL, NULL),
      (19, NULL, NULL),
      (23, 1759,1.23);
    
      SELECT id, col_float, 
      CAST(Cast(SUBSTRING(
          MAX( CAST(id AS BINARY(4)) + Cast(Cast(col_float as decimal(38,17)) as  BINARY(38)) ) 
            OVER( ORDER BY id
                  ROWS UNBOUNDED PRECEDING ),
          5, 38)
        AS  decimal(38,17) ) as float ) AS lastvalFloat
    
    
    	 
    
    
    FROM dbo.T1;

    Friday, February 27, 2015 7:17 PM
    That only works if all the values in the float column can fit in your conversion to decimal(38,17), or whatever decimal you use. And you can't be sure of that, since real and float can contain values that will not fit in any decimal datatype, no matter what you specify for the precision and scale. For example, the following will fail with the error "Arithmetic overflow error converting float to data type numeric."

         If (Object_Id('dbo.T1','U') is not null)
         Drop table dbo.T1;
    go
        CREATE TABLE T1
        (
          id INT NOT NULL CONSTRAINT PK_T1 PRIMARY KEY,
          col1 INT NULL,
          col_Float float  NULL
        );
    
        INSERT INTO dbo.T1(id, col1, col_Float) VALUES
          ( 2, NULL,NULL),
          ( 3,   10,8.45),
          ( 5,   -1,9.87),
          ( 7, NULL, NULL),
          (11, NULL, 1.2345E53),
          (13,  -12, NULL),
          (17, NULL, NULL),
          (19, NULL, NULL),
          (23, 1759,1.23);
    
          SELECT id, col_float, 
          CAST(Cast(SUBSTRING(
              MAX( CAST(id AS BINARY(4)) + Cast(Cast(col_float as decimal(38,17)) as  BINARY(38)) ) 
                OVER( ORDER BY id
                      ROWS UNBOUNDED PRECEDING ),
              5, 38)
            AS  decimal(38,17) ) as float ) AS lastvalFloat
        FROM dbo.T1;
    

    Tom

    Saturday, February 28, 2015 3:31 AM
  • Thanks, Jason.

    You are right, 17 is not long enough to convert a value of 1090909090909090909090909090944444.0909 contained in a decimal(38,4).  For some reason it takes 18 bytes when converted to binary.  And the value 1090.90909090909090909090909090909 stored in a decimal(38,34) takes 20 bytes.

    I was assuming that if a decimal(38,4) is stored in 17 bytes (which it is) then converting it to binary would also take 17 bytes, but SQL is clearly not doing that.  I think you are probably correct that using the precision as the length will work in all cases, but I don't know of any documentation that specifies that.

    In fact, the documentation for the CAST/CONVERT function in BOL says "SQL Server does not guarantee that the result of a decimal or numeric data type conversion to binary will be the same between versions of SQL Server."  So maybe even if you could be sure that some specified length always worked in the current version, it seems there is no guarantee it will work in the next version.

    Tom

    Saturday, February 28, 2015 3:56 AM
    With regard to the float data type, I think it's a purely academic question. It's a pretty rarely used data type compared to the other numeric data types.

    If, on the off chance, I actually did have to perform this exercise with a FLOAT data type, I'd just use a different method. The following, for example, works without the need to convert to BINARY... (see https://www.simple-talk.com/sql/t-sql-programming/filling-in-missing-values-using-the-t-sql-window-frame/)

    IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp
    CREATE TABLE #Temp (
    	ID INT,
    	Numeric84 NUMERIC(8,4),
    	Numeric194 NUMERIC(19,4),
    	Numeric384 NUMERIC(38,4),
    	NumFloatSmall FLOAT,
    	NumFloatBig FLOAT
    	)
    INSERT #temp (ID,numeric84,Numeric194,Numeric384,NumFloatSmall,NumFloatBig) 
    
    SELECT TOP 300
    	ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS ID,
    	ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * (1 + ((ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 1.0) / 11)) AS Numeric84,
    	ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * (100000000000 + ((ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 100000000000.0) / 11)) AS Numeric194,
    	ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * (10000000000000000000000000000 + ((ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 10000000000000000000000000000.0) / 11)) AS Numeric384,
    	ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * (1 + ((ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 1.0) / 11)) AS NumFloatSmall,
    	ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * (10000000000000000000000000000 + ((ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 10000000000000000000000000000.0) / 11)) AS NumFloatBig
    FROM 
    	sys.all_objects
    
    UPDATE t SET Numeric84 = NULL, Numeric194 = NULL, Numeric384 = NULL, NumFloatSmall = NULL, NumFloatBig = NULL
    FROM #Temp t WHERE t.ID % 4 IN (2,3)
    
    
    SELECT 
    	a.ID,
    	a.Numeric84,
    	MAX(a.Numeric84) OVER (PARTITION BY a.c84) AS Smear84,
    	a.Numeric194,
    	MAX(a.Numeric194) OVER (PARTITION BY a.c194) AS Smear194,	
    	a.Numeric384,
    	MAX(a.Numeric384) OVER (PARTITION BY a.c384) AS Smear384,
    	a.NumFloatSmall,
    	MAX(a.NumFloatSmall) OVER (PARTITION BY a.cfs) AS SmearFloatSmall,
    	a.NumFloatBig,
    	MAX(a.NumFloatBig) OVER (PARTITION BY a.cfb) AS SmearFloatBig
    FROM
    (
        SELECT
    		t.ID,
    		t.Numeric84,
    		t.Numeric194,
    		t.Numeric384,
    		t.NumFloatSmall,
    		t.NumFloatBig,
    		COUNT(t.Numeric84) OVER (ORDER BY t.ID) AS c84,
    		COUNT(t.Numeric194) OVER (ORDER BY t.ID) AS c194,
    		COUNT(t.Numeric384) OVER (ORDER BY t.ID) AS c384,
    		COUNT(t.NumFloatSmall) OVER (ORDER BY t.ID) AS cfs,
    		COUNT(t.NumFloatBig) OVER (ORDER BY t.ID) AS cfb
        FROM #Temp AS t
    ) a
    ORDER BY ID;

    It's not nearly as performant as Itzik's method but it's not horrible either.
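    In case the COUNT trick above isn't obvious... COUNT(col) OVER (ORDER BY id) only increments on non-NULL rows, so every NULL row ends up sharing a group number with the last non-NULL value that preceded it, and a MAX over that group "smears" the value forward. A stripped down illustration (the table and column names here are just for the example)...

    SELECT
    	a.id,
    	a.val,
    	a.grp,
    	MAX(a.val) OVER (PARTITION BY a.grp) AS smeared
    FROM
    (
    	SELECT
    		v.id,
    		v.val,
    		COUNT(v.val) OVER (ORDER BY v.id) AS grp -- running count of non-NULL values only
    	FROM (VALUES (1, 1), (2, NULL), (3, NULL), (4, 2), (5, NULL)) AS v(id, val)
    ) a
    ORDER BY a.id;

    For the values (1, NULL, NULL, 2, NULL), grp comes back as (1, 1, 1, 2, 2), and the MAX per group fills the NULLs with 1 and 2 respectively.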

    If you want to do an actual update, the following works as well...

    IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp
    CREATE TABLE #Temp (
    	ID INT PRIMARY KEY,
    	Numeric84 NUMERIC(8,4),
    	Numeric194 NUMERIC(19,4),
    	Numeric384 NUMERIC(38,4),
    	NumFloatSmall FLOAT,
    	NumFloatBig FLOAT
    	)
    INSERT #temp (ID,numeric84,Numeric194,Numeric384,NumFloatSmall,NumFloatBig) 
    
    SELECT TOP 300
    	ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS ID,
    	ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * (1 + ((ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 1.0) / 11)) AS Numeric84,
    	ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * (100000000000 + ((ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 100000000000.0) / 11)) AS Numeric194,
    	ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * (10000000000000000000000000000 + ((ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 10000000000000000000000000000.0) / 11)) AS Numeric384,
    	ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * (1 + ((ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 1.0) / 11)) AS NumFloatSmall,
    	ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * (10000000000000000000000000000 + ((ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 10000000000000000000000000000.0) / 11)) AS NumFloatBig
    FROM 
    	sys.all_objects
    
    UPDATE t SET Numeric84 = NULL, Numeric194 = NULL, Numeric384 = NULL, NumFloatSmall = NULL, NumFloatBig = NULL
    FROM #Temp t WHERE t.ID % 4 IN (2,3)
    
    
    DECLARE 
    	@Numeric84 NUMERIC(8,4),
    	@Numeric194 NUMERIC(19,4),
    	@Numeric384 NUMERIC(38,4),
    	@NumFloatSmall FLOAT,
    	@NumFloatBig FLOAT
    
    UPDATE t SET 
    	@Numeric84 = t.Numeric84 = COALESCE(t.Numeric84, @Numeric84),
    	@Numeric194 = t.Numeric194 = COALESCE(t.Numeric194, @Numeric194),
    	@Numeric384 = t.Numeric384 = COALESCE(t.Numeric384, @Numeric384),
    	@NumFloatSmall = t.NumFloatSmall = COALESCE(t.NumFloatSmall, @NumFloatSmall),
    	@NumFloatBig = t.NumFloatBig = COALESCE(t.NumFloatBig, @NumFloatBig)
    FROM 
    	#Temp AS t
    
    SELECT * FROM #Temp AS t

    Note that I added a primary key to the table... The above method relies solely on the clustered index to control the direction of the smear. It also avoids the need to cast into and out of binary...

    Again, not quite as performant as Itzik's method but a good deal better than the previous method. (This is the solution I ended up using before Jingyang and Erland brought Itzik's method to light.)
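    One extra word of caution (this is my own habit, not something the code above depends on)... when I use the quirky update pattern I normally add TABLOCKX and MAXDOP 1 hints, since the variable smear assumes a single-threaded pass in clustered index order with no other writers...

    -- Same quirky update as above (one column shown), with defensive hints.
    -- TABLOCKX keeps other sessions out; MAXDOP 1 prevents parallelism from
    -- breaking the row-by-row variable assignment.
    UPDATE t SET
    	@Numeric84 = t.Numeric84 = COALESCE(t.Numeric84, @Numeric84)
    FROM
    	#Temp AS t WITH (TABLOCKX)
    OPTION (MAXDOP 1)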


    Jason Long

    • Edited by Jason A Long Saturday, February 28, 2015 6:19 PM
    Saturday, February 28, 2015 6:09 PM