none
grouping problem RRS feed

  • Question

  • I have a table which resembles something like this

    Date | Price1 | Price2 | Price3
    01 Jan 2006 | 100 | 100 | 100
    02 Jan 2006 | 100 | 100 | 100
    03 Jan 2006 | 100 | 100 | 100
    04 Jan 2006 | 115 | 100 | 100
    05 Jan 2006 | 115 | 100 | 100
    06 Jan 2006 | 115 | 115 | 115
    07 Jan 2006 | 115 | 100 | 100
    08 Jan 2006 | 100 | 100 | 100
    09 Jan 2006 | 100 | 100 | 100

    and I want to write a query/view that will return this

    >From | To | Price1 | Price2 | Price3
    01 Jan 2006 | 03 Jan 2006 | 100 | 100 | 100
    04 Jan 2006 | 05 Jan 2006 | 115 | 100 | 100
    06 Jan 2006 | 06 Jan 2006 | 115 | 115 | 115
    07 Jan 2006 | 07 Jan 2006 | 115 | 100 | 100
    08 Jan 2006 | 09 Jan 2006 | 100 | 100 | 100

    Any ideas?
    Saturday, February 19, 2011 10:58 PM

Answers

  • This is known as finding islands (no gaps) in your data. If you enumerate the rows based on the values of (price1, price2, price3), ordered by the [date] and you substract that number as days from the [date], then you will have a grouping value or anchor date for each sequence. The rest is to calculate the min and max [date] for each group (price1, price2, price3, grp).

    USE tempdb;
    GO
    DECLARE @T TABLE (
    [date] date NOT NULL,
    price1 money NOT NULL,
    price2 money NOT NULL,
    price3 money NOT NULL
    );
    
    INSERT INTO @T ([date], price1, price2, price3)
    VALUES
      ('01 Jan 2006', 100, 100, 100),
    		('02 Jan 2006', 100, 100, 100),
    		('03 Jan 2006', 100, 100, 100),
    		('04 Jan 2006', 115, 100, 100),
    		('05 Jan 2006', 115, 100, 100),
    		('06 Jan 2006', 115, 115, 115),
    		('07 Jan 2006', 115, 100, 100),
    		('08 Jan 2006', 100, 100, 100),
    		('09 Jan 2006', 100, 100, 100);
    
    WITH rs AS (
    SELECT
    	[date],
    	price1,
    	price2,
    	price3,
    	DATEADD([day], -1 * ROW_NUMBER() OVER(PARTITION BY price1, price2, price3 ORDER BY [date]), [date]) AS grp
    FROM
    	@T
    )
    SELECT
    	MIN([date]) AS date_from,
    	MAX([date]) AS date_to,
    	price1, price2, price3
    FROM
    	rs
    GROUP BY
    	price1, price2, price3,
    	grp
    ORDER BY
    	date_from
    GO
    

    You can learn more about finding islands in the last book by Itzik Ben-Gan, about T-SQL Querying.

    Inside Microsoft SQL Server 2008: T-SQL Querying

     


    AMB

    Some guidelines for posting questions...

    Saturday, February 19, 2011 11:31 PM
    Moderator

All replies

  • I have a table which resembles something like this

    Date | Price1 | Price2 | Price3
    01 Jan 2006 | 100 | 100 | 100
    02 Jan 2006 | 100 | 100 | 100
    03 Jan 2006 | 100 | 100 | 100
    04 Jan 2006 | 115 | 100 | 100
    05 Jan 2006 | 115 | 100 | 100
    06 Jan 2006 | 115 | 115 | 115
    07 Jan 2006 | 115 | 100 | 100
    08 Jan 2006 | 100 | 100 | 100
    09 Jan 2006 | 100 | 100 | 100

    and I want to write a query/view that will return this

    >From | To | Price1 | Price2 | Price3
    01 Jan 2006 | 03 Jan 2006 | 100 | 100 | 100
    04 Jan 2006 | 05 Jan 2006 | 115 | 100 | 100
    06 Jan 2006 | 06 Jan 2006 | 115 | 115 | 115
    07 Jan 2006 | 07 Jan 2006 | 115 | 100 | 100
    08 Jan 2006 | 09 Jan 2006 | 100 | 100 | 100

    Any ideas?
    • Merged by Ai-hua Qiu Tuesday, February 22, 2011 6:21 AM
    Saturday, February 19, 2011 10:56 PM
  • This is known as finding islands (no gaps) in your data. If you enumerate the rows based on the values of (price1, price2, price3), ordered by the [date] and you substract that number as days from the [date], then you will have a grouping value or anchor date for each sequence. The rest is to calculate the min and max [date] for each group (price1, price2, price3, grp).

    USE tempdb;
    GO
    DECLARE @T TABLE (
    [date] date NOT NULL,
    price1 money NOT NULL,
    price2 money NOT NULL,
    price3 money NOT NULL
    );
    
    INSERT INTO @T ([date], price1, price2, price3)
    VALUES
      ('01 Jan 2006', 100, 100, 100),
    		('02 Jan 2006', 100, 100, 100),
    		('03 Jan 2006', 100, 100, 100),
    		('04 Jan 2006', 115, 100, 100),
    		('05 Jan 2006', 115, 100, 100),
    		('06 Jan 2006', 115, 115, 115),
    		('07 Jan 2006', 115, 100, 100),
    		('08 Jan 2006', 100, 100, 100),
    		('09 Jan 2006', 100, 100, 100);
    
    WITH rs AS (
    SELECT
    	[date],
    	price1,
    	price2,
    	price3,
    	DATEADD([day], -1 * ROW_NUMBER() OVER(PARTITION BY price1, price2, price3 ORDER BY [date]), [date]) AS grp
    FROM
    	@T
    )
    SELECT
    	MIN([date]) AS date_from,
    	MAX([date]) AS date_to,
    	price1, price2, price3
    FROM
    	rs
    GROUP BY
    	price1, price2, price3,
    	grp
    ORDER BY
    	date_from
    GO
    

    You can learn more about finding islands in the last book by Itzik Ben-Gan, about T-SQL Querying.

    Inside Microsoft SQL Server 2008: T-SQL Querying

     


    AMB

    Some guidelines for posting questions...

    Saturday, February 19, 2011 11:31 PM
    Moderator
  • By the way I am reading SQL Server Deep Dives and it has this very example...with different ways to do it and performances of each based on 10,000,000 rows.  Great book..
    Friday, April 15, 2011 6:02 PM