locked
Select blocks of continous data RRS feed

  • Question

  • Hi,

    I have this table:

    ID        Value
    ---------------------
    1            27
    1            28
    1            29
    1            28
    1            28
    1            30
    2            25
    2            26
    2            28
    2            29
    2            28
    2            28
    2            31
    2            34

    So I need to show every block where 28,29,28,28 is continous, this is the desired output:

    ID        Value
    ---------------------

    1            28
    1            29
    1            28
    1            28
    2            28
    2            29
    2            28
    2            28

    Any ideas?

    Regards



    G.Waters


    Friday, March 16, 2018 12:22 AM

Answers

  • It can be done

    But order is not guaranteed always unless you've another column which can be reliably used for specifying the order by

    See this illustration

    declare @t table
    (
    ID  int,
    [Value] varchar(100)
    )
    
    INSERT @t
    VALUES
    (1,            27),
    (1,            28),
    (1,            29),
    (1,            28),
    (1,            28),
    (1,            30),
    (2,            25),
    (2,           26),
    (2,             28),
    (2,             29),
    (2,             28),
    (2,            28),
    (2,             31),
    (2,             34)
    
    ;With CTE
    AS
    (
    SELECT ID,[Value],
    ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS Seq
    FROM @t t
    ),CTE1
    AS
    (
    SELECT *,
    STUFF((SELECT ','+CAST([Value] AS varchar(10)) FROM CTE WHERE ID = c.ID AND Seq BETWEEN c.Seq + 1 AND c.Seq + 4 FOR XML PATH('')),1,1,'') AS Str,
    STUFF((SELECT ','+CAST(Seq AS varchar(10)) FROM CTE WHERE ID = c.ID AND Seq BETWEEN c.Seq + 1 AND c.Seq + 4 FOR XML PATH('')),1,1,'') AS SStr
    FROM CTE c
    )
    SELECT c.*
    FROM CTE1 c1
    CROSS APPLY dbo.ParseValues(c1.SStr,',') f
    JOIN CTE c
    ON c.Seq = f.Val
    AND c.ID = c1.ID
    WHERE c1.Str = '28,29,28,28'
    ORDER BY c.ID,c.Seq
    
    
    
    /*
    output
    ---------------------------------------
    ID	Value	
    ---------------------------------------
    1	28	
    1	29	
    1	28	
    1	28	
    2	28	
    2	29	
    2	28	
    2	28	
    
    */

    The UDF parseValues used in the above  code can be found here

    https://visakhm.blogspot.com/2010/02/parsing-delimited-string.html

    If it SQL 2016 and above you dont need to use UDF

    you can use standard function String_Split instead 

    https://docs.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql

    the illustration would be as below

    declare @t table
    (
    ID  int,
    [Value] varchar(100)
    )
    
    INSERT @t
    VALUES
    (1,            27),
    (1,            28),
    (1,            29),
    (1,            28),
    (1,            28),
    (1,            30),
    (2,            25),
    (2,           26),
    (2,             28),
    (2,             29),
    (2,             28),
    (2,            28),
    (2,             31),
    (2,             34)
    
    ;With CTE
    AS
    (
    SELECT ID,[Value],
    ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS Seq
    FROM @t t
    ),CTE1
    AS
    (
    SELECT *,
    STUFF((SELECT ','+CAST([Value] AS varchar(10)) FROM CTE WHERE ID = c.ID AND Seq BETWEEN c.Seq + 1 AND c.Seq + 4 FOR XML PATH('')),1,1,'') AS Str,
    STUFF((SELECT ','+CAST(Seq AS varchar(10)) FROM CTE WHERE ID = c.ID AND Seq BETWEEN c.Seq + 1 AND c.Seq + 4 FOR XML PATH('')),1,1,'') AS SStr
    FROM CTE c
    )
    SELECT c.*
    FROM CTE1 c1
    CROSS APPLY String_Split(c1.SStr,',') f
    JOIN CTE c
    ON c.Seq = f.value
    AND c.ID = c1.ID
    WHERE c1.Str = '28,29,28,28'
    ORDER BY c.ID,c.Seq


    Please Mark This As Answer if it solved your issue
    Please Vote This As Helpful if it helps to solve your issue
    Visakh
    ----------------------------
    My Wiki User Page
    My MSDN Page
    My Personal Blog
    My Facebook Page

    • Marked as answer by George Waters Monday, March 19, 2018 7:18 AM
    Sunday, March 18, 2018 6:32 AM

All replies

  • A table is by definition an unordered set of rows so there is no notion of contiguous. One can often use windowing functions with ORDER BY to solve similar problems but the sample data here has duplicate rows. If you have an incremental column value not shown in the sample data, it might be possible to use that to apply the ordering needed to identify "contiguous" blocks. 


    Dan Guzman, Data Platform MVP, http://www.dbdelta.com

    Friday, March 16, 2018 1:16 AM
  • Well, I am positive that this can be done, I am asking help because I haven't found an example with this conditions.

    G.Waters

    Friday, March 16, 2018 9:10 PM
  • You miss a basic point of a table. There is no order for a set. That may be the reason you cannot find an example. But it you do have another column can define the sequence, for example, timestamp, you may be able to find a query.
    Friday, March 16, 2018 9:31 PM
  • Well, I am positive that this can be done, I am asking help because I haven't found an example with this conditions.

    I agree the "contiguous data" problem can be solved, just not using only the sample data provided. Is that what you mean by "this conditions"?


    Dan Guzman, Data Platform MVP, http://www.dbdelta.com

    Friday, March 16, 2018 11:26 PM
  • Good day George,

    Seems like you did not understood the basic point which both Dan explained. I will try to give an addition to the explanation...

    When you get data (select) from SQL Server Table the server brings the data not ordered necessarily, unless you use "ORDER BY". In this case, the server returns the data according to way it is stored on disk and the way it is read.

    For example if your server uses more then one CPU then it might bring the data using multiple reading, if the table does not have clustered index then the order the rows stored on disk might be different then the order you expects to get it, if the server does not have clustered index and you deleted a row and then insert new row then there is option then the new row toke the place of the old row.... and so on...

    Here is a simple example which USUALLY works (again usually since the order is not guarantee)

    drop table if exists T;
    create table T(id int, txt nvarchar(4000))
    GO
    
    insert T(id, txt) values 
    	(2,REPLICATE('a',4000)),(4,REPLICATE('a',4000)),(6,REPLICATE('a',4000))
    	,(8,REPLICATE('a',4000)),(1,REPLICATE('a',4000)),(3,REPLICATE('a',4000))
    	,(5,REPLICATE('a',4000)),(7,REPLICATE('a',4000))
    GO
    
    -- we delete 3 rows
    delete T where id in (1,2,3)
    GO
    
    -- and inert new 3 rows... do you think the new rows will be in the end of the data?!?
    insert T(id, txt) values 
    	(11,REPLICATE('a',4000)),(22,REPLICATE('a',4000)),(33,REPLICATE('a',4000))
    GO
    
    -- Clear buffer
    DBCC DROPCLEANBUFFERS
    GO
    
    select id from T
    GO
    /*
    id
    11
    4
    6
    8
    22
    33
    5
    7
    */
    

    * you can notice (usually) that the three rows I added at the end does not return at the end since (usually) sql server found that he have place in the disk where other rows were deleted, and since the size fit the new rows, then the server add the new rows NOT IN THE END ;-)

    Unless you use ORDER BY the order is not guarantee :-)

    In your sample you have value 28 before value 29 and after, but what is the different between these rows?!? Why do you assume that it will come in this order?!?

    * in new none-clustered indexed table with new data, which is inserted in one query, and there is no delete, then usually, BUT NOT NECESSARILY the data is organized (stored on disk) by the order you insert it. This lead some newbies to make assumption that the order of the data on the disk is always the same as the order we insert it, which is wrong, as you can see in my demo

    Unless you give us more information and change the structure of the table, Dan gave you the answer to your question


    signature   Ronen Ariely
     [Personal Site]    [Blog]    [Facebook]    [Linkedin]

    Sunday, March 18, 2018 6:10 AM
  • It can be done

    But order is not guaranteed always unless you've another column which can be reliably used for specifying the order by

    See this illustration

    declare @t table
    (
    ID  int,
    [Value] varchar(100)
    )
    
    INSERT @t
    VALUES
    (1,            27),
    (1,            28),
    (1,            29),
    (1,            28),
    (1,            28),
    (1,            30),
    (2,            25),
    (2,           26),
    (2,             28),
    (2,             29),
    (2,             28),
    (2,            28),
    (2,             31),
    (2,             34)
    
    ;With CTE
    AS
    (
    SELECT ID,[Value],
    ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS Seq
    FROM @t t
    ),CTE1
    AS
    (
    SELECT *,
    STUFF((SELECT ','+CAST([Value] AS varchar(10)) FROM CTE WHERE ID = c.ID AND Seq BETWEEN c.Seq + 1 AND c.Seq + 4 FOR XML PATH('')),1,1,'') AS Str,
    STUFF((SELECT ','+CAST(Seq AS varchar(10)) FROM CTE WHERE ID = c.ID AND Seq BETWEEN c.Seq + 1 AND c.Seq + 4 FOR XML PATH('')),1,1,'') AS SStr
    FROM CTE c
    )
    SELECT c.*
    FROM CTE1 c1
    CROSS APPLY dbo.ParseValues(c1.SStr,',') f
    JOIN CTE c
    ON c.Seq = f.Val
    AND c.ID = c1.ID
    WHERE c1.Str = '28,29,28,28'
    ORDER BY c.ID,c.Seq
    
    
    
    /*
    output
    ---------------------------------------
    ID	Value	
    ---------------------------------------
    1	28	
    1	29	
    1	28	
    1	28	
    2	28	
    2	29	
    2	28	
    2	28	
    
    */

    The UDF parseValues used in the above  code can be found here

    https://visakhm.blogspot.com/2010/02/parsing-delimited-string.html

    If it SQL 2016 and above you dont need to use UDF

    you can use standard function String_Split instead 

    https://docs.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql

    the illustration would be as below

    declare @t table
    (
    ID  int,
    [Value] varchar(100)
    )
    
    INSERT @t
    VALUES
    (1,            27),
    (1,            28),
    (1,            29),
    (1,            28),
    (1,            28),
    (1,            30),
    (2,            25),
    (2,           26),
    (2,             28),
    (2,             29),
    (2,             28),
    (2,            28),
    (2,             31),
    (2,             34)
    
    ;With CTE
    AS
    (
    SELECT ID,[Value],
    ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS Seq
    FROM @t t
    ),CTE1
    AS
    (
    SELECT *,
    STUFF((SELECT ','+CAST([Value] AS varchar(10)) FROM CTE WHERE ID = c.ID AND Seq BETWEEN c.Seq + 1 AND c.Seq + 4 FOR XML PATH('')),1,1,'') AS Str,
    STUFF((SELECT ','+CAST(Seq AS varchar(10)) FROM CTE WHERE ID = c.ID AND Seq BETWEEN c.Seq + 1 AND c.Seq + 4 FOR XML PATH('')),1,1,'') AS SStr
    FROM CTE c
    )
    SELECT c.*
    FROM CTE1 c1
    CROSS APPLY String_Split(c1.SStr,',') f
    JOIN CTE c
    ON c.Seq = f.value
    AND c.ID = c1.ID
    WHERE c1.Str = '28,29,28,28'
    ORDER BY c.ID,c.Seq


    Please Mark This As Answer if it solved your issue
    Please Vote This As Helpful if it helps to solve your issue
    Visakh
    ----------------------------
    My Wiki User Page
    My MSDN Page
    My Personal Blog
    My Facebook Page

    • Marked as answer by George Waters Monday, March 19, 2018 7:18 AM
    Sunday, March 18, 2018 6:32 AM
  • I knew it could be done !!!

    I appreciate you all your time

    Thanks Visakh !!! I'd buy you a beer if I could !!!

    Regards.


    G.Waters

    Monday, March 19, 2018 7:18 AM
  • I knew it could be done !!!

    I appreciate you all your time

    Thanks Visakh !!! I'd buy you a beer if I could !!!

    Regards.


    G.Waters

    You're welcome 

    Thanks for the offer :)



    Please Mark This As Answer if it solved your issue
    Please Vote This As Helpful if it helps to solve your issue
    Visakh
    ----------------------------
    My Wiki User Page
    My MSDN Page
    My Personal Blog
    My Facebook Page

    Monday, March 19, 2018 7:20 AM
  • You should NOT count on getting the same result in any execution!
    seems like you did not understood what we tried to explain ;-(

    It might work 99.99999% of the times, but one time that the server will return the rows in different order you will get a bug in the system!

    In my opinion you are doing mistake if you use this solution, but I am glad that you feel good and feel like you have solve the issue ;-)


    signature   Ronen Ariely
     [Personal Site]    [Blog]    [Facebook]    [Linkedin]


    • Edited by pituachMVP Monday, March 19, 2018 7:58 AM
    Monday, March 19, 2018 7:57 AM
  • This is not a solution, instead it is an illusion.

    Sorry to see a post like this one that is misguiding the OP.  We can't talk about this set having contiguous rows without having a set of columns to identify the rows in chronological order.

    Try adding this to the sample data:

    declare @t table
    (
    ID  int,
    [Value] varchar(100),
    sk int NOT NULL IDENTITY,
    PRIMARY KEY (ID, [Value], sk DESC)
    );
    
    -- or in case you are using version 2016 or greater
    declare @t table
    (
    ID  int,
    [Value] varchar(100),
    index idx_nc_ID_Val nonclustered (ID, [Value] DESC)
    );
    

    If the optimizer decides to use the new index then the code suggested as the solution will not work.


    AMB

    Some guidelines for posting questions...

    AYÚDANOS A AYUDARTE, guía básica de consejos para formular preguntas


    Monday, March 19, 2018 12:26 PM
  • I think Visakh was very clear when he said:

    "It can be done
    But order is not guaranteed always unless you've another column which can be reliably used for specifying the order by"

    So, I just added an autonumeric identity field and that's it :)


    G.Waters

    Monday, March 19, 2018 2:38 PM
  • Glad you got that part.  Adding a column with identity property could help but you need to incorporate it as part of the solution.

    Here a couple of suggestions without the needs to use string aggregation.

    declare @t table
    (
    ID  int,
    [Value] varchar(100),
    sk int NOT NULL IDENTITY PRIMARY KEY
    )
    
    INSERT @t(ID, [Value])
    VALUES
    (1,            27),
    (1,            28),
    (1,            29),
    (1,            28),
    (1,            28),
    (1,            30),
    (2,            25),
    (2,           26),
    (2,             28),
    (2,             29),
    (2,             28),
    (2,            28),
    (2,             31),
    (2,             34);
    
    -- Option 1
    WITH R AS (
    SELECT
    	ID,
        [Value],
        sk,
    	LEAD(sk, 3) OVER(PARTITION BY ID ORDER BY sk) AS final_sk,
    	CASE
    	WHEN [Value] = 28
    		AND LEAD([value], 1) OVER(PARTITION BY ID ORDER BY sk) = 29
    		AND LEAD([value], 2) OVER(PARTITION BY ID ORDER BY sk) = 28
    		AND LEAD([value], 3) OVER(PARTITION BY ID ORDER BY sk) = 28 THEN 1
    	ELSE
    		0
    	END AS isBlock
    FROM
    	@T
    )
    SELECT
    	S.*
    FROM
    	R
    	INNER JOIN
            @T AS S
    	ON S.ID = R.ID
    	AND S.sk BETWEEN R.sk AND R.final_sk
    WHERE
    	R.isBlock = 1
    ORDER BY
    	S.ID, S.sk;
    
    -- Option 2
    WITH R AS (
    SELECT
    	ID,
        [Value],
        sk,
    	LEAD(sk, 3) OVER(PARTITION BY ID ORDER BY sk) AS final_sk,
    	CASE
    	WHEN [Value] = 28
    		AND LEAD([value], 1) OVER(PARTITION BY ID ORDER BY sk) = 29
    		AND LEAD([value], 2) OVER(PARTITION BY ID ORDER BY sk) = 28
    		AND LEAD([value], 3) OVER(PARTITION BY ID ORDER BY sk) = 28 THEN 1
    	ELSE
    		0
    	END AS isBlock
    FROM
    	@T
    )
    SELECT
    	S.*
    FROM
    	R
    	CROSS APPLY
    	(
    	SELECT
    		*
    	FROM
    		@T AS T
    	WHERE
    		T.ID = R.ID
    		AND T.sk >= R.sk
    	ORDER BY
    		ID, sk
    	OFFSET 0 ROWS
    	FETCH NEXT 4 ROWS ONLY
    	) AS S
    WHERE
    	R.isBlock = 1
    ORDER BY
    	S.ID, S.sk;


    AMB

    Some guidelines for posting questions...

    AYÚDANOS A AYUDARTE, guía básica de consejos para formular preguntas


    Monday, March 19, 2018 3:08 PM
  • This is not a solution, instead it is an illusion.

    I loved this sentence Hunchback :-)

    it is so true!

    I agree in this case, and I think (IMO) it was better not to post something that will make the user think that it is a solution. It is better to have a query that works only 1% of the time over a query that seems to work 99.9999% since the first case you find the issue and then solve it fast, while in the second case (like the case in this thread) you might not event notice that from time time you get wrong result. It's much harder to find an issue that is rare.

    * for the sake of future users that will come here and will only focus on the response that was marked as answer, I don't feel good that this was marked as an answer, instead of the first response by Dan Guzman which was the right answer...


    signature   Ronen Ariely
     [Personal Site]    [Blog]    [Facebook]    [Linkedin]

    Monday, March 19, 2018 11:57 PM
  • Thanks for your examples Hunchback !!! they enrich this post :)

    G.Waters

    Tuesday, March 20, 2018 12:14 AM