Create batches of records, while keeping groups together

  • Question

  • Hello,

    I am using SQL 2008 R2 and trying to process the data I have in a table in batches; however, there is a catch. The data is broken into groups and, as I do my processing, I have to make sure that a group is always contained within a single batch or, in other words, that a group is never split across different batches. It's assumed that the batch size will always be much larger than the group size. Here is the setup to illustrate what I mean (the code uses Jeff Moden's data generation logic: http://www.sqlservercentral.com/articles/Data+Generation/87901):

    -- Generate @NumberOfRows rows of random GroupID values
    -- in the range @StartValue to @EndValue
    DECLARE @NumberOfRows INT = 1000,
        @StartValue   INT = 1,
        @EndValue     INT = 500,
        @Range        INT
    
    SET @Range = @EndValue - @StartValue + 1
    
    IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL
    DROP TABLE #SomeTestTable;
    
    SELECT TOP (@NumberOfRows)
    GroupID = ABS(CHECKSUM(NEWID())) % @Range + @StartValue
    INTO #SomeTestTable
    FROM sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2

    This will create a table with about 435 groups of records containing between 1 and 7 records each. Now, let's say I want to process these records in batches of 100 records per batch. How can I make sure that my GroupIDs don't get split across different batches? I am fine if each batch is not exactly 100 records; it can be a little more or a little less.
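
    For reference, here is a quick check of the distribution the setup actually produces (the exact numbers will vary from run to run, since the data is random):

    -- Quick check: number of groups and smallest/largest group size
    SELECT COUNT(*)         AS NumberOfGroups,
        MIN(RowsInGroup)    AS SmallestGroup,
        MAX(RowsInGroup)    AS LargestGroup
    FROM (SELECT GroupID, COUNT(*) AS RowsInGroup
        FROM #SomeTestTable
        GROUP BY GroupID) d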

    I appreciate any suggestions!


    Thursday, February 14, 2013 6:37 AM

Answers

  • The following will require a Numbers table.  See http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-numbers-table.html for what a Numbers table is and some of its many uses.  For the purposes of this demo we can quickly create a temp table with 16 numbers in it.  (That is all we need here: you will only have about 1,000 rows, which means about 10 batches, and for this purpose the Numbers table only needs as many numbers as there will be batches.  Of course, for your real case you would need a more reasonably sized Numbers table.)  With that Numbers table, we can do:

    -- Create a Numbers table holding the integers 1 through 16
    IF OBJECT_ID('tempdb..#Numbers','U') IS NOT NULL
    DROP TABLE #Numbers;
    
    Create Table #Numbers(Number int Primary Key);
    With N2 As (Select 1 As Number Union All Select 1), -- 2 rows
    N4 As (Select na.Number From N2 na Cross Join N2),  -- 2 x 2 = 4 rows
    N16 As (Select na.Number From N4 na Cross Join N4)  -- 4 x 4 = 16 rows
    Insert #Numbers(Number) Select Row_Number() Over(Order By Number) From N16;
    
    -- Create Sample Data
    DECLARE @NumberOfRows INT = 1000,
        @StartValue   INT = 1,
        @EndValue     INT = 500,
        @Range        INT
    
    SET @Range = @EndValue - @StartValue + 1
    
    IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL
    DROP TABLE #SomeTestTable;
    
    SELECT TOP (@NumberOfRows)
    GroupID = ABS(CHECKSUM(NEWID())) % @Range + @StartValue
    INTO #SomeTestTable
    FROM sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2
    
    -- Split the groups into batches of roughly 100 rows, returning the
    -- first group, last group and total number of rows in each batch.
    -- Rank() (unlike Row_Number()) gives every row of a group the same
    -- rank, so a group can never straddle a batch boundary; batch sizes
    -- simply vary a little around 100.
    
    ;With cte As
    (Select GroupID, Rank() Over(Order By GroupID) As rn
    From #SomeTestTable t)
    Select n.Number As Batch,
        Min(c.GroupID) As FirstGroupID,
        Max(c.GroupID) As LastGroupID,
        Count(*) As TotalRowsInBatch
    From cte c
    Inner Join #Numbers n On c.rn > 100 * (n.Number - 1) And c.rn <= 100 * n.Number
    Group By n.Number
    Order By Batch;
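
    If you then need to touch the rows batch by batch, one possible pattern (just a sketch; the #Batches table and the inner Select are placeholders for your real processing) is to save those boundaries and loop over them:

    -- Illustrative driver loop, not part of the query above:
    -- persist the batch boundaries, then process each batch
    -- by its GroupID range.
    IF OBJECT_ID('tempdb..#Batches','U') IS NOT NULL
    DROP TABLE #Batches;
    
    ;With cte As
    (Select GroupID, Rank() Over(Order By GroupID) As rn
    From #SomeTestTable t)
    Select n.Number As Batch, Min(c.GroupID) As FirstGroupID, Max(c.GroupID) As LastGroupID
    Into #Batches
    From cte c
    Inner Join #Numbers n On c.rn > 100 * (n.Number - 1) And c.rn <= 100 * n.Number
    Group By n.Number;
    
    Declare @Batch int = 1, @MaxBatch int;
    Select @MaxBatch = Max(Batch) From #Batches;
    While @Batch <= @MaxBatch
    Begin
        -- Replace this Select with the real per-batch processing.
        Select t.GroupID
        From #SomeTestTable t
        Inner Join #Batches b On t.GroupID Between b.FirstGroupID And b.LastGroupID
        Where b.Batch = @Batch;
    
        Set @Batch = @Batch + 1;
    End;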

    Tom

    P.S. Thanks for the code to generate the sample data and for letting us know your SQL Server version.  That is always very helpful.

    • Marked as answer by Michael_SQL Thursday, February 14, 2013 6:10 PM
    Thursday, February 14, 2013 7:33 AM

All replies

  • This is great and will help me accomplish what I need.  Thank you very much!
    Thursday, February 14, 2013 6:09 PM