none
Randomize order but group similar items together RRS feed

  • Question

  • I have an odd problem. I'm trying to randomize the order in which the rows return, but I need to keep similar items grouped together so I can go through it properly.

     

    Here's a sample table:

    <pre>ID  |  Content  |  Answer  |  Correct
    -------------------------------------------------
     3  |  4+2?    |   6      |   1
     3  |  4+2?    |   4      |   0
     3  |  4+2?    |   5      |   0
    
     1  |  2+2?    |   4      |   1
     1  |  2+2?    |   3      |   0
     1  |  2+2?    |   5      |   0
    
     2  |  3+2?    |   3      |   0
     2  |  3+2?    |   5      |   1
     2  |  3+2?    |   4      |   0
    
    
    
    It would randomize the order in which the ID's appear and also randomize the order of each answer per ID.

    I hope that makes sense.


    SELECT ID, Content, Answer, Correct FROM table WHERE ID IN ('1', '2', '3') ORDER BY NEWID() jumbles everything, ideally, I'd like to keep them grouped by their ID's, like this:

     

    Friday, October 29, 2010 4:41 PM

Answers

  • Try

    ;with IDs as (select distinct ID from myTable),
    RandomIDs as (select ID, row_number() over (order by NewID()) as RandomID from IDs),
    AllInfo as (select *, 
    
    row_number() over (partition by ID order by NewID()) as RandomOrder from myTable)
    
    select AllInfo.* from AllInfo inner join RandomIDs RI
    on AllInfo.ID = RI.ID 
    order by RI.RandomID, AllInfo.RandomOrder
    
    
    From the top of my head.


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    • Marked as answer by Kurt-K Friday, October 29, 2010 6:19 PM
    Friday, October 29, 2010 5:50 PM
    Moderator

All replies

  • In SQL Server 2005 and up you need to use ROW_NUMBER() function for your purpose, e.g.

     

    select *, 
    
    row_number() over (partition by ID order by NewID()) as RandomOrder 
    
    from Table where ID in (1,2,3) order by ID, RandomOrder
    

     


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    Friday, October 29, 2010 4:45 PM
    Moderator
  • Is the purpose of this to customize a quiz each time someone logs onto a application, but to make sure that they get a question from each of several subject areas? 

     If so, look at an additive congruence generator. It is like a RNG, but the numbers are all unique in one cycle. 


    --CELKO-- Please post DDL, use ISO-11179 naming rules and format code so we can read it
    Friday, October 29, 2010 5:26 PM
  • Ok, that randomizes the rows per question, but it doesn't randomize the actual question order. Instead of having it order by ID, such that it'd return 1, 2, 3, 4, etc, I'd rather it randomize that as well, to return: 4, 2, 1, 3, etc.

     

    Thanks for that part, it's definitely helping me.
    Friday, October 29, 2010 5:35 PM
  • Yeah, pretty much. We have, say, 30 questions. We want to select 10 and randomize the order they appear and also the order the answer choices appear. I'd like to try and do this through an SQL query rather than re-write my PHP code to store it all in an array and then go through that randomly.
    Friday, October 29, 2010 5:37 PM
  • Try

    ;with IDs as (select distinct ID from myTable),
    RandomIDs as (select ID, row_number() over (order by NewID()) as RandomID from IDs),
    AllInfo as (select *, 
    
    row_number() over (partition by ID order by NewID()) as RandomOrder from myTable)
    
    select AllInfo.* from AllInfo inner join RandomIDs RI
    on AllInfo.ID = RI.ID 
    order by RI.RandomID, AllInfo.RandomOrder
    
    
    From the top of my head.


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    • Marked as answer by Kurt-K Friday, October 29, 2010 6:19 PM
    Friday, October 29, 2010 5:50 PM
    Moderator
  • Flawless. Can you explain how that works and how you came to this conclusion? I'd rather learn to understand what was done instead of just copy/pasting.

     

    Thanks again.

    Friday, October 29, 2010 6:20 PM
  • Try:

    USE tempdb;
    GO
    DECLARE @T TABLE (
    ID int,
    [Content] varchar(50),
    Answer int,
    Correct bit
    );
    
    INSERT INTO @T VALUES(3, '4+2?', 6, 1);
    INSERT INTO @T VALUES(3, '4+2?', 4, 0);
    INSERT INTO @T VALUES(3, '4+2?', 5, 0);
    INSERT INTO @T VALUES(1, '2+2?', 4, 1);
    INSERT INTO @T VALUES(1, '2+2?', 3, 0);
    INSERT INTO @T VALUES(1, '2+2?', 5, 0);
    INSERT INTO @T VALUES(2, '3+2?', 3, 0);
    INSERT INTO @T VALUES(2, '3+2?', 5, 1);
    INSERT INTO @T VALUES(2, '3+2?', 4, 0);
    
    SELECT
     [@T].*
    FROM
     @T
     CROSS APPLY
     (
     SELECT ABS(CHECKSUM(NEWID())) AS c1
     ) AS R
    ORDER BY
     MIN(R.c1) OVER(PARTITION BY [@T].ID),
     R.c1;
    GO
    

    The idea is to generate a random number per row ABS(...), then ordering by the minimum random number per ID (this will random the order of the IDs but keeping same IDs together) and by the random number itself to random the order of the answers.

     


    AMB

     

    Some guidelines for posting questions...

    Friday, October 29, 2010 6:26 PM
    Moderator
  • I thought your solution is the winner, but on this small set it doesn't look like it:

    set nocount on
    DECLARE @T TABLE (
    ID int,
    [Content] varchar(50),
    Answer int,
    Correct bit
    );
    
    INSERT INTO @T VALUES(3, '4+2?', 6, 1);
    INSERT INTO @T VALUES(3, '4+2?', 4, 0);
    INSERT INTO @T VALUES(3, '4+2?', 5, 0);
    INSERT INTO @T VALUES(1, '2+2?', 4, 1);
    INSERT INTO @T VALUES(1, '2+2?', 3, 0);
    INSERT INTO @T VALUES(1, '2+2?', 5, 0);
    INSERT INTO @T VALUES(2, '3+2?', 3, 0);
    INSERT INTO @T VALUES(2, '3+2?', 5, 1);
    INSERT INTO @T VALUES(2, '3+2?', 4, 0);
    set statistics io on
    print 'Alejandro''s solution'
    SELECT
     [@T].*
    FROM
     @T
     CROSS APPLY
     (
     SELECT ABS(CHECKSUM(NEWID())) AS c1
     ) AS R
    ORDER BY
     MIN(R.c1) OVER(PARTITION BY [@T].ID),
     R.c1;
     
    print 'Naomi''s solution'
    ;with IDs as (select distinct ID from @t),
    RandomIDs as (select ID, row_number() over (order by NewID()) as RandomID from IDs),
    AllInfo as (select *, 
    
    row_number() over (partition by ID order by NewID()) as RandomOrder from @T )
    
    select AllInfo.* from AllInfo inner join RandomIDs RI
    on AllInfo.ID = RI.ID 
    order by RI.RandomID, AllInfo.RandomOrder
    set statistics io off
    

    To OP. I used the straightforward approach. We need to randomize ID and randomize Answers within ID. In order to randomize IDs I selected them into a cte (get unique ID and order them by NewID() so randomly ordered them). Then I did the same for answers within each ID. Then I just joined the randomized ID with the rest of information in order to get RandomID field to use as a sort.

    I'd be interested in your statistics results on a big table.


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    Friday, October 29, 2010 6:39 PM
    Moderator
  • Naomi,

    > I thought your solution is the winner, but on this small set it doesn't look like it

    what result are you getting?

    In my box it is sorting both ID group and Answers per ID in a random fashion.

    EXEC 1:

    ID Content Answer Correct
    2 3+2? 5 1
    2 3+2? 4 0
    2 3+2? 3 0
    3 4+2? 5 0
    3 4+2? 6 1
    3 4+2? 4 0
    1 2+2? 4 1
    1 2+2? 5 0
    1 2+2? 3 0

    EXEC 2:

    ID Content Answer Correct
    1 2+2? 5 0
    1 2+2? 4 1
    1 2+2? 3 0
    3 4+2? 4 0
    3 4+2? 6 1
    3 4+2? 5 0
    2 3+2? 3 0
    2 3+2? 5 1
    2 3+2? 4 0

    EXEC 3:

    ID Content Answer Correct
    1 2+2? 3 0
    1 2+2? 5 0
    1 2+2? 4 1
    2 3+2? 3 0
    2 3+2? 4 0
    2 3+2? 5 1
    3 4+2? 4 0
    3 4+2? 6 1
    3 4+2? 5 0

    Still, with a small set the probability of getting same order will be higher.


    AMB

    Some guidelines for posting questions...

    Friday, October 29, 2010 7:02 PM
    Moderator
  • No, I meant in comparison with my solution I thought your solution should perform better. As you see, I added SET STATISTICS IO on to check number of reads/writes and compared both solutions. I understand, it's just a very dirty check and besides, the result set is too small. I just don't have time right now to play with the bigger data set to compare solutions.

    BTW, this problem may help us (our company) as well, as we recently added ability to generate surveys. I'm not sure how we produce them in the main application, though, as I don't write that part.


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    Friday, October 29, 2010 7:13 PM
    Moderator
  • Oh, I see. My suggestion has more logical reads, but less query cost relative to the batch. Check the execution plans, see how many sort iterators are in your suggestion, which makes me wonder how it will behave with a table with more significant number of rows.

    I also played adding a primary key (ID, Answer), and the cost even less since the aggregation is partitioned by ID.

    I was playing in the beginning with this query:

    WITH R AS (
    SELECT ID, ABS(CHECKSUM(NEWID())) AS r
    FROM (SELECT DISTINCT ID FROM #T) AS T
    )
    SELECT
     T.*
    FROM
     #T AS T
     INNER JOIN
     R
     ON T.ID = R.ID
    ORDER BY
     R.r,
     ABS(CHECKSUM(NEWID()));
    GO
    

    But to my surprise, the optimizer is pushing both computations of ABS(...) after joining the tables (original and derived) and that screw the whole thing, since now I do not have a unique value per ID. See by yourself.

     |--Sort(ORDER BY:([Expr1007] ASC, [Expr1008] ASC))
      |--Compute Scalar(DEFINE:([Expr1007]=abs(checksum(newid())), [Expr1008]=abs(checksum(newid()))))
       |--Merge Join(Inner Join, MERGE:([tempdb].[dbo].[#T].[ID])=([T].[ID]), RESIDUAL:([tempdb].[dbo].[#T].[ID]=[tempdb].[dbo].[#T].[ID] as [T].[ID]))
         |--Sort(DISTINCT ORDER BY:([tempdb].[dbo].[#T].[ID] ASC))
         | |--Table Scan(OBJECT:([tempdb].[dbo].[#T]))
         |--Sort(ORDER BY:([T].[ID] ASC))
          |--Table Scan(OBJECT:([tempdb].[dbo].[#T] AS [T]))

    Anyway, one more opportunity to learn.

     


    AMB

    Some guidelines for posting questions...

    Friday, October 29, 2010 7:36 PM
    Moderator
  • I got another approach from Itzik Ben-Gan with the help of Erland Sommarskog, that yields better random distribution.

    SELECT
     *
    FROM
     @T
    ORDER BY
     RAND(1. * DATEPART(microsecond, SYSDATETIME()) * ID % 2147483648),
     ABS(CHECKSUM(NEWID()));
    GO
    The idea is to use column [ID] to generate diferent seed per each distinct value, and force the optimizer to evaluate RAND function per each row.

    It passed over my head the first time. Itzik corrected me one more time, until I got it.

    This is a clever solution indeed. 


    AMB

    Some guidelines for posting questions...

    Saturday, October 30, 2010 5:43 PM
    Moderator
  • I'm not sure if I should bump this topic or create a new one, but I have another request for this.

     

    My table has a field, ParentID, that allows us to link to the translated question. For example:

     

    ID | Content | Answer | Correct | ParentID

    ----------------------------------------------------

    1 | Hello | Yes | 1 | NULL

    222 | Hola | Si | 1 | 1

     

    The Spanish version of the question's ParentID is that of the English one. The English question has no parent, as it is the parent. What I'm trying to do is make it easy for us to swap the questions out (hide/unhide a <div>).  I've tried to modify the code but so far I've failed. My only successful attempt took 1 minute and 28 seconds for the query to finish, and the results were not kept in the proper order. Instead of rewriting my PHP code (since I built this using test data), I'd rather try to modify the query if possible.

     

    with IDs as (select TOP 10 s.IN_QuestionID as en, q.IN_QuestionID as sp from tbl_standard as s, tbl_question as q WHERE s.IN_StandardID = 17 AND s.IN_LessonID = 5170 AND q.ParentID = s.IN_QuestionID),
            RandomIDs as (select en, sp, row_number() over (order by NewID()) as RandomID from IDs),
            AllInfo as (select q.IN_QuestionID, q.VC_QuestionContent, a.VC_AnswerContent, a.IN_AnswerCorrect, q.ParentID, a.IN_AnswerID,
            row_number() over (partition by q.IN_QuestionID order by NewID()) as RandomOrder FROM tbl_Question as q, tbl_Answer as a
            WHERE q.IN_QuestionID = a.IN_QuestionID
            )
    
    select AllInfo.* from AllInfo 
    nner join RandomIDs RI
    ON AllInfo.IN_QuestionID = RI.en OR AllInfo.IN_QuestionID = RI.sp
    ORDER BY RI.RandomID, AllInfo.RandomOrder
    

    I must be missing something simple here, but I've been pulling my hair out trying to figure it out. If it's too taxing then I'll have to change my PHP code over to load external content based on the selected language.

     

    Thanks.

    Thursday, December 9, 2010 4:56 PM
  • I think you may start a new thread with a reference to this one. You probably want to get 10 random English questions and separately corresponding Spanish questions (the same 10 random? or different 10 random)? Then you will be able to show either English or Spanish depending on the language chosen.
    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    Thursday, December 9, 2010 5:08 PM
    Moderator
  • I'd want 10 English and 10 Spanish. The way I have it in code is to see if the row selected has a parentID, if it does, then it goes into the Spanish string and vice versa for English. I'll start a new topic if you wish, or we may continue here. I just don't know what the rules and etiquette are for the boards.
    Thursday, December 9, 2010 5:21 PM
  • Let's start another one. Just a question: does PHP allow to return multiple result sets (as in .NET you can return two separate result sets - one English, one Spanish). If that being the case, you select 10 random English questions and then find 10 corresponding Spanish (or if you don't want them to correspond, you get 10 random Spanish as well). And then simply process separate data sets.
    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    Thursday, December 9, 2010 5:26 PM
    Moderator
  • I'm not sure, to be honest. I'd want the Spanish questions to match the English ones. I just process the entire data set and check the parentID field.

     

    Edit: Continued here: http://social.msdn.microsoft.com/Forums/en/transactsql/thread/44869b1f-dd81-42dd-a88e-708f3fd68fa5

    Thursday, December 9, 2010 5:51 PM