subset or hierarchy from same column

Respondido subset or hierarchy from same column

  • quinta-feira, 30 de agosto de 2012 21:15
     
     
    id pid ctext
    1 1 A         
    2 1 B         
    3 1 C         
    4 1 D         
    5 1 E         
    6 1 F         
    7 1 G         
    8 1 I         
    9 1 J         
    10 1 K         
    11 2 A         
    12 2 B         
    13 2 C         
    14 2 D         
    15 2 E         
    16 2 F         
    17 3 A         
    18 3 B         
    19 3 C         
    20 3 D         
    21 3 Z         
    22 4 J         
    23 4 K         
    24 4 L         
    25 4 M         

    Hi, I have the above table and I need to find if a pid fully not partially contains ctext of any other pid. For example. pid 1 has ctext from A until K, ctext from pid 1 and 2 both are fully contained in pid1 ctext so I need a result set like 

    1      2

    1      3

    Also 1   4 wont be valid as pid 1 doesnt contain L and M (partial). Thanks !

     


    • Editado Albeit sexta-feira, 31 de agosto de 2012 19:43
    •  

Todas as Respostas

  • sexta-feira, 31 de agosto de 2012 03:22
     
      Contém Código
    Declare @MyTable Table(ID int identity(1,1), PID int, Ctext varchar(max))
    Insert into @MyTable
    Select 1, 'A' Union All
    Select 1, 'B' Union All
    Select 1, 'C' Union All
    Select 1, 'D' Union All
    Select 1, 'E' Union All
    Select 1, 'F' Union All
    Select 1, 'G' Union All
    Select 1, 'I' Union All
    Select 1, 'J' Union All
    Select 1, 'K' Union All
    Select 2, 'A' Union All
    Select 2, 'B' Union All
    Select 2, 'C' Union All
    Select 2, 'D' Union All
    Select 2, 'E' Union All
    Select 2, 'F' Union All
    Select 3, 'A' Union All
    Select 3, 'B' Union All
    Select 3, 'C' Union All
    Select 3, 'D' Union All
    Select 3, 'E' Union All
    Select 4, 'J' Union All
    Select 4, 'K' Union All
    Select 4, 'L' Union All
    Select 4, 'M';
    --First of all, you need to count the records based on the PID group to see how many occurances in each different PID group
    --As you will use the group with the highest occurances as the parent
    ;With CTE1
    As
    (
    Select	ID, PID, CTExt, Count(PID) Over(Partition By PID) As MyCount
    from	@MyTable
    )
    --Secondly, you need to find out which one is the parent group based on the records count per group
    --if there are same records count for different group, then you need to break tie with a PID
    , CTE2
    As
    (
    Select	ID, PID, CTExt, Dense_Rank() Over(Order by MyCount Desc, PID) As MaxText
    From	CTE1
    )
    --Last you use a right outer join from the parent to all the subset groups
    --if there is anything not in the parent group which is in the subset group, 
    --then the NULL PID will be shown for the parent, for those subset groups is not you need
    --I use a distinct count to find out the count of the parent by giving the NULL PID an -1 value
    --you filter out the count of distinct parent PID = 1
    Select	Max(A.PID) As PID_Parent, B.PID As PID_Subset
    From
    		(
    			Select	*
    			From CTE2
    			Where	MaxText = 1
    		) A		
    			Right Outer Join
    		(
    			Select	*
    			From	CTE2
    			Where	MaxText != 1
    		) B
    			On A.Ctext  = B.Ctext
    Group By B.PID
    Having Count(Distinct IsNull(A.PID, -1)) = 1;


    View Steven Wang's profile on LinkedIn | Blog: MSBICOE.com | MCITP - BI, SQL Developer & DBA

    Hate to mislead others, if I'm wrong slap me. Thanks!

  • sexta-feira, 31 de agosto de 2012 03:23
     
      Contém Código

    Try this,

    SELECT  Parent.pid, Child.pid
    FROM    ( SELECT    pid, STUFF(( SELECT ',' + ctext
                                     FROM   dbo.temp_test X
                                     WHERE  X.pid = Y.pid
                                     ORDER BY ctext 
                                   FOR
                                     XML PATH('')
                                   ), 1, 1, '') AS Sequ
              FROM      temp_test Y
              WHERE     Y.pid = 1
            ) AS Parent
            CROSS JOIN ( SELECT pid, STUFF(( SELECT ',' + ctext
                                             FROM   dbo.temp_test X1
                                             WHERE  X1.pid = Y1.pid
                                             ORDER BY ctext 
                                           FOR
                                             XML PATH('')
                                           ), 1, 1, '') AS Sequ
                         FROM   temp_test Y1
                         WHERE  Y1.pid <> 1
                         GROUP BY pid
                       ) AS Child
    WHERE   Parent.pid <> Child.pid
            AND DIFFERENCE(Parent.Sequ, Child.Sequ) = 4
    GROUP BY Parent.pid, Child.pid, Parent.Sequ, Child.Sequ


    Thanks, Sachin Surve

  • sexta-feira, 31 de agosto de 2012 04:32
     
     Respondido
    This is called a Relational Division. It is one of Dr. Codd's original eight operators. Since you did not borher to post any DDL, I do not feel the need to post DML. Google it. 

    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    • Marcado como Resposta Albeit sexta-feira, 31 de agosto de 2012 17:16
    •  
  • sexta-feira, 31 de agosto de 2012 05:26
     
      Contém Código

    If you want to find all of them, then use a generic one as below:

    Declare @MyTable Table(ID int identity(1,1), PID int, Ctext varchar(max))
    Insert into @MyTable
    Select 1, 'A' Union All
    Select 1, 'B' Union All
    Select 1, 'C' Union All
    Select 1, 'D' Union All
    Select 1, 'E' Union All
    Select 1, 'F' Union All
    Select 1, 'G' Union All
    Select 1, 'I' Union All
    Select 1, 'J' Union All
    Select 1, 'K' Union All
    Select 2, 'A' Union All
    Select 2, 'B' Union All
    Select 2, 'C' Union All
    Select 2, 'D' Union All
    Select 2, 'E' Union All
    Select 2, 'F' Union All
    Select 3, 'A' Union All
    Select 3, 'B' Union All
    Select 3, 'C' Union All
    Select 3, 'D' Union All
    Select 3, 'E' Union All
    Select 4, 'J' Union All
    Select 4, 'K' Union All
    Select 4, 'L' Union All
    Select 4, 'M';
    Select	X.*
    From	 
    		(
    			Select	PID
    			From	@MyTable 
    			Group By PID
    		) P
    			Cross Apply
    		(
    			Select	Max(A.PID) As PID_Parent, B.PID As PID_Subset
    			From
    					(
    						Select	*
    						From  @MyTable
    						Where	PID = P.PID
    					) A		
    						Right Outer Join
    					(
    						Select	*
    						From	 @MyTable
    						Where	PID != P.PID
    					) B
    						On A.Ctext  = B.Ctext
    			Group By B.PID
    			Having Count(Distinct IsNull(A.PID, -1)) = 1
    		) X
    Where	X.PID_Parent Is Not NULL;


    View Steven Wang's profile on LinkedIn | Blog: MSBICOE.com | MCITP - BI, SQL Developer & DBA

    Hate to mislead others, if I'm wrong slap me. Thanks!

  • sexta-feira, 31 de agosto de 2012 09:22
     
     Respondido

    Perhaps I'm a little simple minded, but I would just opt for a simple join leveraged by a NOT EXISTS criteria:

    --===========================================================
    SELECT
      A.pid AS superset ,
      B.pid AS subset
    FROM test_table AS A
      INNER JOIN test_table AS B
        ON A.ctext = B.ctext AND A.pid != B.pid
    WHERE
      NOT EXISTS ( SELECT ctext FROM test_table
                   WHERE
                     pid = B.pid
                     AND
                     ctext NOT IN ( SELECT ctext FROM test_table
                                     WHERE pid = A.pid ) )
    GROUP BY A.pid , B.pid
    ORDER BY A.pid , B.pid
    --===========================================================

    Using your example data, this produces a single-row result of:

    superset    subset
    ----------- -----------
    1           2

    which is correct, because the Z in pid 3 is not included in pid 1's ctext values.

    If anyone sees an error in my logic, please flog me with it. In the meantime, I'm off to google "relational division".

    • Sugerido como Resposta wclardy sexta-feira, 31 de agosto de 2012 16:12
    • Marcado como Resposta Albeit sábado, 1 de setembro de 2012 01:32
    •  
  • sexta-feira, 31 de agosto de 2012 17:19
     
     
    Thanks a bunch CELKO :) I saw your another post even before reading reply here and got the answer, the solution is working now ! I just need an extra temp table.


    • Editado Albeit sexta-feira, 31 de agosto de 2012 17:32
    •  
  • sábado, 1 de setembro de 2012 14:51
     
     

    Albeit, if you're using my answer, you don't need an additional table.

    The join is actually a self-join, using A and B as aliases for a single table.

  • sábado, 1 de setembro de 2012 16:23
     
     

    wclardy...can you please explain what is logic in NOT EXISTS(....) and if you are using GROUP By to remove duplicate rows ?

    Thanks !!


    • Editado Albeit sábado, 1 de setembro de 2012 16:24
    •