subset or hierarchy from same column
-
quinta-feira, 30 de agosto de 2012 21:15
id pid ctext 1 1 A 2 1 B 3 1 C 4 1 D 5 1 E 6 1 F 7 1 G 8 1 I 9 1 J 10 1 K 11 2 A 12 2 B 13 2 C 14 2 D 15 2 E 16 2 F 17 3 A 18 3 B 19 3 C 20 3 D 21 3 Z 22 4 J 23 4 K 24 4 L 25 4 M Hi, I have the above table and I need to find if a pid fully not partially contains ctext of any other pid. For example. pid 1 has ctext from A until K, ctext from pid 1 and 2 both are fully contained in pid1 ctext so I need a result set like
1 2
1 3
Also 1 4 wont be valid as pid 1 doesnt contain L and M (partial). Thanks !
- Editado Albeit sexta-feira, 31 de agosto de 2012 19:43
Todas as Respostas
-
sexta-feira, 31 de agosto de 2012 03:22
Declare @MyTable Table(ID int identity(1,1), PID int, Ctext varchar(max)) Insert into @MyTable Select 1, 'A' Union All Select 1, 'B' Union All Select 1, 'C' Union All Select 1, 'D' Union All Select 1, 'E' Union All Select 1, 'F' Union All Select 1, 'G' Union All Select 1, 'I' Union All Select 1, 'J' Union All Select 1, 'K' Union All Select 2, 'A' Union All Select 2, 'B' Union All Select 2, 'C' Union All Select 2, 'D' Union All Select 2, 'E' Union All Select 2, 'F' Union All Select 3, 'A' Union All Select 3, 'B' Union All Select 3, 'C' Union All Select 3, 'D' Union All Select 3, 'E' Union All Select 4, 'J' Union All Select 4, 'K' Union All Select 4, 'L' Union All Select 4, 'M'; --First of all, you need to count the records based on the PID group to see how many occurances in each different PID group --As you will use the group with the highest occurances as the parent ;With CTE1 As ( Select ID, PID, CTExt, Count(PID) Over(Partition By PID) As MyCount from @MyTable ) --Secondly, you need to find out which one is the parent group based on the records count per group --if there are same records count for different group, then you need to break tie with a PID , CTE2 As ( Select ID, PID, CTExt, Dense_Rank() Over(Order by MyCount Desc, PID) As MaxText From CTE1 ) --Last you use a right outer join from the parent to all the subset groups --if there is anything not in the parent group which is in the subset group, --then the NULL PID will be shown for the parent, for those subset groups is not you need --I use a distinct count to find out the count of the parent by giving the NULL PID an -1 value --you filter out the count of distinct parent PID = 1 Select Max(A.PID) As PID_Parent, B.PID As PID_Subset From ( Select * From CTE2 Where MaxText = 1 ) A Right Outer Join ( Select * From CTE2 Where MaxText != 1 ) B On A.Ctext = B.Ctext Group By B.PID Having Count(Distinct IsNull(A.PID, -1)) = 1;
|
Blog: MSBICOE.com |
MCITP - BI, SQL Developer & DBA
Hate to mislead others, if I'm wrong slap me. Thanks!
-
sexta-feira, 31 de agosto de 2012 03:23
Try this,
SELECT Parent.pid, Child.pid FROM ( SELECT pid, STUFF(( SELECT ',' + ctext FROM dbo.temp_test X WHERE X.pid = Y.pid ORDER BY ctext FOR XML PATH('') ), 1, 1, '') AS Sequ FROM temp_test Y WHERE Y.pid = 1 ) AS Parent CROSS JOIN ( SELECT pid, STUFF(( SELECT ',' + ctext FROM dbo.temp_test X1 WHERE X1.pid = Y1.pid ORDER BY ctext FOR XML PATH('') ), 1, 1, '') AS Sequ FROM temp_test Y1 WHERE Y1.pid <> 1 GROUP BY pid ) AS Child WHERE Parent.pid <> Child.pid AND DIFFERENCE(Parent.Sequ, Child.Sequ) = 4 GROUP BY Parent.pid, Child.pid, Parent.Sequ, Child.SequThanks, Sachin Surve
-
sexta-feira, 31 de agosto de 2012 04:32
This is called a Relational Division. It is one of Dr. Codd's original eight operators. Since you did not borher to post any DDL, I do not feel the need to post DML. Google it.--CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL
- Marcado como Resposta Albeit sexta-feira, 31 de agosto de 2012 17:16
-
sexta-feira, 31 de agosto de 2012 05:26
If you want to find all of them, then use a generic one as below:
Declare @MyTable Table(ID int identity(1,1), PID int, Ctext varchar(max)) Insert into @MyTable Select 1, 'A' Union All Select 1, 'B' Union All Select 1, 'C' Union All Select 1, 'D' Union All Select 1, 'E' Union All Select 1, 'F' Union All Select 1, 'G' Union All Select 1, 'I' Union All Select 1, 'J' Union All Select 1, 'K' Union All Select 2, 'A' Union All Select 2, 'B' Union All Select 2, 'C' Union All Select 2, 'D' Union All Select 2, 'E' Union All Select 2, 'F' Union All Select 3, 'A' Union All Select 3, 'B' Union All Select 3, 'C' Union All Select 3, 'D' Union All Select 3, 'E' Union All Select 4, 'J' Union All Select 4, 'K' Union All Select 4, 'L' Union All Select 4, 'M'; Select X.* From ( Select PID From @MyTable Group By PID ) P Cross Apply ( Select Max(A.PID) As PID_Parent, B.PID As PID_Subset From ( Select * From @MyTable Where PID = P.PID ) A Right Outer Join ( Select * From @MyTable Where PID != P.PID ) B On A.Ctext = B.Ctext Group By B.PID Having Count(Distinct IsNull(A.PID, -1)) = 1 ) X Where X.PID_Parent Is Not NULL;
|
Blog: MSBICOE.com |
MCITP - BI, SQL Developer & DBA
Hate to mislead others, if I'm wrong slap me. Thanks!
-
sexta-feira, 31 de agosto de 2012 09:22
Perhaps I'm a little simple minded, but I would just opt for a simple join leveraged by a NOT EXISTS criteria:
--===========================================================
SELECT
A.pid AS superset ,
B.pid AS subset
FROM test_table AS A
INNER JOIN test_table AS B
ON A.ctext = B.ctext AND A.pid != B.pid
WHERE
NOT EXISTS ( SELECT ctext FROM test_table
WHERE
pid = B.pid
AND
ctext NOT IN ( SELECT ctext FROM test_table
WHERE pid = A.pid ) )
GROUP BY A.pid , B.pid
ORDER BY A.pid , B.pid
--===========================================================Using your example data, this produces a single-row result of:
superset subset
----------- -----------
1 2which is correct, because the Z in pid 3 is not included in pid 1's ctext values.
If anyone sees an error in my logic, please flog me with it. In the meantime, I'm off to google "relational division".
-
sexta-feira, 31 de agosto de 2012 17:19Thanks a bunch CELKO :) I saw your another post even before reading reply here and got the answer, the solution is working now ! I just need an extra temp table.
- Editado Albeit sexta-feira, 31 de agosto de 2012 17:32
-
sábado, 1 de setembro de 2012 14:51
Albeit, if you're using my answer, you don't need an additional table.
The join is actually a self-join, using A and B as aliases for a single table.
-
sábado, 1 de setembro de 2012 16:23
wclardy...can you please explain what is logic in NOT EXISTS(....) and if you are using GROUP By to remove duplicate rows ?
Thanks !!
- Editado Albeit sábado, 1 de setembro de 2012 16:24

