none
Select parent ids that have exact match on child values (no more no less)

    Pregunta

  • I am having difficulty getting the correct results using TSQL

    I have a parent table, primary key -> Parent.Id and a child table related by Child.Parent_Id

    The child table contains keywords.

    I want to get only the parents back that have an exact match on the keywords.

    I tried the query...

    select Parent_Id from child where child.keyword in ('one' 'two', 'three') group by parent_id having count(distinct keyword = 3)

    This does not work if there are more than three .. For example a parent will be returned if it has ('one' 'two', 'three', 'four')

    What can I do?

     

     

     

     

    martes, 29 de junio de 2010 17:03

Respuestas

  • USE tempdb;
    GO
    DECLARE @P TABLE (PId int NOT NULL UNIQUE);
    DECLARE @C TABLE (PId int NOT NULL, keyword varchar(25) NOT NULL, UNIQUE (PId, keyword));
    
    INSERT INTO @P VALUES(1);
    INSERT INTO @P VALUES(2);
    INSERT INTO @P VALUES(3);
    
    INSERT INTO @C VALUES(1, 'one');
    INSERT INTO @C VALUES(1, 'two');
    INSERT INTO @C VALUES(1, 'three');
    INSERT INTO @C VALUES(1, 'four');
    
    INSERT INTO @C VALUES(2, 'one');
    INSERT INTO @C VALUES(2, 'two');
    INSERT INTO @C VALUES(2, 'three');
    
    INSERT INTO @C VALUES(3, 'one');
    INSERT INTO @C VALUES(3, 'two');
    
    SELECT
     P.PId 
    FROM
     @P AS P
     INNER JOIN
     @C AS C
     ON C.PId = P.PId
    WHERE
     C.keyword in ('one', 'two', 'three') 
    GROUP BY
     P.PId
    HAVING
     COUNT(DISTINCT C.keyword) = 3
     AND COUNT(DISTINCT C.keyword) = (SELECT COUNT(*) FROM @C AS C2 WHERE C2.PId = P.PId);
    
    WITH AggStr AS (
    SELECT
    	PId,
    	(SELECT CAST(',' AS varchar(MAX)) + C.keyword FROM @C AS C WHERE C.PId = P.PId ORDER BY C.keyword FOR XML PATH('')) AS c1
    FROM
    	@P AS P
    )
    SELECT
    	PId
    FROM
    	AggStr
    WHERE
     c1 = ',one,three,two';
    
    
    select PId from @c Child 
    where child.keyword in ('one', 'two', 'three') 
    and not exists 
    (select 1 from @c C1 where C1.PID = Child.PID
    
    and C1.Keyword not in ('one','two','three')) 
    
    group by pid having count(distinct keyword) = 3 
    
    
    
    Looks like my original query performs as good as aggregate query.


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    martes, 29 de junio de 2010 17:24
    Moderador
  • Try:

    USE tempdb;
    GO
    DECLARE @P TABLE (PId int NOT NULL UNIQUE);
    DECLARE @C TABLE (PId int NOT NULL, keyword varchar(25) NOT NULL, UNIQUE (PId, keyword));
    
    INSERT INTO @P VALUES(1);
    INSERT INTO @P VALUES(2);
    INSERT INTO @P VALUES(3);
    
    INSERT INTO @C VALUES(1, 'one');
    INSERT INTO @C VALUES(1, 'two');
    INSERT INTO @C VALUES(1, 'three');
    INSERT INTO @C VALUES(1, 'four');
    
    INSERT INTO @C VALUES(2, 'one');
    INSERT INTO @C VALUES(2, 'two');
    INSERT INTO @C VALUES(2, 'three');
    
    INSERT INTO @C VALUES(3, 'one');
    INSERT INTO @C VALUES(3, 'two');
    
    SELECT
      P.PId 
    FROM
      @P AS P
      INNER JOIN
      @C AS C
      ON C.PId = P.PId
    WHERE
      C.keyword in ('one', 'two', 'three') 
    GROUP BY
      P.PId
    HAVING
      COUNT(DISTINCT C.keyword) = 3
      AND COUNT(DISTINCT C.keyword) = (SELECT COUNT(*) FROM @C AS C2 WHERE C2.PId = P.PId);
    
    WITH AggStr AS (
    SELECT
    	PId,
    	(SELECT CAST(',' AS varchar(MAX)) + C.keyword FROM @C AS C WHERE C.PId = P.PId ORDER BY C.keyword FOR XML PATH('')) AS c1
    FROM
    	@P AS P
    )
    SELECT
    	PId
    FROM
    	AggStr
    WHERE
      c1 = ',one,three,two';
    GO

    Sometime using aggregate string could be handy.

    Keep in mind that I am not using a foreign key constraint in the example to enforce [PId] domain in [@C] table.

    Divided We Stand: The SQL of Relational Division

     

    AMB

    • Marcado como respuesta Bqsoftware1 martes, 29 de junio de 2010 18:28
    martes, 29 de junio de 2010 17:37
    Moderador

Todas las respuestas

  • Take a look at this blog describing very simple (a bit more complex) problem

    How to search for all words inclusive without using Full Text search


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    martes, 29 de junio de 2010 17:20
    Moderador
  • USE tempdb;
    GO
    DECLARE @P TABLE (PId int NOT NULL UNIQUE);
    DECLARE @C TABLE (PId int NOT NULL, keyword varchar(25) NOT NULL, UNIQUE (PId, keyword));
    
    INSERT INTO @P VALUES(1);
    INSERT INTO @P VALUES(2);
    INSERT INTO @P VALUES(3);
    
    INSERT INTO @C VALUES(1, 'one');
    INSERT INTO @C VALUES(1, 'two');
    INSERT INTO @C VALUES(1, 'three');
    INSERT INTO @C VALUES(1, 'four');
    
    INSERT INTO @C VALUES(2, 'one');
    INSERT INTO @C VALUES(2, 'two');
    INSERT INTO @C VALUES(2, 'three');
    
    INSERT INTO @C VALUES(3, 'one');
    INSERT INTO @C VALUES(3, 'two');
    
    SELECT
     P.PId 
    FROM
     @P AS P
     INNER JOIN
     @C AS C
     ON C.PId = P.PId
    WHERE
     C.keyword in ('one', 'two', 'three') 
    GROUP BY
     P.PId
    HAVING
     COUNT(DISTINCT C.keyword) = 3
     AND COUNT(DISTINCT C.keyword) = (SELECT COUNT(*) FROM @C AS C2 WHERE C2.PId = P.PId);
    
    WITH AggStr AS (
    SELECT
    	PId,
    	(SELECT CAST(',' AS varchar(MAX)) + C.keyword FROM @C AS C WHERE C.PId = P.PId ORDER BY C.keyword FOR XML PATH('')) AS c1
    FROM
    	@P AS P
    )
    SELECT
    	PId
    FROM
    	AggStr
    WHERE
     c1 = ',one,three,two';
    
    
    select PId from @c Child 
    where child.keyword in ('one', 'two', 'three') 
    and not exists 
    (select 1 from @c C1 where C1.PID = Child.PID
    
    and C1.Keyword not in ('one','two','three')) 
    
    group by pid having count(distinct keyword) = 3 
    
    
    
    Looks like my original query performs as good as aggregate query.


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    martes, 29 de junio de 2010 17:24
    Moderador
  • Try:

    USE tempdb;
    GO
    DECLARE @P TABLE (PId int NOT NULL UNIQUE);
    DECLARE @C TABLE (PId int NOT NULL, keyword varchar(25) NOT NULL, UNIQUE (PId, keyword));
    
    INSERT INTO @P VALUES(1);
    INSERT INTO @P VALUES(2);
    INSERT INTO @P VALUES(3);
    
    INSERT INTO @C VALUES(1, 'one');
    INSERT INTO @C VALUES(1, 'two');
    INSERT INTO @C VALUES(1, 'three');
    INSERT INTO @C VALUES(1, 'four');
    
    INSERT INTO @C VALUES(2, 'one');
    INSERT INTO @C VALUES(2, 'two');
    INSERT INTO @C VALUES(2, 'three');
    
    INSERT INTO @C VALUES(3, 'one');
    INSERT INTO @C VALUES(3, 'two');
    
    SELECT
      P.PId 
    FROM
      @P AS P
      INNER JOIN
      @C AS C
      ON C.PId = P.PId
    WHERE
      C.keyword in ('one', 'two', 'three') 
    GROUP BY
      P.PId
    HAVING
      COUNT(DISTINCT C.keyword) = 3
      AND COUNT(DISTINCT C.keyword) = (SELECT COUNT(*) FROM @C AS C2 WHERE C2.PId = P.PId);
    
    WITH AggStr AS (
    SELECT
    	PId,
    	(SELECT CAST(',' AS varchar(MAX)) + C.keyword FROM @C AS C WHERE C.PId = P.PId ORDER BY C.keyword FOR XML PATH('')) AS c1
    FROM
    	@P AS P
    )
    SELECT
    	PId
    FROM
    	AggStr
    WHERE
      c1 = ',one,three,two';
    GO

    Sometime using aggregate string could be handy.

    Keep in mind that I am not using a foreign key constraint in the example to enforce [PId] domain in [@C] table.

    Divided We Stand: The SQL of Relational Division

     

    AMB

    • Marcado como respuesta Bqsoftware1 martes, 29 de junio de 2010 18:28
    martes, 29 de junio de 2010 17:37
    Moderador
  • Hi Hunchback
    Thanks this link !!!
    I like "Joe celko"
    http://www.simple-talk.com/author/joe-celko/

    Hehehe I have arranged "Relational Division"
    Arrange point is to derive record count using OLAP Count before inner Join.

    with p(PId) as(
    select 1 union
    select 2 union
    select 3),
    c(PId,keyword) as(
    select 1,'one'  union
    select 1,'two'  union
    select 1,'three' union
    select 1,'four' union
    select 2,'one'  union
    select 2,'two'  union
    select 2,'three' union
    select 3,'one'  union
    select 3,'two')
    select p.PId
     from p,(select PId,keyword,
         count(*) over(partition by PId) as cnt
         from c) c
     where p.PId = c.PId
      and c.keyword in('one','two','three')
    group by p.PId,c.cnt
    having count(*) = 3
      and count(*) = c.cnt;
    
    PId
    ---
     2

    My SQLServer OLAP samples
    http://oraclesqlpuzzle.hp.infoseek.co.jp/sqlserver2008-sql1-olap.html

     

     

    • Propuesto como respuesta Aketi Jyuuzou miércoles, 30 de junio de 2010 11:59
    miércoles, 30 de junio de 2010 11:56
  • I like to keep things simple.

    SELECT   Parent_ID
    FROM  	  Child
    GROUP BY  Parent_ID
    HAVING   MIN(CASE WHEN Keyword IN ('one', 'two', 'three') THEN 1 ELSE 0 END) = 1   -- No extra record outside the "scope"
          AND SUM(CASE WHEN Keyword IN ('one', 'two', 'three') THEN 1 ELSE 0 END) = 3 -- All records should match
    • Propuesto como respuesta Naomi NModerator miércoles, 30 de junio de 2010 15:25
    miércoles, 30 de junio de 2010 12:24
  • Peso,

    That is a neat solution.

    AMB

    miércoles, 30 de junio de 2010 13:09
    Moderador
  • Thank you!

    miércoles, 30 de junio de 2010 13:19
  • Very neat solution! Would be nice to get performance results for all solutions suggested.
    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    miércoles, 30 de junio de 2010 15:27
    Moderador
  • You can see all execution plans here.


    miércoles, 30 de junio de 2010 18:33
  • Thanks, looks like yours is a winner. BTW, There are no Naomi 1 & 2 - they are both Alejandro's. I just tried to test his 2 and mine together. In any case, few records sample is not that helpful in performance tests.

     


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    miércoles, 30 de junio de 2010 18:44
    Moderador
  • True, but the execution plans (and STATISTICS IO) do tell a lot of the predicted behaviour.
    I just blogged about this question here http://weblogs.sqlteam.com/peterl/archive/2010/06/30/Relational-division.aspx.

    And as I wrote in the blog, the second aggregation filter is just for make sure there are at least three matches.
    So, what to do with the cases where there are more keywords, as there are for ParentID 1?

    The solution (at least for me) is to think about the fractional part in the division. Just a any other division, there cannot be a fractional part (other than zero) if the division is valid. So the first aggregation filter does that, it makes sure there is not "fractional" part of the relational division.

    //Peso

    • Propuesto como respuesta Naomi NModerator miércoles, 30 de junio de 2010 20:07
    miércoles, 30 de junio de 2010 19:48
  • In your blog post I don't see XML solution. You may want to add it for completeness and the link to Celko blog.
    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    miércoles, 30 de junio de 2010 20:08
    Moderador
  • Good point, and corrected. Thank you.
    I also took the opportunity to read this link http://www.simple-talk.com/sql/t-sql-programming/divided-we-stand-the-sql-of-relational-division/
    and I must say I think my solution is the simplest of them all.

    miércoles, 30 de junio de 2010 20:39