Note: Forums will be making significant UX changes to address key usability improvements surrounding search, discoverability and navigation. To learn more about these changes please visit the announcement which can be found HERE.
Select parent ids that have exact match on child values (no more no less)

Answered Select parent ids that have exact match on child values (no more no less)

  • Tuesday, June 29, 2010 5:03 PM
     
      Has Code

    I am having difficulty getting the correct results using TSQL

    I have a parent table, primary key -> Parent.Id and a child table related by Child.Parent_Id

    The child table contains keywords.

    I want to get only the parents back that have an exact match on the keywords.

    I tried the query...

    select Parent_Id from child where child.keyword in ('one' 'two', 'three') group by parent_id having count(distinct keyword = 3)

    This does not work if there are more than three .. For example a parent will be returned if it has ('one' 'two', 'three', 'four')

    What can I do?

     

     

     

     

All Replies

  • Tuesday, June 29, 2010 5:20 PM
    Moderator
     
     

    Take a look at this blog describing very simple (a bit more complex) problem

    How to search for all words inclusive without using Full Text search


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
  • Tuesday, June 29, 2010 5:24 PM
    Moderator
     
     Answered Has Code

    USE tempdb;
    GO
    DECLARE @P TABLE (PId int NOT NULL UNIQUE);
    DECLARE @C TABLE (PId int NOT NULL, keyword varchar(25) NOT NULL, UNIQUE (PId, keyword));
    
    INSERT INTO @P VALUES(1);
    INSERT INTO @P VALUES(2);
    INSERT INTO @P VALUES(3);
    
    INSERT INTO @C VALUES(1, 'one');
    INSERT INTO @C VALUES(1, 'two');
    INSERT INTO @C VALUES(1, 'three');
    INSERT INTO @C VALUES(1, 'four');
    
    INSERT INTO @C VALUES(2, 'one');
    INSERT INTO @C VALUES(2, 'two');
    INSERT INTO @C VALUES(2, 'three');
    
    INSERT INTO @C VALUES(3, 'one');
    INSERT INTO @C VALUES(3, 'two');
    
    SELECT
     P.PId 
    FROM
     @P AS P
     INNER JOIN
     @C AS C
     ON C.PId = P.PId
    WHERE
     C.keyword in ('one', 'two', 'three') 
    GROUP BY
     P.PId
    HAVING
     COUNT(DISTINCT C.keyword) = 3
     AND COUNT(DISTINCT C.keyword) = (SELECT COUNT(*) FROM @C AS C2 WHERE C2.PId = P.PId);
    
    WITH AggStr AS (
    SELECT
    	PId,
    	(SELECT CAST(',' AS varchar(MAX)) + C.keyword FROM @C AS C WHERE C.PId = P.PId ORDER BY C.keyword FOR XML PATH('')) AS c1
    FROM
    	@P AS P
    )
    SELECT
    	PId
    FROM
    	AggStr
    WHERE
     c1 = ',one,three,two';
    
    
    select PId from @c Child 
    where child.keyword in ('one', 'two', 'three') 
    and not exists 
    (select 1 from @c C1 where C1.PID = Child.PID
    
    and C1.Keyword not in ('one','two','three')) 
    
    group by pid having count(distinct keyword) = 3 
    
    
    
    Looks like my original query performs as good as aggregate query.


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
  • Tuesday, June 29, 2010 5:37 PM
    Moderator
     
     Answered Has Code

    Try:

    USE tempdb;
    GO
    DECLARE @P TABLE (PId int NOT NULL UNIQUE);
    DECLARE @C TABLE (PId int NOT NULL, keyword varchar(25) NOT NULL, UNIQUE (PId, keyword));
    
    INSERT INTO @P VALUES(1);
    INSERT INTO @P VALUES(2);
    INSERT INTO @P VALUES(3);
    
    INSERT INTO @C VALUES(1, 'one');
    INSERT INTO @C VALUES(1, 'two');
    INSERT INTO @C VALUES(1, 'three');
    INSERT INTO @C VALUES(1, 'four');
    
    INSERT INTO @C VALUES(2, 'one');
    INSERT INTO @C VALUES(2, 'two');
    INSERT INTO @C VALUES(2, 'three');
    
    INSERT INTO @C VALUES(3, 'one');
    INSERT INTO @C VALUES(3, 'two');
    
    SELECT
      P.PId 
    FROM
      @P AS P
      INNER JOIN
      @C AS C
      ON C.PId = P.PId
    WHERE
      C.keyword in ('one', 'two', 'three') 
    GROUP BY
      P.PId
    HAVING
      COUNT(DISTINCT C.keyword) = 3
      AND COUNT(DISTINCT C.keyword) = (SELECT COUNT(*) FROM @C AS C2 WHERE C2.PId = P.PId);
    
    WITH AggStr AS (
    SELECT
    	PId,
    	(SELECT CAST(',' AS varchar(MAX)) + C.keyword FROM @C AS C WHERE C.PId = P.PId ORDER BY C.keyword FOR XML PATH('')) AS c1
    FROM
    	@P AS P
    )
    SELECT
    	PId
    FROM
    	AggStr
    WHERE
      c1 = ',one,three,two';
    GO

    Sometime using aggregate string could be handy.

    Keep in mind that I am not using a foreign key constraint in the example to enforce [PId] domain in [@C] table.

    Divided We Stand: The SQL of Relational Division

     

    AMB

    • Marked As Answer by Bqsoftware1 Tuesday, June 29, 2010 6:28 PM
    •  
  • Wednesday, June 30, 2010 11:56 AM
     
     Proposed Has Code

    Hi Hunchback
    Thanks this link !!!
    I like "Joe celko"
    http://www.simple-talk.com/author/joe-celko/

    Hehehe I have arranged "Relational Division"
    Arrange point is to derive record count using OLAP Count before inner Join.

    with p(PId) as(
    select 1 union
    select 2 union
    select 3),
    c(PId,keyword) as(
    select 1,'one'  union
    select 1,'two'  union
    select 1,'three' union
    select 1,'four' union
    select 2,'one'  union
    select 2,'two'  union
    select 2,'three' union
    select 3,'one'  union
    select 3,'two')
    select p.PId
     from p,(select PId,keyword,
         count(*) over(partition by PId) as cnt
         from c) c
     where p.PId = c.PId
      and c.keyword in('one','two','three')
    group by p.PId,c.cnt
    having count(*) = 3
      and count(*) = c.cnt;
    
    PId
    ---
     2

    My SQLServer OLAP samples
    http://oraclesqlpuzzle.hp.infoseek.co.jp/sqlserver2008-sql1-olap.html

     

     

    • Proposed As Answer by Aketi Jyuuzou Wednesday, June 30, 2010 11:59 AM
    •  
  • Wednesday, June 30, 2010 12:24 PM
     
     Proposed Has Code
    I like to keep things simple.

    SELECT   Parent_ID
    FROM  	  Child
    GROUP BY  Parent_ID
    HAVING   MIN(CASE WHEN Keyword IN ('one', 'two', 'three') THEN 1 ELSE 0 END) = 1   -- No extra record outside the "scope"
          AND SUM(CASE WHEN Keyword IN ('one', 'two', 'three') THEN 1 ELSE 0 END) = 3 -- All records should match
  • Wednesday, June 30, 2010 1:09 PM
    Moderator
     
     

    Peso,

    That is a neat solution.

    AMB

  • Wednesday, June 30, 2010 1:19 PM
     
     
    Thank you!

  • Wednesday, June 30, 2010 3:27 PM
    Moderator
     
     
    Very neat solution! Would be nice to get performance results for all solutions suggested.
    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
  • Wednesday, June 30, 2010 6:33 PM
     
     
    You can see all execution plans here.


  • Wednesday, June 30, 2010 6:44 PM
    Moderator
     
     

    Thanks, looks like yours is a winner. BTW, There are no Naomi 1 & 2 - they are both Alejandro's. I just tried to test his 2 and mine together. In any case, few records sample is not that helpful in performance tests.

     


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
  • Wednesday, June 30, 2010 7:48 PM
     
     Proposed

    True, but the execution plans (and STATISTICS IO) do tell a lot of the predicted behaviour.
    I just blogged about this question here http://weblogs.sqlteam.com/peterl/archive/2010/06/30/Relational-division.aspx.

    And as I wrote in the blog, the second aggregation filter is just for make sure there are at least three matches.
    So, what to do with the cases where there are more keywords, as there are for ParentID 1?

    The solution (at least for me) is to think about the fractional part in the division. Just a any other division, there cannot be a fractional part (other than zero) if the division is valid. So the first aggregation filter does that, it makes sure there is not "fractional" part of the relational division.

    //Peso

  • Wednesday, June 30, 2010 8:08 PM
    Moderator
     
     
    In your blog post I don't see XML solution. You may want to add it for completeness and the link to Celko blog.
    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
  • Wednesday, June 30, 2010 8:39 PM
     
     

    Good point, and corrected. Thank you.
    I also took the opportunity to read this link http://www.simple-talk.com/sql/t-sql-programming/divided-we-stand-the-sql-of-relational-division/
    and I must say I think my solution is the simplest of them all.