Select parent ids that have exact match on child values (no more no less)
-
Tuesday, June 29, 2010 5:03 PM
I am having difficulty getting the correct results using TSQL
I have a parent table, primary key -> Parent.Id and a child table related by Child.Parent_Id
The child table contains keywords.
I want to get only the parents back that have an exact match on the keywords.
I tried the query...
select Parent_Id from child where child.keyword in ('one' 'two', 'three') group by parent_id having count(distinct keyword = 3)
This does not work if there are more than three .. For example a parent will be returned if it has ('one' 'two', 'three', 'four')
What can I do?
- Changed Type HunchbackMVP, Moderator Tuesday, June 29, 2010 6:00 PM
All Replies
-
Tuesday, June 29, 2010 5:20 PMModerator
Take a look at this blog describing very simple (a bit more complex) problem
How to search for all words inclusive without using Full Text search
Premature optimization is the root of all evil in programming. (c) by Donald Knuth
Naomi Nosonovsky, Sr. Programmer-Analyst
My blog -
Tuesday, June 29, 2010 5:24 PMModerator
USE tempdb; GO DECLARE @P TABLE (PId int NOT NULL UNIQUE); DECLARE @C TABLE (PId int NOT NULL, keyword varchar(25) NOT NULL, UNIQUE (PId, keyword)); INSERT INTO @P VALUES(1); INSERT INTO @P VALUES(2); INSERT INTO @P VALUES(3); INSERT INTO @C VALUES(1, 'one'); INSERT INTO @C VALUES(1, 'two'); INSERT INTO @C VALUES(1, 'three'); INSERT INTO @C VALUES(1, 'four'); INSERT INTO @C VALUES(2, 'one'); INSERT INTO @C VALUES(2, 'two'); INSERT INTO @C VALUES(2, 'three'); INSERT INTO @C VALUES(3, 'one'); INSERT INTO @C VALUES(3, 'two'); SELECT P.PId FROM @P AS P INNER JOIN @C AS C ON C.PId = P.PId WHERE C.keyword in ('one', 'two', 'three') GROUP BY P.PId HAVING COUNT(DISTINCT C.keyword) = 3 AND COUNT(DISTINCT C.keyword) = (SELECT COUNT(*) FROM @C AS C2 WHERE C2.PId = P.PId); WITH AggStr AS ( SELECT PId, (SELECT CAST(',' AS varchar(MAX)) + C.keyword FROM @C AS C WHERE C.PId = P.PId ORDER BY C.keyword FOR XML PATH('')) AS c1 FROM @P AS P ) SELECT PId FROM AggStr WHERE c1 = ',one,three,two'; select PId from @c Child where child.keyword in ('one', 'two', 'three') and not exists (select 1 from @c C1 where C1.PID = Child.PID and C1.Keyword not in ('one','two','three')) group by pid having count(distinct keyword) = 3Looks like my original query performs as good as aggregate query.
Premature optimization is the root of all evil in programming. (c) by Donald Knuth
Naomi Nosonovsky, Sr. Programmer-Analyst
My blog- Marked As Answer by Kalman TothMicrosoft Community Contributor, Moderator Wednesday, June 30, 2010 9:41 AM
-
Tuesday, June 29, 2010 5:37 PMModerator
Try:
USE tempdb; GO DECLARE @P TABLE (PId int NOT NULL UNIQUE); DECLARE @C TABLE (PId int NOT NULL, keyword varchar(25) NOT NULL, UNIQUE (PId, keyword)); INSERT INTO @P VALUES(1); INSERT INTO @P VALUES(2); INSERT INTO @P VALUES(3); INSERT INTO @C VALUES(1, 'one'); INSERT INTO @C VALUES(1, 'two'); INSERT INTO @C VALUES(1, 'three'); INSERT INTO @C VALUES(1, 'four'); INSERT INTO @C VALUES(2, 'one'); INSERT INTO @C VALUES(2, 'two'); INSERT INTO @C VALUES(2, 'three'); INSERT INTO @C VALUES(3, 'one'); INSERT INTO @C VALUES(3, 'two'); SELECT P.PId FROM @P AS P INNER JOIN @C AS C ON C.PId = P.PId WHERE C.keyword in ('one', 'two', 'three') GROUP BY P.PId HAVING COUNT(DISTINCT C.keyword) = 3 AND COUNT(DISTINCT C.keyword) = (SELECT COUNT(*) FROM @C AS C2 WHERE C2.PId = P.PId); WITH AggStr AS ( SELECT PId, (SELECT CAST(',' AS varchar(MAX)) + C.keyword FROM @C AS C WHERE C.PId = P.PId ORDER BY C.keyword FOR XML PATH('')) AS c1 FROM @P AS P ) SELECT PId FROM AggStr WHERE c1 = ',one,three,two'; GOSometime using aggregate string could be handy.
Keep in mind that I am not using a foreign key constraint in the example to enforce [PId] domain in [@C] table.
Divided We Stand: The SQL of Relational Division
AMB
- Marked As Answer by Bqsoftware1 Tuesday, June 29, 2010 6:28 PM
-
Wednesday, June 30, 2010 11:56 AM
Hi Hunchback
Thanks this link !!!
I like "Joe celko"
http://www.simple-talk.com/author/joe-celko/Hehehe I have arranged "Relational Division"
Arrange point is to derive record count using OLAP Count before inner Join.with p(PId) as( select 1 union select 2 union select 3), c(PId,keyword) as( select 1,'one' union select 1,'two' union select 1,'three' union select 1,'four' union select 2,'one' union select 2,'two' union select 2,'three' union select 3,'one' union select 3,'two') select p.PId from p,(select PId,keyword, count(*) over(partition by PId) as cnt from c) c where p.PId = c.PId and c.keyword in('one','two','three') group by p.PId,c.cnt having count(*) = 3 and count(*) = c.cnt; PId --- 2My SQLServer OLAP samples
http://oraclesqlpuzzle.hp.infoseek.co.jp/sqlserver2008-sql1-olap.html- Proposed As Answer by Aketi Jyuuzou Wednesday, June 30, 2010 11:59 AM
-
Wednesday, June 30, 2010 12:24 PM
I like to keep things simple.
SELECT Parent_ID FROM Child GROUP BY Parent_ID HAVING MIN(CASE WHEN Keyword IN ('one', 'two', 'three') THEN 1 ELSE 0 END) = 1 -- No extra record outside the "scope" AND SUM(CASE WHEN Keyword IN ('one', 'two', 'three') THEN 1 ELSE 0 END) = 3 -- All records should match- Proposed As Answer by Naomi NMicrosoft Community Contributor, Moderator Wednesday, June 30, 2010 3:25 PM
-
Wednesday, June 30, 2010 1:09 PMModerator
Peso,
That is a neat solution.
AMB
-
Wednesday, June 30, 2010 1:19 PMThank you!
-
Wednesday, June 30, 2010 3:27 PMModeratorVery neat solution! Would be nice to get performance results for all solutions suggested.
Premature optimization is the root of all evil in programming. (c) by Donald Knuth
Naomi Nosonovsky, Sr. Programmer-Analyst
My blog -
Wednesday, June 30, 2010 6:33 PM
-
Wednesday, June 30, 2010 6:44 PMModerator
Thanks, looks like yours is a winner. BTW, There are no Naomi 1 & 2 - they are both Alejandro's. I just tried to test his 2 and mine together. In any case, few records sample is not that helpful in performance tests.
Premature optimization is the root of all evil in programming. (c) by Donald Knuth
Naomi Nosonovsky, Sr. Programmer-Analyst
My blog -
Wednesday, June 30, 2010 7:48 PM
True, but the execution plans (and STATISTICS IO) do tell a lot of the predicted behaviour.
I just blogged about this question here http://weblogs.sqlteam.com/peterl/archive/2010/06/30/Relational-division.aspx.
And as I wrote in the blog, the second aggregation filter is just for make sure there are at least three matches.
So, what to do with the cases where there are more keywords, as there are for ParentID 1?
The solution (at least for me) is to think about the fractional part in the division. Just a any other division, there cannot be a fractional part (other than zero) if the division is valid. So the first aggregation filter does that, it makes sure there is not "fractional" part of the relational division.
//Peso- Proposed As Answer by Naomi NMicrosoft Community Contributor, Moderator Wednesday, June 30, 2010 8:07 PM
-
Wednesday, June 30, 2010 8:08 PMModeratorIn your blog post I don't see XML solution. You may want to add it for completeness and the link to Celko blog.
Premature optimization is the root of all evil in programming. (c) by Donald Knuth
Naomi Nosonovsky, Sr. Programmer-Analyst
My blog -
Wednesday, June 30, 2010 8:39 PM
Good point, and corrected. Thank you.
I also took the opportunity to read this link http://www.simple-talk.com/sql/t-sql-programming/divided-we-stand-the-sql-of-relational-division/
and I must say I think my solution is the simplest of them all.


