Answered by:
why is my correlated subquery doubling rows?

Question
-
Hi,
I use a subquery in my syntax, here:
SELECT distinct t.Date_ID
,t.Publisher_Name
,t.Bundle_Key
,(select sum(Download_Count)
from dbo.Dwh_Fact_Agg_Distro_Engine tt
where tt.Bundle_Key=t.Bundle_Key
and tt.Publisher_Name=t.Publisher_Name
and tt.Date_ID=t.Date_ID
and t.Phase_Key=tt.Phase_Key
and tt.Phase_key=5 -- that's the problematic line
) as Num_Of_Inits
FROM Dwh_Fact_Agg_Distro_Engine t
where t.Date_ID=20120901
order by
[Date_ID]
,[Publisher_Name]
,Bundle_KeyFor some reason when I add the line "and tt.Phase_key=5 " the result set includes an additional row for each date, publisher & bundle, with "Num_Of_Inits" as NULL.
for example the first line of the result set WITHOUT the line "and tt.Phase_key=5 " will look like:
- 20121101 ; Brothersoft ; er47653 ; 34
If i add the line "and tt.Phase_key=5 " i will get the same row (which is fine) and an additional row with the same parameters but "Num_Of_Inits" as NULL:
- 20121101 ; Brothersoft ; er47653 ; NULL
- 20121101 ; Brothersoft ; er47653 ; 34
anyone knows why this happens?
thanks!
Monday, November 12, 2012 10:01 PM
Answers
-
You can get your solution also by using windows function, e.g.
SELECT distinct t.Date_ID ,t.Publisher_Name ,t.Bundle_Key ,sum(case when t.Phase_Key = 5 then Download_Count else 0 end) OVER (partition by Date_Id, Publisher_Name) as Num_Of_Inits FROM Dwh_Fact_Agg_Distro_Engine t where t.Date_ID=20120901 order by [Date_ID] ,[Publisher_Name] ,Bundle_Key
In your code you're joining based on Phase_Key and also using condition Phase_Key = 5. This does not make sense unless all Phase_Key are 5. What is the desired JOIN (partition) condition?
For every expert, there is an equal and opposite expert. - Becker's Law
My blogMonday, November 12, 2012 10:33 PM
All replies
-
You can get your solution also by using windows function, e.g.
SELECT distinct t.Date_ID ,t.Publisher_Name ,t.Bundle_Key ,sum(case when t.Phase_Key = 5 then Download_Count else 0 end) OVER (partition by Date_Id, Publisher_Name) as Num_Of_Inits FROM Dwh_Fact_Agg_Distro_Engine t where t.Date_ID=20120901 order by [Date_ID] ,[Publisher_Name] ,Bundle_Key
In your code you're joining based on Phase_Key and also using condition Phase_Key = 5. This does not make sense unless all Phase_Key are 5. What is the desired JOIN (partition) condition?
For every expert, there is an equal and opposite expert. - Becker's Law
My blogMonday, November 12, 2012 10:33 PM -
For me it looks like it is because of distinct.
There is multiple phase keys and for each of them you have 34 rows.
If you eliminate "not 5" in subquery and query all of them in master query you will get rows with nulls.
Also, all columns from output are included in the where - reqular group by will be enough to simplify the query:
SELECT t.Date_ID ,t.Publisher_Name ,t.Bundle_Key ,sum(Download_Count) AS Num_Of_Inits FROM Dwh_Fact_Agg_Distro_Engine t where t.Date_ID=20120901 AND t.Phase_key = 5 group by [Date_ID] ,[Publisher_Name] ,Bundle_Key order by [Date_ID] ,[Publisher_Name] ,Bundle_Key
Tuesday, November 13, 2012 12:58 AM -
Try
SELECT distinct t.Date_ID
,t.Publisher_Name
,t.Bundle_Key
,(select sum(isnull(Download_Count,0))
from dbo.Dwh_Fact_Agg_Distro_Engine tt
where tt.Bundle_Key=t.Bundle_Key
and tt.Publisher_Name=t.Publisher_Name
and tt.Date_ID=t.Date_ID
and t.Phase_Key=tt.Phase_Key
and tt.Phase_key=5 -- that's the problematic line
) as Num_Of_Inits
FROM Dwh_Fact_Agg_Distro_Engine t
where t.Date_ID=20120901
order by
[Date_ID]
,[Publisher_Name]
,Bundle_Keyas I think there is a null value of Download_Count which is leading to the problem.
Many Thanks & Best Regards, Hua Min
- Edited by Jackson_1990 Tuesday, November 13, 2012 1:21 AM
Tuesday, November 13, 2012 1:20 AM -
I wish you had the good manner to post DDL. A date is a temporal value, so “date_id” is absurd! Would you say “inch_id” as a column name? Read your ISO-11179 or any book on data modeling. Look up the term “attribute property” and do not make this error again. SELECT DISTINCT is symptom of bad schema design that forced redundancies, adn when you see it with an aggregate, it is even worse.
But without DDL, we cannot help you. You never explain what this is supposed to do, either. Here is a guess from the little you gave us:
SELECT something_date, publisher_name, bundle_key,
SUM(download_count) AS init_cnt
FROM DWH_Fact_Agg_Distro_Engine AS T
WHERE something_date = '2012-09-01'
AND phase_key = 5
GROUP BY something_date, publisher_name, bundle_key;--CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL
Tuesday, November 13, 2012 1:58 AM -
You can get your solution also by using windows function, e.g.
SELECT distinct t.Date_ID ,t.Publisher_Name ,t.Bundle_Key ,sum(case when t.Phase_Key = 5 then Download_Count else 0 end) OVER (partition by Date_Id, Publisher_Name) as Num_Of_Inits FROM Dwh_Fact_Agg_Distro_Engine t where t.Date_ID=20120901 order by [Date_ID] ,[Publisher_Name] ,Bundle_Key
In your code you're joining based on Phase_Key and also using condition Phase_Key = 5. This does not make sense unless all Phase_Key are 5. What is the desired JOIN (partition) condition?
For every expert, there is an equal and opposite expert. - Becker's Law
My blogThanks Naomi that's very helpful!
(there's an error in the page preventing me from marking your answer, I'll try again later)
:)
- Edited by 2rn Tuesday, November 13, 2012 5:25 AM
Tuesday, November 13, 2012 5:22 AM