locked
why is my correlated subquery doubling rows? RRS feed

  • Question

  • Hi,

    I use a subquery in my syntax, here:


    SELECT distinct t.Date_ID
          ,t.Publisher_Name
          ,t.Bundle_Key
          ,(select sum(Download_Count)
      from dbo.Dwh_Fact_Agg_Distro_Engine tt
      where tt.Bundle_Key=t.Bundle_Key
      and tt.Publisher_Name=t.Publisher_Name
      and tt.Date_ID=t.Date_ID
      and t.Phase_Key=tt.Phase_Key
      and tt.Phase_key=5        -- that's the problematic line
      ) as Num_Of_Inits
    FROM Dwh_Fact_Agg_Distro_Engine t
    where t.Date_ID=20120901
    order by
    [Date_ID]
    ,[Publisher_Name]
    ,Bundle_Key 

    For some reason when I add the line "and tt.Phase_key=5 " the result set includes an additional row for each date, publisher & bundle, with "Num_Of_Inits" as NULL.

    for example the first line of the result set WITHOUT the line "and tt.Phase_key=5 " will look like:

    • 20121101 ; Brothersoft ; er47653 ; 34

    If i add the line  "and tt.Phase_key=5 " i will get the same row (which is fine) and an additional row with the same parameters but "Num_Of_Inits" as NULL:

    • 20121101 ; Brothersoft ; er47653 ; NULL
    • 20121101 ; Brothersoft ; er47653 ; 34

    anyone knows why this happens?

    thanks! 

        

    Monday, November 12, 2012 10:01 PM

Answers

  • You can get your solution also by using windows function, e.g.

    SELECT distinct t.Date_ID
          ,t.Publisher_Name
          ,t.Bundle_Key
          ,sum(case when t.Phase_Key = 5 then Download_Count else 0 end) OVER (partition by
    Date_Id, Publisher_Name)
       as Num_Of_Inits
    FROM Dwh_Fact_Agg_Distro_Engine t
    where t.Date_ID=20120901
    order by 
    [Date_ID]
    ,[Publisher_Name]
    ,Bundle_Key 
    In your code you're joining based on Phase_Key and also using condition Phase_Key = 5. This does not make sense unless all Phase_Key are 5. What is the desired JOIN (partition) condition?


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog

    Monday, November 12, 2012 10:33 PM

All replies

  • You can get your solution also by using windows function, e.g.

    SELECT distinct t.Date_ID
          ,t.Publisher_Name
          ,t.Bundle_Key
          ,sum(case when t.Phase_Key = 5 then Download_Count else 0 end) OVER (partition by
    Date_Id, Publisher_Name)
       as Num_Of_Inits
    FROM Dwh_Fact_Agg_Distro_Engine t
    where t.Date_ID=20120901
    order by 
    [Date_ID]
    ,[Publisher_Name]
    ,Bundle_Key 
    In your code you're joining based on Phase_Key and also using condition Phase_Key = 5. This does not make sense unless all Phase_Key are 5. What is the desired JOIN (partition) condition?


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog

    Monday, November 12, 2012 10:33 PM
  • For me it looks like it is because of distinct.

    There is multiple phase keys and for each of them you have 34 rows.

    If you eliminate "not 5" in subquery and query all of them in master query you will get rows with nulls.

    Also, all columns from output are included in the where - reqular group by will be enough to simplify the query:

    SELECT t.Date_ID
          ,t.Publisher_Name
          ,t.Bundle_Key
          ,sum(Download_Count) AS Num_Of_Inits
    FROM
       Dwh_Fact_Agg_Distro_Engine t
    where t.Date_ID=20120901
       AND t.Phase_key = 5
    group by
       [Date_ID]
       ,[Publisher_Name]
       ,Bundle_Key 
    order by
       [Date_ID]
       ,[Publisher_Name]
       ,Bundle_Key 

    Tuesday, November 13, 2012 12:58 AM
  • Try

    SELECT distinct t.Date_ID
          ,t.Publisher_Name
          ,t.Bundle_Key
          ,(select sum(isnull(Download_Count,0))
      from dbo.Dwh_Fact_Agg_Distro_Engine tt
      where tt.Bundle_Key=t.Bundle_Key
      and tt.Publisher_Name=t.Publisher_Name
      and tt.Date_ID=t.Date_ID
      and t.Phase_Key=tt.Phase_Key
      and tt.Phase_key=5        -- that's the problematic line
      ) as Num_Of_Inits
    FROM Dwh_Fact_Agg_Distro_Engine t
    where t.Date_ID=20120901
    order by
    [Date_ID]
    ,[Publisher_Name]
    ,Bundle_Key

    as I think there is a null value of Download_Count which is leading to the problem.


    Many Thanks & Best Regards, Hua Min


    Tuesday, November 13, 2012 1:20 AM
  • I wish you had the good manner to post DDL. A date is a temporal value, so “date_id” is absurd! Would you say “inch_id” as a column name? Read your ISO-11179 or any book on data modeling.  Look up the term “attribute property” and do not make this error again. SELECT DISTINCT is symptom of bad schema design that forced redundancies, adn when you see it with an aggregate, it is even worse. 

    But without DDL, we cannot help you. You  never explain what this is supposed to do, either.  Here is a guess from the little you gave us: 

    SELECT something_date, publisher_name, bundle_key,
           SUM(download_count) AS init_cnt
      FROM DWH_Fact_Agg_Distro_Engine AS T
     WHERE something_date = '2012-09-01'
       AND phase_key = 5
     GROUP BY something_date, publisher_name, bundle_key;


    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    Tuesday, November 13, 2012 1:58 AM
  • You can get your solution also by using windows function, e.g.

    SELECT distinct t.Date_ID
          ,t.Publisher_Name
          ,t.Bundle_Key
          ,sum(case when t.Phase_Key = 5 then Download_Count else 0 end) OVER (partition by
    Date_Id, Publisher_Name)
       as Num_Of_Inits
    FROM Dwh_Fact_Agg_Distro_Engine t
    where t.Date_ID=20120901
    order by 
    [Date_ID]
    ,[Publisher_Name]
    ,Bundle_Key 
    In your code you're joining based on Phase_Key and also using condition Phase_Key = 5. This does not make sense unless all Phase_Key are 5. What is the desired JOIN (partition) condition?


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog

    Thanks Naomi that's very helpful!

    (there's an error in the page preventing me from marking your answer, I'll try again later)

    :) 


    • Edited by 2rn Tuesday, November 13, 2012 5:25 AM
    Tuesday, November 13, 2012 5:22 AM