locked
Distinct Counts in Cube are incorrect by one RRS feed

  • Question

  • I am having an issue where my cube's distinct count is coming in one higher than my SQL distinct count against the same source table.  I am using SQL Server 08, SSAS 08, and Excel 07.  The Fact Table that I am using for the distinct count includes NULLs in the column I am counting.  An example of the table is shown below (I am trying to get a Distinct Count of Col.C):

    Col.A    Col.B    Col.C

    A          B           C

    A          B           NULL

    A         C           D

    B         A          NULL

    I believe this is happening because the cube is doing a distinct count against a column in the Fact Table that has NULLs.  I am pretty confident that SQL Server is ignoring the NULLs in its Distinct Count but the cube is counting the NULL as a distinct value.  I am using the friendly UI version of Distinct Count from the 'Cube Structure' tab under the 'Measures' Section.   

    To work around this issue, I have created a view of the Fact Table that pulls just the necessary columns and the column I am counting.  In this view, I include a WHERE statement that ignores NULLs.  However, this has not corrected my counts!  They are still coming in 1 higher than my SQL.

    Does anyone know what is going on here?  I am guessing that this must be a common problem.  I would like to find a more elegant solution than adding in a MDX Case statement that subtracts one when necessary.     

    Thanks in advance and let me know if you need more info!

    Thursday, January 29, 2009 5:53 PM

Answers

  • hi,

    first of all - best pracitice for Distinct count is to separate them into separate measuregroups

    so 1 MG for each Distinct Count measure

    by doing this you can create a second view of your facttable where you filter out NULL-values (as you have done already)

    you also wrote, that filter out still brings wrong results meaning in your example

    that after filtering out the NULLs only C and D are left -> so a distinct count of 2

    in your case this is returning 3?


    - www.pmOne.com -
    • Marked as answer by MickWags Friday, January 30, 2009 7:36 PM
    Friday, January 30, 2009 11:06 AM
    Answerer
  • I got the distinct counts to work.  So oddly enough, it seems that SSAS/Excel/BoogieMan was caching my data and the changes made in the UI were not being reflected in what the code was actually doing.  What caused this problem was that I created the Distinct Count based on the original fact table first, and then decided to change it so that it would count off of the filtered view (WHERE NO NULLS).  This should had fixed my problem but was not being reflected in Excel.

    To fix this problem, I deleted the Measure Group and Measure.  I then started the measure from scratch and created the same Measure Group and Measure to distinct count off of the view.  Presto! - it worked like everyone thought it should have!

    • Marked as answer by MickWags Friday, January 30, 2009 7:36 PM
    Friday, January 30, 2009 7:35 PM

All replies

  • That's interesting... did you try checking the distinct count grouped by other columns/fields (Col A & Col B)? That might help identigy the actual member that is being counted twice..
    ..hegde
    Friday, January 30, 2009 10:43 AM
  • hi,

    first of all - best pracitice for Distinct count is to separate them into separate measuregroups

    so 1 MG for each Distinct Count measure

    by doing this you can create a second view of your facttable where you filter out NULL-values (as you have done already)

    you also wrote, that filter out still brings wrong results meaning in your example

    that after filtering out the NULLs only C and D are left -> so a distinct count of 2

    in your case this is returning 3?


    - www.pmOne.com -
    • Marked as answer by MickWags Friday, January 30, 2009 7:36 PM
    Friday, January 30, 2009 11:06 AM
    Answerer
  • ok, i did some testing:

    create a named query as follows (MG_1):

    SELECT     'a' AS DistincCount, 'a' AS DC2, 100 AS Currency
    UNION ALL
    SELECT     'b' AS DistincCount, NULL AS DC2, 100 AS Currency
    UNION ALL
    SELECT     'c' AS DistincCount, 'b' AS DC2, 100 AS Currency
    UNION ALL
    SELECT     'd' AS DistincCount, NULL AS DC2, 100 AS Currency

    and a second one MG_2

    SELECT     DistincCount, DC2, Currency
    FROM         (SELECT     'a' AS DistincCount, 'a' AS DC2, 100 AS Currency
                           UNION ALL
                           SELECT     'b' AS DistincCount, NULL AS DC2, 100 AS Currency
                           UNION ALL
                           SELECT     'c' AS DistincCount, 'b' AS DC2, 100 AS Currency
                           UNION ALL
                           SELECT     'd' AS DistincCount, NULL AS DC2, 100 AS Currency) AS derivedtbl_1
    WHERE     (NOT (DC2 IS NULL))

     

    distinct count on MG_1 column Distinct Count =4 - correct

    distinct count on MG_1 column DC2 =3 ('a', 'b' and NULL)

    distinct count on MG_2 column Distinct Count =2 - correct

     

    so, NULL is counted as a value for distinct count measures

    well more or less - by default NULL is converted to empty string or 0 while processing the cube

    so our values are 'a', '', 'b', '' where 3 would be correct

    i also tried to set NullProcessing to Preserver what resulted in an error processing the Measuregroup

     

    removing the NULLs in DSV works as expected

     


    - www.pmOne.com -
    Friday, January 30, 2009 11:22 AM
    Answerer
  • I got the distinct counts to work.  So oddly enough, it seems that SSAS/Excel/BoogieMan was caching my data and the changes made in the UI were not being reflected in what the code was actually doing.  What caused this problem was that I created the Distinct Count based on the original fact table first, and then decided to change it so that it would count off of the filtered view (WHERE NO NULLS).  This should had fixed my problem but was not being reflected in Excel.

    To fix this problem, I deleted the Measure Group and Measure.  I then started the measure from scratch and created the same Measure Group and Measure to distinct count off of the view.  Presto! - it worked like everyone thought it should have!

    • Marked as answer by MickWags Friday, January 30, 2009 7:36 PM
    Friday, January 30, 2009 7:35 PM
  • I too have this problem where the Distinct Count is counting NULLS.  I tried using a DSV and it did give me the correct value.  However, it gave me the exact same value at every level of the heirarchies I tested this with.  i.e. The All level of the heirarchy gave a value of 37 and level 4 of the heirarchy gave a value of 37 too.  I did setup the relationships between the DSV and the dimensions in the same way as the fact table.  Is there someting else I am missing to get the DSV to filter at different levels?
    Monday, February 16, 2009 9:39 PM