locked
Best Practice for Counts/Distinct Count Measures for Filtered Data RRS feed

  • Question

  • Let's say the Count/Distinct Count measure is based on filtered data.

    When the situation allows, is it better to do a CASE statement with 1s and 0s from the table in the DSV and then do a SUM on this field?  Or is it better to create a new Named Query in the DSV filtering out the data through the WHERE clause based off of the original table and then do a DISTINCT COUNT measure on the newly created Named Query?

    If there's no clear cut answer as to what is the "better" approach, please offer up the advantages and disadvantages for both.  My preference is best performance.


    An Example:

    Measure: Active SKU Count
    Products dimension has product_key, SKU and Status

    In the DSV for the Products dimension, should I add a CASE statement:
    CASE WHEN [Status] = 'Active' THEN 1 ELSE 0 END AS [Active Status]
    Then create a new SUM measure based on the Active Status column.

    OR

    Create a new Named Query in the DSV:
    SELECT * from Products WHERE Status = 'Active'
    Then create a new DISTINCT COUNT measure based on the SKU column from the new Named Query?




    Wednesday, August 15, 2007 10:28 PM

Answers

  • These 2 alternatives do not produce the same results. If a given SKU has fact records in Jan, Feb, and Mar, the first approach using SUM would return a value of 3 for Calendar Quarter 1. Then second approach based on a distinct count measure would return 1 (which is probably what you want)

     

    Distinct count measures should be implemented in their own measure group anyway, so if you only ever need the active count then filtering out the non-active records is a good option.

     

    If you need to get distinct counts of both active and inactive SKUs then you would be better off adding the Active flag as an attribute to the product dimension and implementing it as a slowly changing dimension.

     

    Wednesday, August 15, 2007 11:55 PM

All replies

  • These 2 alternatives do not produce the same results. If a given SKU has fact records in Jan, Feb, and Mar, the first approach using SUM would return a value of 3 for Calendar Quarter 1. Then second approach based on a distinct count measure would return 1 (which is probably what you want)

     

    Distinct count measures should be implemented in their own measure group anyway, so if you only ever need the active count then filtering out the non-active records is a good option.

     

    If you need to get distinct counts of both active and inactive SKUs then you would be better off adding the Active flag as an attribute to the product dimension and implementing it as a slowly changing dimension.

     

    Wednesday, August 15, 2007 11:55 PM
  • Darren,

    I had a feeling that these 2 alternatives would not produce the same results when I was typing up the question.  Perhaps my example scenario wasn't a good one, because the limited testing with my applicable situation yielded the same results. 

    In any event, I will just go ahead and go forward with the separate measure group with the new Named Query approach.

    Thanks!
    Thursday, August 16, 2007 4:29 PM