Unanswered DWH: Distinct aggregate

  • Friday, April 24, 2009 10:04 PM
     
     
    hey guys, i have the following problem and i can't solve it and can't find it in the literature either, because i don't know how to name it.  

    .) imagine a website with different Objects(ObjectID, value) online. I designed Objects as Dimensiontable and my Facttable is just: (ObjectID, DateID)  
    .) So let's say on day 1 there are 100 Objects(ProductID 1-100) online - i can calculate the average value easily.  
    .) on day 2 ObjectID, 101, 102,... are online, and ObjectID, 55,66,... offline and so on.  

    Now i want to calculate the average value of all objects within one month. But what i need are distinct objects, so i assume i can't use aggregation, because the value of an object, that was online for 10 days would have more weight in the average value than an object that was online for just 1 day...  

    i was thinking about an approach, where i have FactTables like FactTableDay (ObjectID, DayID), FactTableMonth(ObjectID, MonthID), FactTableYear(ObjectID, YearID), where i store all distinct objects, that where online for at least 1 day within one month, year...)  

    Another idea was to use the keyword DISTINCT in every measure, but wouldn't it be very slow?  

    In fact my project is much more complex of course and there are about 50000 objects online/day and let's say 100 new objects/day and 100 objects go offline/day, and i need a lot of measures like median, max, min, variance,... for week, month, year,... 

    any hints? thanks in advance

All Replies

  • Sunday, April 26, 2009 9:13 AM
     
     
    If you create a Distinct Count measure - Distinct Count of Object IDs in the Analysis Services cube (which would always get a count of distinct Object IDs irrespective of how many times they appear in the transactions, will work as a distinct count across days/months/any other slicing context you use in the query), wouldn't that work in your case for some reason?
    ..hegde
  • Sunday, April 26, 2009 10:20 AM
     
     
    yes it would work, but i think it must be very slow when i try to calculate 1 year with the distinct keyword, but obviously there is no other solution
    thanks
  • Sunday, April 26, 2009 4:10 PM
     
     

    true, distinct count might be expensive but you may want to try with some aggregations built for the distinct count measure group (when the aggregations are pre-built, that should help alleviate query performance issue).


    ..hegde
  • Sunday, April 26, 2009 7:57 PM
     
     
    Thanks for your help, i found some usefull links like http://www.databasejournal.com/features/mssql/article.php/10894_3681856_3/Distinct-Counts-in-Analysis-Services-2005.htm

    it works fine for the count, max, min - functions, but for other measures like median, quantiles,... i have to write a calculated measure with distinct, which is probably very slow or i can try to build facttables for each time-period (FactTableDay, FactTableWeek,...) which distinct objects inside, but it would generate redundant data.

    if someone has another good idea please tell me,

    thanks