none
Eliminating Top/Bottom 5% of values then averaging what's left RRS feed

  • Question

  • Forgive my newness to SQL MDX, but I am trying to figure out how to do a calculation that I was previously doing via T-SQL, and make it work in my cube.  I have a column in my Fact table representing "hours".  I want to be able to exclude the top/bottom 5 % of the hours values in order to calculate the average of what's left. 

    So the values are something like this in the Fact table:
    Record           Hours
    Row1               0
    Row2               1                         
    Row3               1
    Row4               16
    Row5               25
    Row6               27
    Row7               43
    Row8               111
    Row9               112
    Row10             12312

    I want to be able to exclude Row 1 and Row 10 Hours, and take the average of the hours for Rows 2 through 9.  Is this possible?  I've searched and found other related threads around TOPPERCENT, but haven't had much luck getting it to work with my scenario.  Any help is appreciated!

    EDITED to Add - This is in SQL Server 2005.
    Wednesday, July 22, 2009 6:07 PM

Answers

  • Hi

    As Deepak said you should make a view or named query and do the filter there . Any row-by-row computation is very slow in MDX . But to your question if it can be done or not .. answer is yes. I have tried to simulate a scenario where I take top 5 customers and bottom 5 customers by their internet sales and then removing them out from total set and sales for the remaining set . I have done it step by step so you understand how set theory works to get and filter out sets.


    with 
    set TopN as {topcount([Customer].[Customer Geography].[Customer].members ,5,[Measures].[Internet Sales Amount])}
    set BottomN as {bottomcount([Customer].[Customer Geography].[Customer].members ,5,[Measures].[Internet Sales Amount])}
    set AllExceptTopNandBottomN AS
    	EXCEPT({[Customer].[Customer Geography].[Customer].members},UNION({TopN},{BottomN}))
    	
    
    select {[Measures].[Internet Sales Amount]} on 0
    --,{[Customer].[Customer Geography].[Customer].members} on 1--18485 total customers
    ,{AllExceptTopNandBottomN} on 1
    --,{TopN} on 1
    --,{BottomN} on 1 
    from [Adventure Works]



    Hope this gives you idea. (I still back Deepak's suggestion to do this in relational layer and bring it to cube and not do it in cube because of performance issues of doing this row by row calcualtion)

    Ashwani Roy

    My Blog

    Please click the Mark as Answer button if a post solves your problem!

    Wednesday, July 22, 2009 8:13 PM
    Answerer

All replies

  • If the fact table is large, filtering fact rows in the cube via MDX could be slow. Could you instead filter the rows in a fact table view (or named query) and build a measure group in the cube from that? If other measures need to include these rows, another measure group in the cube can be built from the original fact table.
    - Deepak
    Wednesday, July 22, 2009 6:45 PM
    Moderator
  • Hi

    As Deepak said you should make a view or named query and do the filter there . Any row-by-row computation is very slow in MDX . But to your question if it can be done or not .. answer is yes. I have tried to simulate a scenario where I take top 5 customers and bottom 5 customers by their internet sales and then removing them out from total set and sales for the remaining set . I have done it step by step so you understand how set theory works to get and filter out sets.


    with 
    set TopN as {topcount([Customer].[Customer Geography].[Customer].members ,5,[Measures].[Internet Sales Amount])}
    set BottomN as {bottomcount([Customer].[Customer Geography].[Customer].members ,5,[Measures].[Internet Sales Amount])}
    set AllExceptTopNandBottomN AS
    	EXCEPT({[Customer].[Customer Geography].[Customer].members},UNION({TopN},{BottomN}))
    	
    
    select {[Measures].[Internet Sales Amount]} on 0
    --,{[Customer].[Customer Geography].[Customer].members} on 1--18485 total customers
    ,{AllExceptTopNandBottomN} on 1
    --,{TopN} on 1
    --,{BottomN} on 1 
    from [Adventure Works]



    Hope this gives you idea. (I still back Deepak's suggestion to do this in relational layer and bring it to cube and not do it in cube because of performance issues of doing this row by row calcualtion)

    Ashwani Roy

    My Blog

    Please click the Mark as Answer button if a post solves your problem!

    Wednesday, July 22, 2009 8:13 PM
    Answerer
  • Ashwani,
      Would you recommend the relational view pattern for a top 10 list as well? Or is the topcount generally fast enough?
    I am looking at something like Top 10 departments by overtime hours.

    Thanks for your insight
    Doug
    Wednesday, July 22, 2009 8:29 PM
  • topcount should be fine for your case.  I dont see why mere topcount will cause issue.
    In my problem above it is little more than that , so relational filter will  be better.

    Hope this helps

    Ashwani Roy

    My Blog

    Please click the Mark as Answer button if a post solves your problem!

    Wednesday, July 22, 2009 8:33 PM
    Answerer
  • Sorry for the delay in response - (looks like my original reply didn't post) I am going to do this in a relational layer, but I was sure to save this code snippet for future use.  Thanks for your help!
    Wednesday, July 29, 2009 5:29 PM