DAX STDEV & RANK functions in calculated column on subsets of data in same table
-
samedi 28 avril 2012 18:17
I would like to use a DAX STDEV and separately also apply a RANK function to create a calculated column with application of the functions to subsets of data in the same table. I have a table with fields / columns for TestValue, Customer, SampleType, SampleYear, and SampleOrigin. I would like to show the rank for each TestValue in the table within each aggregate subset of Customer X Year X SampleType X SampleOrigin. SampleType and SampleOrigin are text fields. I cannot figure out how to do this, but there must be a way. Similarly, in a different column I would like to calculate the SD for each subset for all samples in the same subset. It would not work to break the subsets into separate columns, because there are thousands of customers.
I would most appreciate any thoughts on how to do this.
Thanks,
Buzz
Buzz Burhans
Toutes les réponses
-
dimanche 29 avril 2012 09:51
Buzz,
If you're using the latest version of Power Pivot you can solve this with the new RANK function.
Assuming your table is called Table1, here is the measure to be created from the Pivot table for the rank.
=RANKX(ALL(Table1[Customer]),[Sum of TestValue])
The formula for the STDEV measure is:
=STDEV.P(Table1[TestValue])
I added the SampleYear, SampleOrigin and SampleType as Slicers on the Pivot Table and added the Customer on rows with TestValue in the Values.
The beauty of PowerPivot means the measures we create are responsive to the slicers. Hence when clicking on them the STDEV and Rank is re-calculated.
Does this make sense? Have a read of my blog post for more info and a link to a sample book.
http://www.leehbi.com/2012/04/rank-and-stdev-in-power-pivot-2012
Regards,
Lee
- Modifié Lee Hawthorn dimanche 29 avril 2012 12:12
- Modifié Lee Hawthorn dimanche 29 avril 2012 13:25
-
dimanche 29 avril 2012 13:23
Thanks Lee. I don't think this does it, because within each customer there are subsets, For instance:
SampleID Customer Type Origin Year TestValue
Sample1 Customer1 Type1 Origin1 2011 65
Sample2 Customer1 Type1 Origin1 2011 75
Sample3 Customer1 Type2 Origin1 2010 55
Sample4 Customer1 Type2 Origin2 2010 63
Sample5 Customer2 Type1 Origin1 2012 15
etc...
So I need a way to apply the RANKX and function to each subset, where the subset is defined by the unique combination of Customer X Type X Origin X Year, and there are multiple samples (for each subset, the number of samples could be 0 to N, N>25). I think the table description you have described works for subsets defined by a single column such as [CUSTOMER] in your RANKX sample above, but I need to define the rank within subsets, where the rank within the subset is for each sample value , not for the aggregate sum defined by the subset definition. Similarly, I need the STDEV.S for each subset. I also need this to work when there are either 0 or 1 number of samples in a subset. Is this any clearer?
Thanks very much for your help.
Buzz
I appreciate your
Buzz Burhans
-
dimanche 29 avril 2012 18:32
Thanks again Lee, for your interest.
I missed the reference to your blog post when I read your response above earlier. I just now read your blog post, however, and realize where I was unclear. In your blog post you say "The output needs to show the Rank of each Customer based on Sales for the subset of data defined by the Year, Origin & Type." This is close, but not exactly what I need to do. What I need to do is rank each "Sale" ("TestValue" in my case) within the subset defined by "for the subset of data defined by the Year, Origin & Type" AND Customer. i.e. the customer is part of the subsetting, and what I want to achieve is the rank and then the SD of the individual "TestValues" (or in your example, individual "Sales") within each subset defined by Customer, Year, Origin & Type.
Also, I need this in a calculated column, not in a pivot table, because based on the rank and / or the Mean & SD for each TestValue in the subset, I need to construct an indicator variable that allows me to exclude the another subset of the lowest values in each subset, i.e. I want to exclude a subset of each subset.
Thanks again for your interest, and hopefully this is clearer.
Buzz
Buzz Burhans
-
mardi 1 mai 2012 03:17
Buzz,
Try one of the following:
=RANKX(FILTER(Table1,[Customer]=EARLIER([Customer]) && [Type]=EARLIER([Type])
&& [Year]=EARLIER([Year]) && [Origin]=EARLIER([Origin])), Table1[TestValue] )or
=CALCULATE(COUNTROWS(Table1), ALLEXCEPT(Table1,Table1[Customer],Table1[Type],Table1[Origin],Table1[Year]), Table1[TestValue]>=EARLIER(Table1[TestValue]) )
- Marqué comme réponse Buzz Burhans mercredi 2 mai 2012 20:16
-
mercredi 2 mai 2012 08:29
please check the below link.
http://www.powerpivotpro.com/2011/09/
-
mercredi 2 mai 2012 20:41
ruve1k, Thanks very much for this response. Both versions worked for generating the rank. For those who read this in the future, the first version using RANKX calculates substantially more quickly than the 2nd; the Calculate(COUNTROWS... version works but took a long time to calculate. Both versions work.
I also tried playing with the syntax a bit because one of the terms (Type) has only two values ("Ensiled" and "Dry"), and I wanted to exclude samples where Type = "Dry" from the ranking. I wrote the Filter statement for that part as ...&& [Type]="Ensiled" && ...but the filter failed in that the ranks were applied to both types anyway. I later wrapped the RANKXX formula in an IF statement to exclude the [Type]="Dry" and that worked, so I didn't play with it any further. It seems that the syntax I used should have worked though.
Nonetheless, your syntax worked well to generate the RANK, and I very much appreciate the help. For any MS personnel that read this, it doesn't make a lot of sense to me that RANKX can't be filtered more easily to subsets of a calculated column,to accomplish similar to the what the EARLIERs do in the second version here using COUNTROWS. Having to use such indirect syntax by using EARLIER is certainly less intuitive than being able to apply ALLEXCEPT directly to RANKX for instance.
Thanks again for the help with this.
Buzz
Buzz Burhans
- Modifié Buzz Burhans jeudi 3 mai 2012 00:15
-
mercredi 2 mai 2012 20:52
nanurahi, Thanks for this response. I did not end up trying to implement this because the earlier response I received seemed to work; however, I'm not sure the post at the link would work on subsets of the data in the same table. The problem I had was trying to apply a filter to all possible subsets where it was impractical to specify each filter exactly because one of the filter terms was Customer, and there were thousands of Customers, making explicit filter statements something like [CUSTOMER]="Name1" impractical because of the number of Customer Names. I note that in the linked post the filter terms are applied with ALL, and in my case the problem was I wanted to apply the RANKX function to each subset separately. I have bookmarked the link you referred me to though, because I think it will be useful when I have to generate ranks based on Percentiles.
The approach in that post might have actually worked, however, I did not try to modify it and test whether it would as the earlier response accomplished my ranking.
I am very grateful for your assistance.
Buzz
Buzz Burhans
- Modifié Buzz Burhans jeudi 3 mai 2012 00:17
-
mercredi 2 mai 2012 21:11
I received several responses to this post which enabled me to solve my rank problem, however, there were no responses which addressed the application of STDEV.S to a subset of the data in the same table, generating a calculated column, not applying the measure in a Pivot Table. I was eventually able to accomplish this with the following:
=IF(ISBLANK(Table1[TestValue]),BLANK(),IF(Table1[Type]<>"Ensiled",BLANK(),IF(COUNTX(Filter(Table1,[Customer]=EARLIER([Customer]) && [Type]="Ensiled" && [Origin]=EARLIER([Origin]) && [Year]=EARLIER([Year])),Table1[TestValue])<2,BLANK(),STDEVX.S(Filter(Table1,[Customer]=EARLIER([Customer]) && [Type]="Ensiled" && [Origin]=EARLIER([Origin]) && [Year]=EARLIER([Year])),[TestValue]))))
The COUNTX portion makes sure that there are at least 2 values to work with.
The 2 IF statements are specific to idiosyncrasies of my data and intent and are not germane to applying the STDEVX function.
The EARLIER statements were the only way I could find to correctly filter this data into subsets which are combinations of the 4 factors Customer, Type, Origin, and Year.
As I indicate in my comments on the response that solved the ranking problem using EARLIER, the need to use such a convoluted syntax involving EARLIER instead of a more straightforward way to filter and apply the function to each specific subset in the table is not intuitive to me, and seems to unnecessarily complicate the filtering process compared to being able to use ALLEXCEPT with STDEV.S for instance.
Buzz Burhans
- Modifié Buzz Burhans mercredi 2 mai 2012 21:15
- Marqué comme réponse Buzz Burhans mercredi 2 mai 2012 21:16
- Modifié Buzz Burhans jeudi 3 mai 2012 00:14

