I have a long-standing problem with performance in unnatural hierarchies. Way back in my naive days when I started working with SSAS, I was under the impression that hierarchies (natural and unnatural) were first-class citizens of SSAS dimensions. I thought that queries using pre-defined unnatural hierarchies were going to perform at least as well as queries which crossjoined the equivalent attributes. It used to be that the SSAS tutorials and MSDN articles would describe unnatural hierachies as a way to provide simple, pre-defined drill down paths that were easier for end users than if they stacked their attribute hierarchies one at a time.
I turns out that unnatural hierarchies have significant performance problems; to the point that I question why the dimension editer allows them to be created at all.
So my question for the performance guru's is what (if any) is the case for creating an unnatural hierarchy in a dimension? My experience is that cardinalities on attribues in the unnatural hierarchy have to be very, very small for them to perform well (eg. 100's of members.)
Do you often find the need to create bogus attributes with composite keys so that you can turn unnatural hierarchies into natural ones?
Also is there a reasonable explanation why Analysis Services shouldn't give unnatural hierarchies the same performance as cross joins?
Anyone? Thanks in advance.
PS. I worked thru an issue with Microsoft a couple years ago on this topic
At the time they said that they would fix performance in the "next major release".
It turns out they thought "PowerPivot" was a more nifty thing to spend their developer time on.
Yes, I'm upset. If anybody thinks that unnatural hierarchies should perform
about as good as crossjoins then please vote for my Microsoft Connect issue! Thank you.
I am not the person that explain all in your scenario but there is a fundamental thing that can help as a start.
When you build unnatural hierarchies you do not have any attribute relations or aggregations that can help you with query performance. The result you get will be built around the key(leaf level) in each dimension. I think that all aggregations will have to built dynamically with unnatural hierarchies.
Natural hierarchies with attribute relations have prebuilt aggregations that will improve query performance with CrossJoins that unnatural hierarchies cannot use.
I am sure that PowerPivot will be the best option for unnatural hierarchies.
Another option is to write a SQL query since that approach use inner joins and not crossjoins like SSAS does. SSAS do not use crossjoins but it is a way to describe how dimensions and fact tables works in SSAS.
I know that unnatural hierarchies do not have the same advantages as natural ones (in terms of simple aggregation design and materializing indexes).
What bothers me is this. That if you create an unnatural hierarchy with unrelated attributes "A" and "B" and then create a query that drills down thru them and shows both levels, this performs much worse than it should. A query that crossjoins those same exact attributes (A * B) performs much better. As far as I can tell, both types of queries are working with the same aggregations, indices, and so on.
So what's up with that? Is there a reasonable explanation? Or is it just not so important for unnatural hierarches to behave well in SSAS?
The only way to increase performance of the unnatural hierarchy (to match the corresponding crossjoin) is to create a new attribute C that is made with the composite key A,B. The names of the members, however, are created the same way as they are on attribute B. Then you create the relationship between A and C. Finally you modify the unnatural hierarchy to use the new attribute C instead of B. In this way it becomes a natural hierarchy. This is a very costly work-around, especially if you have to create aggregations on combinations of the C attribute in addition to the aggregregation combinations that are already in place for A and B.
I'm simply comparing the crossjoin to the unnatural hierarchy. Remember that the attribute relationships and aggregations are exactly the same on both sides of this comparison.
What is the reason for the big performance difference between crossjoining "A" * "B" vs. drilling down thru "A","B" in an unnatural hierarchy? Is it simply that hierarchy performance within SSAS queries is an ongoing work in progress? Or is there a fundamental/conceptual difference between the two that would make it impossible for the unnatural hierarchy drilldown to perform as well as a crossjoin?
The performance differences are totally counter-intuitive to me. And when I tell users that I won't give them the A,B unnatural hierarchy because of performance problems they think I'm blowing smoke, and rightly so (considering they can observe for themselves fact that the "A" * "B" crossjoins work fine !!!)
OK, I understand.
In the "Microsoft SQL Server 2008 Analysis Services Unleashed" page 59, there is a general warning of using unnatural hierarchies "Because of the types of relationsships between attributes".
I can only guess that SSAS would need to evaluate the relations between the keys when you put them in an unnatural hierarchy but has a faster query plan when you crossjoin them as attribute hierarchies.
I will see if I can test this scenario.
I have some simple steps here to repro the issue in AdventureWorks:
Basically attribute "A" needs just a few 100's or 1000's of members before unnatural hierarchy drilldown's on "A","B" perform worse than "A"*"B".
I think you are probably correct that there is a difference in the query plan which would explain the difference in performance. It surprises and confuses me very much that these unnatural hierarchies are still so unreasonably slow after SSAS has had two full version changes since the release of SQL 2005. If I don't understand this, it makes it impossible for me to tell users why I don't want to give them the "A", "B" hierarchy. A KB article that documented this issue with SSAS dimensions would certainly be helpful.
I guess that there is no good query plan for unnatural hierachies. Making Microsoft aware of that by writing about it on Connect and getting votes is a good initiative.
I thought it was a good thing to do a year and a half ago. For some reason nobody is voting. Maybe SSAS developers are overly accustomed now to avoiding the creation of unnatural hierarchies, both for good reasons and illegitimate/bug-related reasons. Now all I'm hoping for is that we can get some documentation about this in the KB.
When a user asks to put attribute "A" and "B" in a useful hierarchy, I'll just say no. I'll say performance will be awful if I do that. Then I can show them the change in performance AND I will be able to point them to the KB if they still have any doubts.
It is normally too expensive to work around this issue based on the minimal gain in user-friendliness. There would be a big payoff if Microsoft would just fix this issue for everyone's sake.
I do not agree since you have a strategy that works, by not building unnatural hierarchies even if the navigation path in a hierarchy is more end user friendly.
On the other hand, if you only build hierarchies of natural hierarchies, end users will know that they will have to use the combination of single attributes for report hierarchies.
You do not agree on what point?
The inability to create hierarchies when users ask for them is a big problem. The inability to drill down in a user hierarchy with decent performance (comparable to a cross join) is a big problem. I don't know about you, but most of my users can hardly explain the difference between "dimension" and a "measure", let alone understand the distinction that SSAS makes between different types of hierarchies based on the relationships between their attribute keys...
It is probably not necessary, but let me drive the point further. Hierarchies are needed for good OLAP user experience. Many meaningful hierarchies are NOT "SSAS natural hierarchies", believe it or not. For example, many type 2 slowly changing dimensions (SCD) do not easily form "SSAS natural hierarchies". If I have a salesperson that is moving from one organizational unit to another (or from one state to another), I still want to be able to drilldown from the organizational unit to the salesperson for a given period of time!!!
In this example, I should be able to use just two attributes to support the hierarchy. I should have a "salesperson" attribute and an "organizational unit" attribute associated with the saleperson. Then I should put the two in a hierarchy together with "organizational unit" on top. This should NOT perform worse than a cross join!
Most importantly, I should NOT be forced to create yet another (very esoteric) attribute called "salesperson at OU" (or whatever) as a work-around. This work-around is very costly when you do it for every type 2 SCD attribute that you ever want to use in a dimension hierarchy.
As you can tell, I do not yet have a strategy that works. I'm resigned to the existence of these performance problems in unnatural hierarchies. But now I'm just hoping for a KB article to which I can refer the clients of the cubes. This way I can just pass on the blame for the obscure performance problems in certain hierarchies. I would do so especially when the users are able to see for themselves that performance should be better (ie. they can stack the attributes individually and see that their user hierarchy performs much worse.)
Hope this makes sense,
You can simply avoid building hierarchies on unnatural hierarchies and teach the end users to drop attribute hierarchies on top of each other. It is clearly stated in the book I have referred to earlier that you should avoid building hierarchies on top of unnatural hierarchies.
I think you are discussing a new topic but still interesting with the relation between the sales person and an organizational unit. If you put these attributes together in a hierarchy they are not unnatural in the moment that you create them but since sales persons move between districts and customers over time you will create a problem with that appoach. This discussion is not about Analysis Services but is related to starschemas as a design approach and is mentioned by Kimball in the "Data Warehouse Toolkit"
If you put sales persons and their organizational relations in separate dimensions you will have no problem with the history of sales persons or organizational performance over time but you will miss drilldown that you have when you put these attributes in the same dimension.
I think that this is one of the hard design decisions that we will have to make in our daily work. What feature is more important than the others.
- Edited by Thomas IvarssonModerator Monday, January 18, 2010 6:24 AM
I just find it very surprising that the SSAS development team has been putting off the fixes needed to resolve these performance problems. Hierarchies built from type 2 SCD attributes are a major headache because they almost never turn out to be simple "SSAS natural hierarchies".
If anyone values performance in their hierarchies ("SSAS natural hierarchies" or otherwise), please vote!
I've been battling with this issue as well and have come up with the same solution as David when it comes to "naturalizing" a hierarchy.It really is a pain to create these fake attributes. I don't think splitting the attribute out to a seperate dimension is a viable solution because you then compromise the model that SSAS is supposed to be helping with. The other issue it creates is most of our clients are using Excel 2007 and when you drag on multiple hierachies to the row (even single level attribute hierarchies) then Excel automatically expands the second which isn't very helpful for the user.Before I found this post I asked a similar question on StackOverflow. No answers yet but I'll add the link in case some other answers eventuate, http://stackoverflow.com/questions/2203261/designing-dimension-hierarchies-natural-or-unnatural
We now have a formal KB article describing the difference in query performance between unnatural hierarchies and the corresponding (if not equivalent) cross joins.
The good news is that Microsoft is committing once again to fix this odd performance behavior within unnatural hierarchies. My users have some interesting hierarchies that I have advised them against. Hopefully it will not be too much longer before I can finally give them what they are asking for!