Monday, July 20, 2009 4:36 AM
I see some threads that claim ProcessUpdate will automatically rebuild indexes, but BOL and others say you need to run ProcessIndex on partitions and/or dimensions after a ProcessUpdate. Does ProcessUpdate take care of the indexes automatically?
SQL Server 2008 SP1 CU1 x64
Monday, July 20, 2009 5:49 AM (Answerer)
When you specify ProcessUpdate, go to Change Settings and select "Process Affected Objects". This ensures that the indexes are built during the ProcessUpdate of the dimension.
By default this option is not selected, so you have to run a "Process Index" on the measure groups that are linked to the dimensions you ran ProcessUpdate on.
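For anyone scripting this rather than using the SSMS dialog, the same option corresponds to the ProcessAffectedObjects attribute on the XMLA Batch element. This is only a sketch; the DatabaseID and DimensionID values are placeholders for your own objects:

```xml
<!-- ProcessUpdate a dimension and rebuild the affected partitions' indexes
     in the same operation. Object IDs below are placeholders. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
       ProcessAffectedObjects="true">
  <Parallel>
    <Process>
      <Object>
        <DatabaseID>Adventure Works DW</DatabaseID>
        <DimensionID>Dim Product</DimensionID>
      </Object>
      <Type>ProcessUpdate</Type>
    </Process>
  </Parallel>
</Batch>
```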
Monday, July 20, 2009 7:19 AM (Answerer)
If you use ProcessUpdate the flexible aggregations are dropped and they are not recreated automatically.
Below you can find a quote from the Analysis Services 2005 performance guide which lists the possibilities you can use to recreate the indexes.
An aggregation is flexible when one or more of the attributes participating in the aggregation have flexible direct or indirect relationships to the key attribute.
If you perform a ProcessUpdate on a dimension participating in flexible aggregations, whenever deletions or updates are detected for a given attribute, the aggregations for that attribute as well as any related attributes in the attribute chain are automatically dropped. The aggregations are not automatically recreated unless you perform one of the following tasks:
· Perform a ProcessFull on the cube, measure group, or partition. This is the standard ProcessFull operation that drops and re-creates the fact data and all aggregations in the partition.
· Perform a ProcessIndexes on the cube, measure group, or partition. ProcessIndexes builds only the aggregations and indexes that need to be rebuilt.
· Configure Lazy Processing for the cube, measure group, or partition. If you configure Lazy Processing, the dropped aggregations are recalculated as a background task. While the flexible aggregations are being recalculated, users can continue to query the cube (without the benefit of those aggregations); queries that would benefit from them run slower, because Analysis Services resolves such queries by scanning the fact data and summarizing it at query time. As the flexible aggregations are recalculated, they become available incrementally on a partition-by-partition basis. Lazy Processing is not enabled by default; you can configure it for a cube, measure group, or partition by changing the ProcessingMode property from Regular to LazyAggregations. To manage Lazy Processing there is a series of server properties, such as the LazyProcessing \ MaxObjectsInParallel setting, which controls the number of objects that can be lazy-processed at a given time. By default it is set to 2. Increasing this number increases the number of objects processed in parallel, but it also impacts query performance and should therefore be handled with care.
· Process affected objects. When you perform a ProcessUpdate on a dimension, you can choose whether or not to Process Affected Objects. If you select this option, you can trigger the update of dependent partitions within the same operation.
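For reference, the ProcessIndexes option above corresponds to an XMLA Process command with Type ProcessIndexes. This sketch targets a single partition; all object IDs are placeholders:

```xml
<!-- Rebuild only the missing indexes/aggregations for one partition.
     DatabaseID/CubeID/MeasureGroupID/PartitionID are placeholders. -->
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>Adventure Works DW</DatabaseID>
    <CubeID>Adventure Works</CubeID>
    <MeasureGroupID>Internet Sales</MeasureGroupID>
    <PartitionID>Internet_Sales_2004</PartitionID>
  </Object>
  <Type>ProcessIndexes</Type>
</Process>
```

Scoping the Object element to the cube or measure group instead of a single partition widens the operation accordingly.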
This posting is provided "AS IS" with no warranties, and confers no rights.
- Marked As Answer by Lee Cascio Monday, July 20, 2009 6:19 PM
Monday, July 20, 2009 10:19 PM
The confusing part is that I see partitions being processed automatically after a dimension is processed. In addition, I was reading Darren's blog at http://www.tech-archive.net/Archive/SQL-Server/microsoft.public.sqlserver.olap/2006-11/msg00115.html where he mentioned that the aggs and indexes are processed automatically.
Thursday, May 06, 2010 1:08 PM
I agree that this is totally confusing. There is no way to tell if ProcessIndexes is needed on partitions after ProcessUpdate's are performed on dimensions -- unless you are 100% familiar with the current dimension design and what types of internal relationships are currently being used.
At the very least, a ProcessUpdate should warn you about which cubes are being stripped of their indexes and aggregations.
I wonder, is there a SQL 2008 DMV that will give me information before and after "ProcessUpdate" so that I can see what indexes and aggregations are missing? I would be looking for the list of them with their current sizes.
Thursday, May 06, 2010 1:20 PM (Moderator)
Actually, Darren wrote a great blog entry describing how to work out if you do need to do a Process Index here:
Blog: http://cwebbbi.spaces.live.com Consultancy: http://www.crossjoin.co.uk/
Thursday, May 06, 2010 6:04 PM
Thanks, that may do the trick. "Partition Stat" data has aggregations. Is there some separate DMV for indexes or does this "Partition Stat" data implicitly include indexes? (The task name, "ProcessIndex", seems to imply that it does something more than just build aggs and I don't want to miss doing that extra "something" - even if all aggs are built.)
Also, I'm curious if you know whether a "ProcessDefault" command on a cube or a database will rebuild the indexes and aggs if they aren't already. I can do some testing but it seems like kicking off a "ProcessDefault" on the entire database after updating dimensions would be an easy approach to use. (So long as there aren't additional considerations.)
Thanks in advance.
Monday, May 10, 2010 7:18 PM
I discovered the XMLA/DMV for indexes. It is called "DISCOVER_PARTITION_DIMENSION_STAT". It may indicate indexes that are missing, despite the fact that DISCOVER_PARTITION_STAT turns up nothing unusual.
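For anyone else looking: both DMVs require restrictions, so they are queried through SystemRestrictSchema. A query along these lines (run as an MDX query in SSMS against the Analysis Services instance; the object names are placeholders) returns the per-dimension index state for a partition:

```sql
-- Per-dimension index state for one partition; ATTRIBUTE_INDEXED shows
-- whether the bitmap index for each attribute is in place.
SELECT *
FROM SystemRestrictSchema($system.DISCOVER_PARTITION_DIMENSION_STAT,
    DATABASE_NAME      = 'Adventure Works DW',
    CUBE_NAME          = 'Adventure Works',
    MEASURE_GROUP_NAME = 'Internet Sales',
    PARTITION_NAME     = 'Internet_Sales_2004')
```

Substituting $system.DISCOVER_PARTITION_STAT (with the same restrictions) returns the aggregation names and sizes instead.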
We've had serious issues with "ProcessUpdate". Depending on the dimension and the cube, this command can sometimes degrade performance and sometimes cause no problems at all. When it degrades performance, it is sometimes because aggregations are dropped and sometimes it is because no aggs are dropped -- but indexes are! Very troublesome stuff. Hopefully these DMV's will at least provide some visibility into what is happening.
Monday, May 10, 2010 8:38 PM (Moderator)
From what I understand, aggregations and indexes are either processed or unprocessed together and it's not possible for an aggregation to be processed and the related indexes not to be processed. I could be wrong though.
Have you set the Relationship Type property on your attribute relationships to Rigid where possible (i.e. if you don't expect any Type 2 changes to affect them)? As the excerpt from the AS 2005 performance guide that Orsi quoted above says, doing this means that any aggregations that have completely Rigid relationships down to the key attribute on every dimension will not be dropped when you do a Process Update on a dimension (the BIDS Helper Aggregation Manager can tell you which aggregations are Rigid and which are Flexible: http://bidshelper.codeplex.com/wikipage?title=Aggregation%20Manager&referringTitle=Home). A Process Default should indeed rebuild any aggregations and indexes that have been unprocessed.
Blog: http://cwebbbi.spaces.live.com Consultancy: http://www.crossjoin.co.uk/
Wednesday, May 12, 2010 1:39 AM
It would make sense if aggregations and indexes shared the same processed state. It would certainly be easier to manage.
We currently use rigid relationships all the way up to our existing aggregations. That is why we were troubled by the fact that the ProcessUpdate command would cause performance degradation in any of our cubes.
The specific case that has caused trouble for us in SSAS 2005 was with parent-child dimensions. It seems they have no distinctive aggregations that can be designed, except at the dimension key. But it turns out that they do have indexes.
It seems that, in SSAS 2005, these parent-child dimensions were handled in a special way -- they seemed to be optimized in cubes as if they had their own aggregations, built on flexible relationships. Whenever we did a ProcessUpdate on the parent-child dimensions, the related cubes would become incredibly slow. By querying DISCOVER_PARTITION_DIMENSION_STAT, we were able to determine that indexes were dropped by the ProcessUpdate command, even though all of the aggregations remained unchanged! I suppose this is a good reason not to overlook the additional DMV, which is specific to indexes.
In SSAS 2008 there is a different behavior, however! It appears that a ProcessUpdate on a parent-child dimension will leave indexes in place! (Unfortunately I haven't been able to find any documentation on this behavior change from 2005 to 2008.)
Another thing I've found about SSAS 2008 is that we don't pay as high a price anymore for the option to Process Affected Objects. That option would have been an acceptable solution in SSAS 2005 if it wasn't so slow. In SSAS 2008, however, we can simply use it as a safeguard to make sure that performance isn't degraded in any objects related to the dimensions that are being updated. This is much easier than having to manually identify related objects and make sure they are reprocessed along with the dimensions in the same transaction.
Thanks for the feedback, David
Wednesday, May 12, 2010 4:58 PM
Here is a quick summary of what happens when you do ProcessUpdate:
1. After the dimension has been updated, the server analyzes the changes that occurred to the dimension. In 2005, this analysis was pretty simple and would often incorrectly detect that major changes had occurred that required clearing of indexes and aggregations. In 2008, this code was improved such that it more often would realize that nothing significant has occurred. It's a fairly small (but useful) optimization -- I guess nobody thought it was worth documenting!
2. Based on this analysis, the server will decide whether or not indexes and aggregations need to be cleared. If no (e.g. because records were only added and not deleted/updated), then the partitions won't be affected.
3. If indexes/aggregations need to be cleared, then the server will check if ProcessAffectedObjects was enabled -- if yes, then instead of clearing the indexes/aggregations it will rebuild the indexes/aggregations.
4. The act of clearing the indexes/aggregations also shows up as "partition processing operations" in Profiler -- that's one of the things that has been confusing some of you.
5. When aggregations are cleared, only the flexible aggregations need to be cleared because we're guaranteed by the rigid relationships that the members cannot have moved and therefore the rollups cannot have changed. However, indexes can still have changed and therefore you may still see the partition processing jobs kick off to clear the indexes.
6. ProcessIndexes and ProcessClearIndexes take care of building both bitmap indexes (aka map) and aggregations -- the context is that both aggregations and bitmap indexes are generically considered "indexes".
Really the main takeaway here is that if you ProcessUpdate a dimension, you should strongly consider either doing ProcessAffectedObjects or an explicit ProcessIndexes on the affected partitions so that bitmap indexes and flexible aggregations get rebuilt. The advantage of explicitly doing ProcessIndexes is that you can bring your cube online earlier and have the indexes/aggregations get processed more lazily in the background -- a number of customers prefer to do that because their processing windows are too small to wait for the indexes to get processed.
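A sketch of the explicit two-step approach described above (object IDs are placeholders; the two batches are sent as separate commands so the cube is queryable after the first one commits):

```xml
<!-- Step 1: update the dimension only. This finishes quickly and the cube
     comes back online, but queries run slowly until step 2 completes. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>Adventure Works DW</DatabaseID>
      <DimensionID>Dim Product</DimensionID>
    </Object>
    <Type>ProcessUpdate</Type>
  </Process>
</Batch>

<!-- Step 2: sent later as a separate command -- rebuild the dropped bitmap
     indexes and flexible aggregations across the cube. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>Adventure Works DW</DatabaseID>
      <CubeID>Adventure Works</CubeID>
    </Object>
    <Type>ProcessIndexes</Type>
  </Process>
</Batch>
```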
-- This posting is provided "AS IS" with no warranties, and confers no rights
- Proposed As Answer by Farhan H Soomro Tuesday, June 07, 2011 7:56 PM
Wednesday, May 12, 2010 11:39 PM
OK, on further review it appears that using a ProcessUpdate command on our parent-child hierarchies will sometimes leave the related indexes in place, and at other times the command will clear the related indexes. This is a change from SQL 2005 where the indexes were always cleared.
I'm not sure that I like the inconsistency and uncertainty that the ProcessUpdate command has acquired in SQL 2008. I guess I understand why it makes sense to always perform this command along with a ProcessAffectedObjects, or with a subsequent ProcessIndexes. These will ensure that performance is restored to a predictable and well-defined state. It appears that both of these options (ProcessAffectedObjects and ProcessIndexes) are smart enough to avoid doing more work than they must.
Thanks for all the help! I think I understand this update processing much better now. Since our upgrade to SQL 2008 we have gotten odd errors related to index re-processing in our partitions (see below.) I suppose that the changes made to the ProcessUpdate command may be the culprit. I think I will now be better equipped to troubleshoot these errors the next time they occur.
Errors in the OLAP storage engine:
An error occurred while processing the indexes,
File system error: The following error occurred
while opening the file '\\?\XXX.agg.rigid.map
Friday, May 14, 2010 2:56 PM
First, thank you for explaining in detail how the ProcessUpdate of a dimension affects the indexes and aggregations. However, I have a quick question about your comment: "The advantage of explicitly doing ProcessIndexes is that you can bring your cube online earlier and have the indexes/aggregations get processed more lazily in the background".
I don't understand how doing a ProcessIndex operation will have the indexes/aggregations processed more lazily in the background. I thought that ProcessIndex recreates the bitmaps and aggregations, and that once a partition is complete it is available for use, rather than having to wait until all partitions have their indexes rebuilt. Is this what you mean by "lazily in the background"?
Friday, May 14, 2010 8:02 PM
All I was saying is that you can do ProcessUpdate without ProcessAffectedObjects. Since this would not rebuild the indexes, it would finish faster and the cube would come online quicker -- but with slow queries until you do the ProcessIndexes. And that's the scenario for needing an explicit ProcessIndexes capability -- otherwise, we could have just built ProcessAffectedObjects into ProcessUpdate as the only choice...
ProcessIndexes on a cube/measure group will only bring the indexes online once the transaction commits -- you can explicitly process indexes for each partition individually if you want them to come online as soon as each one is built, but that will incur the cost of multiple transaction commits. It's usually better to just do them in one shot -- you also get more unpredictable performance if the indexes come online randomly for different partitions.
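To illustrate the trade-off (a sketch; the IDs are placeholders): a single transacted batch over the measure group commits once and brings all partitions' indexes online together, whereas sending a separate ProcessIndexes command per partition brings each partition's indexes online as it finishes, at the cost of multiple commits.

```xml
<!-- One-shot variant: all partitions' indexes in this measure group come
     online together at commit. For the incremental variant, send one
     ProcessIndexes command per partition as separate batches instead. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
       Transaction="true">
  <Process>
    <Object>
      <DatabaseID>Adventure Works DW</DatabaseID>
      <CubeID>Adventure Works</CubeID>
      <MeasureGroupID>Internet Sales</MeasureGroupID>
    </Object>
    <Type>ProcessIndexes</Type>
  </Process>
</Batch>
```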
There is another option which is to use lazy indexing -- a background operation where the server will automatically rebuild indexes for partitions (essentially an automatic ProcessIndexes). I'm personally not a huge fan of it -- I tend to prefer explicitly controlled building of indexes...
-- This posting is provided "AS IS" with no warranties, and confers no rights
Monday, May 17, 2010 9:22 AM
We also do not like to have the lazy indexing, because you have much less control when they are going to be "executed".
Our solution, is to have 2 servers:
- Staging Servers (x86) - where all processing is done. We developed an application that mimics the functionality of the "Parallel Process" tool for AS2000. With Parallel Process 2005, you can process dimensions in series and process partitions in parallel (you can define how many to process in parallel). The dimensions are processed in series because almost all of our aggregations are flexible; processing the dimensions in parallel would, in some cases, cause them to lock each other when "dropping" the aggregations/indexes in the affected partitions. The advantage of processing the partitions in parallel is that each one is within its own transaction (some of our partitions take a few hours to process). Parallel Process 2005 uses the ascmd application to do the processing. When we do a Process Full of a partition, we do a Process Data in one transaction and then a Process Indexes in a second transaction, so that an error while processing the indexes does not roll back the Process Data. The transactions are also smaller, so the probability of running out of memory is lower.
- End-user Servers (IA64) - after the OLAP databases are processed and reconciled on the Staging Servers, the databases are synchronized to the End-user Servers.
I hope this explanation helps in some way,
PS: When processing the dimensions in parallel, you just need to make sure that the first one is executed alone, since it will drop the aggregations/indexes. The other dimensions can then be processed in parallel without locking each other. Nevertheless, we never implemented this functionality, because the dimensions tend to process much faster than a partition does.
Tuesday, May 18, 2010 8:00 PM
Kind of off-thread, but I notice you do synchronization. How is the performance on that? It is something we are considering.
Akshai - Thanks for following up!
Saturday, July 03, 2010 8:13 AM
We've used synchronisation on a previous project and it worked very well. After the initial synchronisation, only the changes are sent across the network with each subsequent synchronisation so it is very quick.
We are considering using it on the current project, but there are a number of reasons why it may not work so well:
- The size of the database is much larger which could be a problem when we need to do a full rebuild
- The percentage change in the database each day is much greater
- Data needs to be sent across the Atlantic rather than across London and on a slower network
I've been told by a Microsoft consultant that the fastest method is to use Robocopy as described in this article: http://sqlcat.com/technicalnotes/archive/2008/03/16/analysis-services-synchronization-best-practices.aspx
This involves copying the raw files after they have been processed. One disadvantage is that the target server is unavailable whilst files are being copied. Another is that it sounds a little scary to me, but I'm told it's safe and faster than the other methods.
Saturday, July 03, 2010 8:30 AM
This is an issue that's cropped up at work recently. I've been experimenting with the AW 2005 sample database by processing the Date dimension and seeing what happens to the Currency_Rates partition in the Exchange Rates measure group.
It seems that whatever I do the partition is always processed. I've changed the setting of ProcessAffectedObjects to both true and false and I've had the partition start from an unprocessed state and fully processed state. In each case the partition always gets processed. I had expected that with ProcessAffectedObjects set to false it would remain unchanged.
Am I missing something?
Wednesday, August 24, 2011 11:03 AM
Though it's a rather old thread, I would like to clarify one thing, as the thread is heavily linked from different places thanks to its good explanation of the consequences of running ProcessUpdate on a dimension.
There may be indices built by SSAS which don't belong to any aggregation. I don't know whether one can somehow influence which indices are built (probably not), but it is quite possible to have aggregations in place while (some) indices are missing.
Probably the best way to check the state of indices and aggregations is querying the DISCOVER_PARTITION_DIMENSION_STAT (indices) and DISCOVER_PARTITION_STAT (aggregations) views.
Check also my thread about an SSAS 2008 R2 'bug': in R2, ProcessIndex rebuilds all indices and aggregations, not just the missing ones (compared to SSAS 2008).