none
Encountering deadlocks with Proactive Caching

    Question

  • I have a couple of cubes that I'm wanting to designate for real-time ROLAP storage, so I've set up the cubes, and each of their dimensions, to use Proactive Caching.  The cubes are defined to use a Scheduled Polling query notification scheme, and the dimensions are set to use SQL Server notification.

    According to perfmon (the "Total Deadlocks Detected" counter), this setup is somehow resulting in numerous deadlocks.  I tried disabling proactive caching for all but two dimensions, but still encountered this problem.  Has anyone else encountered this, or have any information as to why this configuration is resulting in deadlocks? 

    The server is running SQL Server 2005, SP2.  My workstation (on which I can reproduce this issue) also has a dual core processor. 

    Thanks,
    Joe Miller

    Saturday, March 24, 2007 6:00 PM

All replies

  • Have you tried running SQL Profiler against the SQL server in order to diagnose the deadlocks? I believe profiler should be able to show you the two statements that are causing the deadlocks. If it is you proactive caching queries locking up with the standard ROLAP queries, then looking at your indexing setup or adding hints to your polling query may help.

    One possible solution would be to use the snapshot isolation mode, but you would really want to understand what is causing the deadlocks and consider the other side effects of snapshot isolation before going down this path.

    Monday, March 26, 2007 2:50 AM
    Moderator
  • Thanks for the response, Darren.  I appreciate any help you can offer.

    I had run Profiler against the OLAP database, which helped me identify the objects affected by the deadlocks (e.g., the designated victim), but the deadlock graph doesn't provide any information about the queries that caused them.  I also tried qualifying the queries with nolock hints, but to no avail.

    The deadlocks occur even in the following scenario, which I particularly find odd:

    1. Set up proactive caching for a dimension (or a partition), for real-time ROLAP.
    2. Select scheduled polling for the notification type, using "select getdate()" as the polling query.

    (My goal with the above polling query is basically to ensure the cache is cleared out with each polling.  We would like to disable SQL Server's caching mechanism completely for these particular cubes, but that doesn't appear to be an option...)

    I'm not really familiar with the new snapshot isolation level, but from the description I read on Books Online I'm not sure it would help in this situation.  It seems like the deadlocking is occuring on the OLAP side of things, rather than on the relational side.  (For example: when the deadlocking starts up en masse, the OLAP database becomes unresponsive - or *very* slow - but the relational DB answers queries just fine.)

    Thanks again for your help,
    Joe Miller

    Monday, March 26, 2007 5:55 AM
  • The OLAP side of things (SSAS) does not have the concept of deadlocking, so it is impossible that the deadlocks are coming from SSAS, they must be coming from the relational system. When you are using ROLAP storage the OLAP engine is pretty much only a semantic layer (with a bit of caching) it translates MDX queries into SQL queries and executes them against the relational source and then massages the results into a multi dimensional rowset.

     

    I am guessing that the OLAP queries are unresponsive as they are probably being locked up by the deadlocks and as the queries are only reading it is most likely always being chosen as the deadlock victim and having to re-try.

     

    I'm assuming that your relational source is SQL Server 2005 because of your reference to deadlock graphs? If you export out the deadlock graph to a file you should be able to open it up and see atleast part of the statements that triggered the deadlock (see http://www.extremeexperts.com/SQL/Yukon/DeadLockDetection.aspx)

     

    Because you are after real time ROLAP I am assuming that the issue is with the fact that you have data that is being constantly updated and constantly queried. The locking semantics in SQL Server mean that readers can block writers which means if your readers are doing joins against multiple tables and your writers are updating mulitple tables in one transaction. Snapshot Isolation might help because it will stop "readers" and "writers" blocking each other, effectively readers get to see a snapshot version of the row as it was when the query started. But we really need to determine if this is actually the case before going down that path. It could also be that a slight change in the way your updates are applied could resolve the issue too.

     

     

    Tuesday, March 27, 2007 1:05 AM
    Moderator
  • Thanks again for the response, Darren.

     

    I'm a little confused as to what it is I'm seeing if SSAS doesn't have any concept of deadlocking.  The deadlock graphs that I am getting are being retrieved from Profiler running against SSAS.  The instance of Profiler that I have running against the relational database (simultaneously) actually doesn't log any deadlock information at all, in these situations.

     

    And you are correct with your assumption about our relational DB (SQL Server 2005 SP2).  The deadlock graph retrieved from the SSAS Profiler, though, doesn't contain nearly as much useful information as the ones retrieved from a relational DB (e.g., like the one shown in the link you provided).  The XML just consists of a list of Lock elements, each of which represents a lock involved in the deadlock, including information like Lock ID, Lock Status, Lock Type, etc.

     

    From the data I'm getting from these traces, it really looks like SSAS is (in a sense) tripping over itself when rebuilding the cache(s).  If you're certain that the deadlocking isn't occurring on the SSAS side, I'll believe you - but it makes this issue even more bizarre.    

     

     

    Thanks,

    Joe Miller

     

     

    Tuesday, March 27, 2007 10:03 PM
  • I think that what you are seeing in the SSAS profiler is the error from SQL Server "bubbling" up. I have not had to do any realtime ROLAP in a production environment, so this is all just theory on my behalf, but I am pretty sure that this is what is happening. 

     

    When you trace the Relational source are you filtering on a particular Database? There is a bug in the deadlock event where the database Id column is not populated, thus if your trace is filtered to a particular database you will not see any deadlock events. What people are having to do as a work around is to run a separate trace just catching all deadlock events. Try this and see if you have any luck.

    Wednesday, March 28, 2007 5:30 AM
    Moderator
  • That would make sense (the SSAS Profiler instance catching a "bubbled-up" version of a deadlock error), but since I'm not seeing anything on the relational Profiler I'm not sure that this is what's happening.  Unfortunately, I'm not running any filters on the Profiler instances, so the bug you described isn't affecting our results here.

     

    I'll continue re-running these traces to see if anything else crops up... but so far I'm still not seeing anything problematic on the relational DB side.

     

    Thanks again for your help,

    Joe Miller

    Wednesday, March 28, 2007 4:44 PM
  • Unfortunately there does not appear to be a lot of people using ROLAP storage so I don't know if you will be able to get much more help of this forum. Maybe there could be some locking at the SSAS layer, but I was pretty certain that the ROLAP storage delegated all of that to the underlying relational database. 

    This is sounding like it might be something where you might need to put in a call with Product Support. I would be interested to hear back when/if (hopefully when) you find a resolution.

    Thursday, March 29, 2007 1:22 AM
    Moderator
  • Well, I appreciate your help in any case, Darren.  I'll be sure to post back here once I know more.

     

     

    Thanks,

    Joe Miller

    Thursday, March 29, 2007 3:53 PM
  • Analysis Services most certainly has the concept of deadlocks. There is a special profiler event just for that.

    I am seeing deadlocks with ROLAP when the proactive caching notifications are interleaved. Meaning if one dimension gets notified to refresh, and in a few milliseconds another dimension gets notified to be refreshed, those are two separate commands and may interleave their locks to cause a circle.

    The profiler trace of Analysis Services with the deadlock even will show the Analysis Services lock that are held by each SPID and show the interleaved pattern causing a circle of locks.

    One or more of the AS jobs will be killed and the proactive caching may never retry that notification, so you may have stale data at the point when a deadlock occurs between multiple proactive caching notifications events (ROLAP processing, MOLAP processing, etc).


    Didn't get enough help here? Submit a case with the Microsoft Customer Support team for deeper investigation - http://support.microsoft.com/select/default.aspx?target=assistance

    Thursday, September 5, 2013 4:20 PM
  • Jason, thanks for useful answer.

    I've recently come across with this issue. In my case I've built MOLAP by the direct source i.e dimensions forms by using querying on the fact table with distinct clause. I'm assume that it can be cause of occurrence deadlocks.

    Can you explain is there exists general way to eliminate that deadlocks?

    Is there has any of parameters which allow to control timeout before one of transactions will be killed?

    Thursday, September 12, 2013 7:24 PM
  • Do you seeing deadlocks in Analysis Services, or deadlocks in SQL Database Engine? What is the error exactly?

    Are you running a processing command in AS when the deadlock occurs?

    Are you running more than one processing command, or just one? What is the command?

    If more than one object, are you putting them in the same batch, to run in parallel or running them sequentially? or in separate connections?

    Using the word Distinct doesn't change the locks held in Analysis Services as far as I know. I imagine the query plan is different in the SQL Database Engine when doing a TSQL Select query with DISTINCT, since it might internally sort the data to look for duplicates, however I still don't imagine SELECTs blocking each other, or deadlocking in Read Committed Isolation level (as set on the connection / Data Source). Repeatable Read or higher isolation level may aggravate deadlocks. If the Database Engine query goes very parallel for its execution plans, Multiple threads for the same query can deadlock each other sometimes. Reducing the "Max Degree of Parallelism" property to a lower value (affects that whole SQL instance) can minimize inter-thread deadlock possibilities.

    In the XMLA code, you can batch multiple objects to be processed in one XMLA command. Also there is a tag for parallel processing. You can remove the tag and it will process the list of objects sequentially. If you keep the parallel tag, there is a property within the Parallel tag which says MaxParallel. Without the max parallel property, the AS engine determines how many processing jobs to run in parallel.

    • No Parallel tag - sequential one object at a time.
    • <Parallel> - let the server deice how many parallel processing jobs
    • <Parallel MaxParallel="8"> - control how many parallel processing jobs

    This example would run both processing attempts for the 2 objects in parallel within the same transaction scope, so it shouldn't deadlock itself within AS.

    If the dimensions Data Source View designs referenced share the same underlying SQL tables in SQL Engine, they could deadlock in there potentially. However Since SELECT statements rarely deadlock in default Read Committed Isolation level its unlikely that processing causes deadlocks in SQL Engine for straight designs. Other queries running Updates, Deletes, Inserts could somehow conflict and cause SQL Database Engine deadlocks, but those are likely not part of Analysis Services processing work, since it does SELECT queries during processing.

    <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
    <Parallel>
        <Process ... edited for brevity
     <Object>
      <DatabaseID>DBName</DatabaseID>
      <DimensionID>Dim 1 Name</DimensionID>
          </Object>
          <Type>ProcessFull</Type>
          <WriteBackTableCreation>UseExisting</WriteBackTableCreation>
        </Process>
        <Process ... edited for brevity
     <Object>
      <DatabaseID>DBName</DatabaseID>
      <DimensionID>Dim 2 Name</DimensionID>
          </Object>
          <Type>ProcessFull</Type>
          <WriteBackTableCreation>UseExisting</WriteBackTableCreation>
        </Process>
      </Parallel>
    </Batch>

    Deadlocked threads will hang until a victim is chosen and cancelled/killed, so making them die later than sooner will not really affect the outcome. The timing of the original actions before locks are held influences when the locks are obtained, and when more locks are attempted to be obtained to create the deadlock condition.  Timing of the events is important, but not changing the timing of the timeout or deadlock monitor usually.

    Sorry I couldn't answer the direct question without more information. Thanks, Jason


    Didn't get enough help here? Submit a case with the Microsoft Customer Support team for deeper investigation - http://support.microsoft.com/select/default.aspx?target=assistance

    Thursday, September 12, 2013 8:22 PM