Understanding ROLAP

  • Question

  • We are looking at adding a real-time aspect to our SSAS cube set, to allow reporting against a relatively small but constantly changing data set. Our initial thought was to use ROLAP partitions and dimensions - we were under the impression that the queries would always be satisfied from the underlying relational tables. However, in our explorations we are finding that understanding to be incorrect - SSAS is in fact caching data, so it ends up returning stale data.

    We've configured our partitions and dimensions for ROLAP and disabled all proactive caching (we would like this solution to work with SSAS Standard Edition, which precludes the use of proactive caching). Is there something else we need to figure out? Or are we headed down the wrong path here?

    Tuesday, January 16, 2007 1:48 AM

Answers

  • The data is indeed cached by Analysis Services, both in ROLAP and in MOLAP modes. The way to get non-stale data is to enable the proactive caching feature, which will listen to the updates and know when the cache becomes stale. So by disabling this mode you have effectively chosen not to work in real-time mode. You are right that you will have to use Enterprise Edition if real-time access to data is in your requirement set.
    Tuesday, January 16, 2007 6:00 AM
  • Of course, now that Mosha mentions it, the fact that the data is stored in a relational store does not disable the caching. There is an XMLA command to clear the cache; I emailed a sample to Jamie Thomson and he put it on his blog here: http://blogs.conchango.com/jamiethomson/archive/2006/07/07/4195.aspx

    You might be better moving the calculation up to the cube - it depends on the granularity of the calc - but you are correct that there would be no updates to trigger a cache refresh from a calculated column. A calculated column based on the current system date/time would also invalidate the entire data set. If the calc was in the cube, the storage engine cache would remain valid, and I think the formula engine cache would probably detect that the calculation was non-deterministic and not cache the sort of calculation we are talking about.

    If you are after a real-time (or near real-time) system it sounds like you will need the Enterprise Edition. There are a lot of settings that you can configure with proactive caching in order to control cache thrashing. There are some good webcasts on proactive caching which run through these settings. It is possible with proactive caching to have a MOLAP-based cube outperform SQL queries. I saw this demonstrated at Tech Ed 2005 and did the same presentation for my local SQL User Group.

    The following is one of the ideal scenarios for proactive caching and incremental cube processing.

    If you had a big SQL-based real-time report that takes 1 minute to run, and your cube takes 20 seconds to update and 10 seconds to run the same report, then you can start your SQL-based report, start your OLAP-based report 30 seconds later, and both reports finish at the same time - but the data in the OLAP report can be 30 seconds "fresher".

    I believe that the sweet spot for proactive caching is where you have source data that is appended to (not updated) and the users are willing to accept a certain degree of latency. A limited amount of latency allows for a reduction in cache thrashing, and reduces the load on the relational source due to constant re-processing. Combining proactive caching with date-based partitioning can also help to shorten the processing load.

    Tuesday, January 16, 2007 9:55 PM

All replies

  • Can I ask how you have come to the conclusion that there is caching involved? Are you sure you are not just seeing some latency involved in generating the relational query, executing it and returning the results? I have not had to implement a ROLAP based system in practice so I am interested in what your experiences have been.

    What is the relational source? If it is SQL Server - have you run Profiler against it to check whether every OLAP query is being translated into a relational query? At the very least I think you should be able to run Profiler against SSAS and see the relational queries it is firing off.

    I hope all these questions don't sound too aggressive - I am just genuinely interested in understanding your issue.

    Tuesday, January 16, 2007 5:11 AM
  • The data is indeed cached by Analysis Services, both in ROLAP and in MOLAP modes. The way to get non-stale data is to enable the proactive caching feature, which will listen to the updates and know when the cache becomes stale. So by disabling this mode you have effectively chosen not to work in real-time mode. You are right that you will have to use Enterprise Edition if real-time access to data is in your requirement set.
    Tuesday, January 16, 2007 6:00 AM
  • Wow, that's really unfortunate.

    Is there some way we can request that SSAS periodically dumps its cache, so that for SE users we can at least do "short-time", if not "real-time", analysis? For example, does processing the cube/dimensions cause the cache to get dumped? Or is there an XML/A command that we can use to dump the cache?
    Tuesday, January 16, 2007 3:26 PM
  • Another couple of follow up questions.

    Our original design had the fact table containing a couple of calculated columns whose value would change at each query time (specifically, they are computing the time difference from "now" to a fixed time value stored in another column). It seems like, even with proactive caching, this wouldn't work, because there's no "updates" to the database to trigger a cache refresh. Am I correct about that? If so, it doesn't seem like there would be a way to make SSAS behave correctly with that type of data (although perhaps we can move the time calculation up to the cube instead of doing it in the relational database).

    Also, for a truly constantly-changing data set, it seems like caching in SSAS could actually slow things down - the cache will essentially always be stale, and notifications from the database would be continuous. Would we need to worry about cache thrashing, or any other performance issues in this case?


    Tuesday, January 16, 2007 5:50 PM
  • Of course, now that Mosha mentions it, the fact that the data is stored in a relational store does not disable the caching. There is an XMLA command to clear the cache; I emailed a sample to Jamie Thomson and he put it on his blog here: http://blogs.conchango.com/jamiethomson/archive/2006/07/07/4195.aspx
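
    The command itself is just a ClearCache element - something like the following (the database and cube IDs below are only placeholders for your own object IDs):

    <!-- Drops the Analysis Services data cache for the specified cube -->
    <ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <Object>
        <DatabaseID>YourSSASDatabase</DatabaseID>
        <CubeID>YourCube</CubeID>
      </Object>
    </ClearCache>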

    You might be better moving the calculation up to the cube - it depends on the granularity of the calc - but you are correct that there would be no updates to trigger a cache refresh from a calculated column. A calculated column based on the current system date/time would also invalidate the entire data set. If the calc was in the cube, the storage engine cache would remain valid, and I think the formula engine cache would probably detect that the calculation was non-deterministic and not cache the sort of calculation we are talking about.

    If you are after a real-time (or near real-time) system it sounds like you will need the Enterprise Edition. There are a lot of settings that you can configure with proactive caching in order to control cache thrashing. There are some good webcasts on proactive caching which run through these settings. It is possible with proactive caching to have a MOLAP-based cube outperform SQL queries. I saw this demonstrated at Tech Ed 2005 and did the same presentation for my local SQL User Group.

    The following is one of the ideal scenarios for proactive caching and incremental cube processing.

    If you had a big SQL-based real-time report that takes 1 minute to run, and your cube takes 20 seconds to update and 10 seconds to run the same report, then you can start your SQL-based report, start your OLAP-based report 30 seconds later, and both reports finish at the same time - but the data in the OLAP report can be 30 seconds "fresher".

    I believe that the sweet spot for proactive caching is where you have source data that is appended to (not updated) and the users are willing to accept a certain degree of latency. A limited amount of latency allows for a reduction in cache thrashing, and reduces the load on the relational source due to constant re-processing. Combining proactive caching with date-based partitioning can also help to shorten the processing load.

    Tuesday, January 16, 2007 9:55 PM
  • Thanks Darren. I'd actually stumbled across that blog post in my Googling.

    The frustrating thing here is that we're going to have to involve a whole lot of extra machinery in SSAS to get what we need, when in fact what we really want is for SSAS to do LESS. If we could turn off the cache for our ROLAP cube and dimensions (which will generally remain small, < 1 million rows in the fact table), that would actually be a better solution for us. *Sigh*.


    Wednesday, January 17, 2007 7:41 PM
  • I just realized something. One of the calculated columns in our schema actually acts as a dynamic dimension member selector (a dynamically calculated dimension key, essentially). Is there a way to do that at the cube level?
    Wednesday, January 17, 2007 11:23 PM
  • >You might be better moving the calculation up to the cube

    Hmm, now that I think about this some more, I'm not sure how I would perform the calculation at the cube level. There's a fact table column that contains a timestamp. How would I use that at the cube level to compute a DateDiff? This would essentially be a leaf level calculation, it seems to me. Any advice?
    Wednesday, January 24, 2007 12:09 AM
  • You may need to explain a bit more about your time dimension(s) and how this dynamic dimension key was working.

    I don't know if you are set up this way or not, but since we are talking about a real-time system I'm assuming that you might have some sort of "event window" dimension. That is, a dimension with members that relate to events that occurred within a specific window of time (e.g. "last 10 minutes", "2-3 hours ago", etc.)

    If this is the case you might be able to use custom member formulas, or possibly even implement the members of this dimension as calculated members with an expression something like the following.

    Create member CurrentCube.EventWindow.Last10Minutes as
        AGGREGATE(
            StrToMember(Format(DateAdd("n", -10, Now()), "hh:mm"))
            :
            StrToMember(Format(Now(), "hh:mm"))
        )

    It's possible that if you stay with a ROLAP solution and disable caching, staying with a dynamic dimension key in a SQL view may work for you. Resolving the join dynamically at runtime in the relational engine may be faster than returning all the data to the OLAP engine and then having it run further calculations. I don't really know which approach would be better; you may have to do some prototyping to find out.
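
    Just to sketch what I mean (I'm guessing at your table and key names here), the view could derive the dimension key from the stored timestamp at query time, so each ROLAP query picks up the current value:

    -- Hypothetical fact view: EventWindowKey is recomputed from "now" on every
    -- query, so nothing needs to be processed or notified for it to stay current.
    -- The bucket boundaries and key values are made up for illustration.
    CREATE VIEW dbo.vwFactEvents AS
    SELECT  f.EventKey,
            f.EventStartTime,
            CASE
                WHEN DATEDIFF(minute, f.EventStartTime, GETDATE()) <= 10 THEN 1  -- last 10 minutes
                WHEN DATEDIFF(minute, f.EventStartTime, GETDATE()) <= 60 THEN 2  -- last hour
                ELSE 3                                                           -- older
            END AS EventWindowKey
    FROM    dbo.FactEvents AS f;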

    Wednesday, January 24, 2007 2:51 AM
  • Sorry, I forgot to put in the dimension reference, the code above should read something like...

    Create member CurrentCube.EventWindow.Last10Minutes as
        AGGREGATE(
            StrToMember("[Time].[Time].[" & Format(DateAdd("n", -10, Now()), "hh:mm") & "]")
            :
            StrToMember("[Time].[Time].[" & Format(Now(), "hh:mm") & "]")
        )

    Wednesday, January 24, 2007 4:20 AM
  • Sorry, I didn't really explain the details of what I was trying to do, did I? There are actually two calculations that we are interested in.

    The first is a calculated measure which computes a time duration (as a simple float) between "Now" and a time in the past, where the time comes from a column in the fact table. This allows us to perform average duration calculations by aggregating the total duration for all the records and dividing by the number of records. Doing this as a computed column in the database is easy - I'm unclear on how to do it in the cube.
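
    (For reference, the relational version is just a non-persisted computed column along these lines - the table and column names here are made up:)

    -- The duration is recomputed from "now" every time the column is read; it
    -- cannot be PERSISTED because GETDATE() is non-deterministic.
    ALTER TABLE dbo.FactEvents
    ADD DurationHours AS DATEDIFF(minute, EventStartTime, GETDATE()) / 60.0;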

    The second is a dimension that represents a bucketized version of the above time duration. The dimension contains levels for the minute (5 minute granularity), hour, day, etc. This is useful for efficiently calculating the number of items with a certain duration - you can construct a range set to slice on items with durations between 0 and 1 hour, for example.

    The measure calculation is actually the more important of the two, since we can use a Filter to derive the same results as using the dimension (less efficiently, of course).

    > It's possible that if you stay with a ROLAP solution and disable caching,

    My takeaway from Mosha's comment was that disabling caching isn't possible. Did I misunderstand that?
    Wednesday, January 24, 2007 5:57 PM
  • >> It's possible that if you stay with a ROLAP solution and disable caching,
    >
    > My takeaway from Mosha's comment was that disabling caching isn't possible. Did I misunderstand that?

    Well at the time I think we were talking in the context of the Standard Edition - in which case Mosha's comment still stands. I think this really reduces ROLAP's usefulness in the Standard Edition, but that's just the way it is.

    I then confused the issue by talking about proactively caching a MOLAP partition, but the configuration I am talking about above is using proactive caching with a ROLAP partition.

    I was just reading back over this thread and realised that I dropped this in without explaining it properly. If you use the Enterprise Edition and set up your partition with ROLAP storage you can then enable proactive caching and set the latency to 0 - effectively turning off the caching and forcing SSAS to always query the relational source. I can't remember if the proactive caching dialog box still asks you for a notification method, as it is redundant with a ROLAP partition, but if it does I would just choose something like client-initiated notifications.

    Cheers
    Darren

    Wednesday, January 31, 2007 6:47 AM
  • >set the latency to 0 - effectively turning off the caching and forcing SSAS
    >to always query the relational source.

    Hmm, are you positive of that? The way I read the documentation (which is not at all clear on the topic, so I could be interpreting incorrectly), ROLAP with PC still relies on notifications to clear the cache. And the section on real-time ROLAP in the Melomed (et al) book states "In the real-time ROLAP scenario, SSAS drops ROLAP caches when it receives a change notification because the Latency property is set to 0". Again, in the case of the calculated columns that I mentioned, there is no change notification sent in our scenario.

    Our next test is going to be setting up a polling query that always returns a different value, guaranteeing that PC will always think the data has changed. Then we can use the polling frequency to fine-tune the performance vs. real-time-ness of the system. Seems a bit hacky, but it's the best alternative I've come up with.
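
    For example, a polling query as trivial as the following (just a sketch) returns a new tracking value on every poll, so the cache would always be treated as stale:

    -- Proactive caching compares this value with the one from the previous poll;
    -- since it changes every time, the data is always considered changed.
    SELECT GETDATE() AS LastChangeMarker;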


    Wednesday, January 31, 2007 6:50 PM
  • Kevin - your interpretation is correct. Latency of 0 doesn't turn off data caching in SSAS.
    Wednesday, January 31, 2007 6:59 PM
  • Thanks for the confirmation, Mosha.
    Wednesday, January 31, 2007 7:49 PM
  • Sorry - I got that book last week, but it's back home in Australia and I am over in the US at the moment. Books Online actually says similar things about the latency property being related to the notifications - I guess I got a bit over-excited when I thought I may have figured out a workaround for you.

    I had the same hack about the polling query in mind too. You could probably fool the other notification methods also, but I think you would need to introduce some other scheduling mechanism, which would create more "moving parts" and make those solutions more fragile.

    Thursday, February 1, 2007 1:18 AM