Note: Forums will be making significant UX changes to address key usability improvements surrounding search, discoverability and navigation. To learn more about these changes please visit the announcement which can be found HERE.
cube processing getting slower

إجابة مقترحة cube processing getting slower

  • mercredi 25 juillet 2012 18:50
     
     

    I currently have a cube that does simple calculations and aggregations. The cube pulls in about 5 years of fact table information (80 million rows) and has multiple dim tables supporting it. The cube contains 28 dim and two of them have about 15 attributes in them. The cube has also been partitioned by year.

    As we've added dim, the cube has started to processes slower and slower, but not the dim. We have added some dim that do not contain all the keys in the fact table (by design) but have set the properties on the dim to ignore when a key is not found ( not sure if thats the best way to approach that). Anyone have any experience with this or faced a similar situtation?

Toutes les réponses

  • mercredi 25 juillet 2012 22:50
     
     Réponse proposée

    Take a looking at the relational indexes. Every time a dimension is added that means one more join to the fact table. I'd confirm that the indexing is how it should be.

  • jeudi 26 juillet 2012 03:02
     
     

    I'd consider partitioning by month for 5 years of data in your fact table of 80 million rows.

    You could then process the latest partition on a nightly basis and not the whole fact table which would be much faster. You can dynamically process fact table partitions and dimensions via SSIS.

    Please find the following link; - http://social.msdn.microsoft.com/Forums/eu/sqlanalysisservices/thread/a3af5aaa-0ac0-45ec-b972-b9d7a2482ed6

    I would also partition your fact table in your data warehouse by month to match your cube.

    http://technet.microsoft.com/library/cc966457


    If you have found any of my posts helpful then please vote them as helpful. Kieran Patrick Wood MCTS BI,MCC, PGD SoftDev (Open), MBCS http://www.innovativebusinessintelligence.com/ http://uk.linkedin.com/in/kieranpatrickwood


  • mardi 31 juillet 2012 12:55
     
     
    i think I know a major reason for why its slowing down...I have 3-4 dims that only apply to a fraction of the data, generating alot of unknowmemebers (500,000+!). When I remove those dimensions, the cube processes very quickly then, 40min. Anyone know what the best way to handle those situations are?
    • Proposé comme réponse Yogish Bhat dimanche 12 août 2012 14:46
    • Non proposé comme réponse Yogish Bhat dimanche 12 août 2012 14:47
    •  
  • mardi 31 juillet 2012 13:03
     
     Réponse proposée

    Yes you could change the data source of the dimension / fact to a view and filter out the unknown members within the design of the view. For example filter out where FactSales->ProductKey IS NULL

    Alternatively you could create a dimension record which contains the unknown values. For example where DimProduct->ProductKey is NULL then set DimProduct->ProductKey to -1. So where the facts are associated with  an unknown dimension value e.g. FactSales->ProductKey is NULL then set FactSales->ProductKey to -1.

    Either of the above approaches should improve the usability and / or performance of your cube.


    If you have found any of my posts helpful then please vote them as helpful. Kieran Patrick Wood MCTS BI,MCC, PGD SoftDev (Open), MBCS http://www.innovativebusinessintelligence.com/ http://uk.linkedin.com/in/kieranpatrickwood

  • mardi 31 juillet 2012 14:15
     
     

    Hi Kieran,

    I'm sort of confused on that. Lets say I have a fact table that contains 50 product keys (for example) and I have a dim table that contains lets say 5(1-5) of those product keys since it only applys to that segment of the data. Do I just create one record with a product key of -1 and every attribute in the dim to lets say "Other"? How does SSAS pick up the -1 key in the dim table for keys 6-50 in the fact table and mark those facts unknown.

  • samedi 4 août 2012 22:27
    Modérateur
     
      A du code

    It sounds suspicious that the performance killer is the discarded records that occur when the foreign key in the fact table isn't found in your dimension. You might test performance of ProcessFull on the database with the following error configuration which doesn't log the discarded rows:

    <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <ErrorConfiguration xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200">
    	<KeyErrorLimit>-1</KeyErrorLimit>
    	<KeyErrorAction>DiscardRecord</KeyErrorAction>
    	<KeyNotFound>IgnoreError</KeyNotFound>
    	<NullKeyNotAllowed>IgnoreError</NullKeyNotAllowed>
      </ErrorConfiguration>
      <Parallel>
        <Process xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200">
          <Object>
            <DatabaseID>YourDatabaseID</DatabaseID>
          </Object>
          <Type>ProcessFull</Type>
        </Process>
      </Parallel>
    </Batch>

    I do agree with the other comments about changing your fact table in the ETL to have a -1 ProductKey any time you know that ProductKey doesn't exist. Or if you really don't want to see those rows in the cube, processing may perform better if you can filter out those rows in SQL rather than letting SSAS filter out those rows and discard them. It's worth testing. But the above XMLA with the ErrorConfiguration may be enough to improve performance of processing. Let us know how it goes.


    http://artisconsulting.com/Blogs/GregGalloway

  • mardi 7 août 2012 20:06
     
     

    Hi Furmangg,

    thank you for the response. I'm not really sure how to implement the xml code as I am not familiar with xml or the cube xml code. Can I do this in visual studio in design view?

    Also, I still don't understand how adding a record to the dim table with a key of -1 solves that issue...could someone elaborate on that?

    Thanks!

  • mardi 7 août 2012 21:29
    Modérateur
     
     

    Sorry I wasn't clear about that. The XML above is an XMLA script. If you open Management Studio, connect Object Explorer to Analysis Services, right click on your database and choose Process, choose ProcessFull, then click the Script button instead of OK, it will open a new XMLA window. Run that script and note the performance. Now note the DatabaseID from that XMLA script and paste it into my XMLA script. Compare the performance of my XMLA script which doesn't log discarded rows.

    Regarding the -1 ProductKey row, you would have to add that "unknown" row to DimProduct. Then you would have to have to change your ETL. If any fact rows don't match any DimProduct row, then assign those fact rows to -1 ProductKey. That way, all fact rows will tie out an an existing DimProduct row. Hope that makes sense.


    http://artisconsulting.com/Blogs/GregGalloway

  • mercredi 8 août 2012 13:59
     
     
    when you say ETL, i'm assuming you mean in the dsv?
  • mercredi 8 août 2012 19:34
    Modérateur
     
     

    No, I was referring to the load process for your data mart. ETL means Extract-Transform-Load. Do you have stored procs or SSIS packages which load your fact tables, look up dimension keys, etc? I'm suggesting changing the code that loads your data mart.

    If this is a cube on top of a transactional database, you could make those changes in the DSV or in the database itself.


    http://artisconsulting.com/Blogs/GregGalloway

  • vendredi 10 août 2012 19:03
     
     
    yep..the cube is on top a transactional db...I don't think I'll be able to change the ETL process as those tables aren't managed by me....how would I do this in DSV?
  • vendredi 10 août 2012 20:44
    Modérateur
     
     

    I would change the fact table to a Named Query (if it's not already) and do a left outer join to the dimension. Something like:

    select isnull(f.ProductKey,-1) as ProductKey
    ,f.Measure1
    from FactTable f
    left join DimProduct p on p.ProductCode = f.ProductCode


    http://artisconsulting.com/Blogs/GregGalloway

  • dimanche 12 août 2012 15:01
     
     

    You are getting  unknowmemebers  because the same is configured in the dimension's error configuration.

    If you have unknowmemebers (500,000+!), then it is data issue. What I would do in this case is I would go to the table and add

    foreign key constraint. This would prevent those unknown data coming to the cube.

  • samedi 25 août 2012 16:23
     
     Réponse proposée

    Hi k1ng87,

    Sorry I didn't get back to you earlier.

    To answer the question you raised earlier. To deal with the unknowns at the ETL stage e.g. in the datawarehouse staging area, i.e. before they reach the cube. Have a record within the dimension which records the unknowns as DimMyDim.DimKey = -1 and DimMyDim.DimDescription as 'unknown'. The facts will have a foreign key relationship with this dimension as FactMyFact.DimKey = -1. It is likely that you may need to write an UPDATE SQL statement to update the fact record values with an unknown dimension value to -1.

    I find that handling 'unknowns' within the staging area or the data warehouse before the data reaches the cube is simplier in the long run and increases the integrity of the cube and potientially also increases performance.

    Related links; - http://stackoverflow.com/questions/977924/handling-nulls-in-datawarehouse

    http://www.sqlmag.com/article/data-management/null-in-the-data-warehouse


    If you have found any of my posts helpful then please vote them as helpful. Kieran Patrick Wood MCTS BI,MCC, PGD SoftDev (Open), MBCS http://www.innovativebusinessintelligence.com/ http://uk.linkedin.com/in/kieranpatrickwood