Performance issues with 50 million rows of sales data!

  • Question

  • We are planning to use SQL Server 2008 R2, Analysis Services, and Reporting Services. There are 50 million rows of data. We are still in the design phase.

    What methods can we use to optimize the performance of displaying this data?


    Kenny_I

    Monday, February 11, 2013 11:03 AM

Answers

  • You need to follow the best practices in your design to improve performance. The design of dimensions, partitions, and aggregations plays a key role in determining cube performance. It is too early to comment on anything specific in your case, so start with the best practices:

    http://technet.microsoft.com/en-us/library/cc966399.aspx

    • Marked as answer by Eileen Zhao Monday, February 25, 2013 3:06 AM
    Monday, February 11, 2013 4:49 PM
  • Hi Kenny,

    It rather depends on the profile of those 50 million rows of source data. Is every row going to create a tuple, or are some going to be consolidated and summarised (which is somewhat the point of SSAS)?

    What are your parameters here in terms of requirements:

    • How many dimensions intersect the data?
    • Is the data ongoing and updated frequently, or is this a one-off load for analysis?
    • What is the load like on the query side: lots of people running well-known queries, mostly ad-hoc queries, etc.?
    • How quickly do you need to load the source data?
    • What are the latency requirements of the cube? For ongoing data, how soon does it need to be in the cube: updated every 5 minutes or overnight?
    • What is your mechanism for loading the data in: are you thinking SSIS or some other method?

    I have several years' experience of loading very large source data sets into SSAS.

    - Dan

    • Marked as answer by Eileen Zhao Monday, February 25, 2013 3:06 AM
    Tuesday, February 12, 2013 9:12 AM
  • Hi Kenny,

    As Yogish says, it's too early to comment on anything specific in your case; first you should go through the best-practice design articles and whitepapers.

    Mainly, if you have a huge data warehouse, you can think about creating partitions on the fact tables, processing the cube with multiple partitions, and designing the most effective aggregations.
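
    As a rough illustration of what that partitioning looks like on the SSAS side (the table and column names below are hypothetical, not from this thread), each measure-group partition is typically bound to a query that selects one slice of the fact table, for example one partition per calendar year:

    ```sql
    -- Hypothetical source query for a single SSAS measure-group partition.
    -- One such query is defined per partition, each covering a distinct date range.
    SELECT
        s.SalesDateKey,
        s.ProductKey,
        s.CustomerKey,
        s.SalesAmount,
        s.Quantity
    FROM dbo.FactSales AS s
    WHERE s.SalesDateKey >= 20120101   -- partition slice: calendar year 2012
      AND s.SalesDateKey <  20130101;
    ```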

    Break that measure group (MG) into as many partitions as is practical, and partition the underlying table into as many partitions as is practical, so that the measure-group partitions hit different table partitions at the same time. This will certainly help performance and optimization.
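
    A minimal T-SQL sketch of partitioning the underlying fact table by date so that the measure-group partitions line up with table partitions (the object names and boundary values are illustrative assumptions, not from the thread):

    ```sql
    -- Partition function: one boundary per year, on an integer date key.
    CREATE PARTITION FUNCTION pfSalesByYear (int)
    AS RANGE RIGHT FOR VALUES (20110101, 20120101, 20130101);

    -- Partition scheme: map every partition to a filegroup
    -- (PRIMARY here for simplicity; in practice you may spread across filegroups).
    CREATE PARTITION SCHEME psSalesByYear
    AS PARTITION pfSalesByYear ALL TO ([PRIMARY]);

    -- Fact table created on the partition scheme, keyed by the same date column
    -- that the SSAS partitions slice on, so processing one cube partition
    -- reads from one table partition.
    CREATE TABLE dbo.FactSales
    (
        SalesDateKey int   NOT NULL,
        ProductKey   int   NOT NULL,
        CustomerKey  int   NOT NULL,
        SalesAmount  money NOT NULL,
        Quantity     int   NOT NULL
    ) ON psSalesByYear (SalesDateKey);
    ```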

    After that, you can think about designing the most effective aggregations.

    Here are some of the most popular whitepapers and articles you should follow:

    http://msdn.microsoft.com/en-us/library/hh226085.aspx

    http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/10/10/analysis-services-2008-r2-performance-guide.aspx

    http://sqlcat.com/sqlcat/b/top10lists/archive/2007/09/13/analysis-services-query-performance-top-10-best-practices.aspx

    http://social.msdn.microsoft.com/Forums/en-US/sqlanalysisservices/thread/f98b6aa5-1f1a-4823-b46b-be3b006bbcc1

    Hope this helps you somewhat, and I will be glad to help you more.

    Thanks,

    Anil Maharjan

    • Marked as answer by Eileen Zhao Monday, February 25, 2013 3:06 AM
    Tuesday, February 12, 2013 12:41 PM
