none
Designing a data warehouse for SQL 2008.New project need new ideas ! RRS feed

  • Question

  • Hi Guys,

     

    It just occurred to me that the shop where I currently work will benefit from a database warehouse, at the moment it’s a Greenfield project with no requirement, but just the knowledge that a data warehouse is needed to store historic data and also perform data mining tasks that will be beneficial to the shop.

     

    As a Developer DBA, not too sure of where this will start at the moment from a technical perspective, there are considerations such as.

     

    Designing the database to be as efficient as possible and avoid previous mistakes.

    Reduce data duplication.

    Ensure that queries that will run on the database will be very well optimised.

    Have a lightweight design but yet powerful.

    Data transfer considerations, how will data be moved from production onto the warehouse.

    Disaster recovery considerations, with time the database will be a VLDB which also needs to be protected in the case of a disaster.

     

    I have looked at the MS best practice guide, but I am looking to get some idea from other technologist who are also faced with the same problem or have managed/designed a warehouse from start up.

     

    Thanks in advance.

     

    Monday, October 12, 2009 2:15 PM

All replies

  • RE: Reduce data duplication

    In most data warehouse databases that I have seen, the opposite is true. The goal is to "flatten" the data and reduce the number of relationships. This often means *more* data duplication and non-normalization.

    For example:
    Your database has an Employee table.
    Your database has an EmployeeType table with the employee types.
    Within the Employee table it has an EmployeeTypeId foreign key.

    In a warehouse database, you may instead store the EmployeeType string instead of the Id. That means that this string is repeated for every employee of the defined type.

    This makes it much easier to allow for end-user reporting or other analysis on the data.

    NOTE: I am not an expert in this area ... just sharing what I have seen.

    Hope this helps.

    www.insteptech.com ; msmvps.com/blogs/deborahk
    We are volunteers and ask only that if we are able to help you, that you mark our reply as your answer. THANKS!
    Monday, October 12, 2009 9:35 PM
  • hi
    I agree with Deborahk that Datawarehouse design considerations are to “flattern” the structure and have as less relationship hierarchy as possible.
    Primary objective is to improve performance/response time as it has to deal with large amount of data, compromising on redundancy is okay at times.
    Star schema and Snow Flat schema help you establish these kind of models as against ER for relational design. Fact and Dimension tables help you establish appropriate structure.

    Modeling KPI is important aspect and that is one of the input used in modeling Facts and Dimensions

    With SQL Server, you will have to consider UDM based OLAP models which are very powerful.
    For Data mining as well you can leverage SQL Server OoB Data mining algorithms, there are almost 9+ algorithms available.

    Data Integration, cleansing is most crucial aspect especially when you have multiple LOB Databases and also consideration for respective DB connectivity drivers.

    MetaData management is another key aspect of Design and you will have to leverage features available with SQL Server Integration Services.

    Planning Staging Databases, appropriate Refresh Intervals, Real time Datawarehousing to support some real time features are some of the other considerations.

    SQL Server provides Clustering and Mirroring for High Availability which you can leverage for HA Warehouses.

    Hope this helps.

    Regards

    Suds



    Thursday, October 15, 2009 4:28 AM
  • If you use OLAP you do not really need to denormalize underlying relational tables when you have that data available in OLAP database in MOLAP form, because all data in MOLAP is already joined and aggregated - that is one of MOLAP advantages. But if you query big fact tables at a very granular level you still may need to denormalize those and use HOLAP rather than MOLAP for those.

    Also, nowadays you can query SSAS OLAP databases the same way as a regular relational data - using LINQ. (You will need SSAS Entity Framework Provider for that)


    Sergey
    • Proposed as answer by grand_ua Sunday, October 9, 2011 1:36 PM
    Sunday, October 9, 2011 1:33 PM