Star schema plain SQL query performance


  • I received a huge database with one table containing 450 million row. The database is SQL Server 2005 and we are implementing Table partition. We received the data from our customer as a STAR schema. After studying the requirement of the my customer, I found they only want a record(s) from the table and input these records into a econometric model.

    When I spoke to the so called DBA, he said the data retrieval will be a big problem and he said we should go for relational data schema (!).

    When I researched on the web, I got the following points

    Fact tables in star schema are mostly in third normal form (3NF), but dimensional tables are in de-normalized second normal form (2NF).

    As you know, the primary schema for a data warehouse is either a star schema or a "normalized" schema. The latter is a term so loosely defined that it's hard to describe, but a normalized schema typically resembles a third (or higher) normal form (3NF) schema that's not dimensional. These 3NF designs don't support query and analysis. Their sole purpose is to act as a staging area, an upstream data repository for a series of star schemas, online analytic processing (OLAP) cubes, and other structures that are directly queried by analysts. The only routine exception to this is Teradata implementations: Because of the unique characteristics of the massively parallel architecture and database optimizer, Teradata can process analytical SQL against a 3NF schema with acceptable performance.

    Because the 3NF approach is only a first step, the argument that schemas of this kind are perfect for data warehousing is misleading. The real-time landscape of an RTDW includes the whole process, or at least the process up to and including the physical structures that people query. Because the 3NF-centric approach just adds overhead, it is not, at this time, a good candidate for RTDW.

    I got the following question

    1. What % of performance degradation if I use SQL statment against Star Schema

    2. I thought the performance will be better, if you create cover index in SQL Server 2005.

    Sorry of this long e-mail. Any answers will be highly appreciated


    Tuesday, November 28, 2006 8:57 PM