none
Fact table design question RRS feed

  • Question

  •  

    I have a sales fact table with 7 dimensions.  Kimball recommends creating a composite primary key of all your dim surrogate keys for your fact table.  Up until this point I have been using the Sales Order Number from the OLTP as my primary key in my fact table because it has been unique.  For every sales order there is one ship date, one order date, one customer, one vendor, one sales rep, etc.  However, now I have to slice my data depending on the location where the product was manufactured and, in my case, inventory may be shipped from plant to plant in order to have specialized processes applied to the material.  So I may have multiple manufacturing sites for each sales order.

     

    My question: is it better to use every dimension surrogate key combined into a composite primary key for the fact table or should I just combine sales order number with location as the primary to get a the unique rows in my fact table?  Or am I not thinking about this correctly?

     

    Thanks.

     

     

    Tuesday, October 9, 2007 5:21 PM

Answers

  • This is from memory so let me work through it.

     

    I think that a Kimball design principal is that each dimension (dimension table) can be realted to muliple facts (in your fact table) and each fact is realted to one and only dimensional member.  In your circumstance, you now have a situation where your fact can be related to multiple dimensional members and vice versa (many to many relationship).  A Kimball design solution for this problem is to create a relational junction table or bridge table that goes between your fact table and the location dimension table.  You bridge table will contain a "group" key that groups together all locations invovlved in creating your product which is related to your fact table and the location key which is related to the dimension table.  There will also be a column for "weighting" that when summed across a "group" should equal 1. Kimball explains it better than me Smile so you might want to refer to one of his dimensional design books.  I dont think this has anything to do with a primary key on your fact table.

     

    I am am somewhat new to implentations of dimensional model using SSAS/SQL server so I am not sure how you would configure the tool to use this design.

    Tuesday, October 9, 2007 9:40 PM

All replies

  • Hello John! I think that that Kimball refers to a startschema in a relational source and with a SQL query engine in this case.

     

    Indexes will help when if you query less than 10 percent of the records in a table, if I do not have the wrong idea of how the TSQL query optimizer works in SQL Server 2005 and previous editions. Over 10 percent will mean a full table scan.

     

    When you load/process data into AS2000 or SSAS2005 your composite index/key can help but I do not think that it is that important unless you have very large amounts of data in the data warehouse.

     

    HTH

    Thomas Ivarsson

     

    Tuesday, October 9, 2007 5:53 PM
    Moderator
  • Perhaps I am stuck in Relational World.  I am thinking that in order to slice my sales by location, the dimension has to know which row(s) in the fact tables are keyed to a given location.  If I use Sales Order ID as the only key in the fact table then there will be only one row per sales order and there will be only one location per sales order.  It seems to me I want a row in my fact table for each location and I need some sort of key that will reflect multiple rows for each Sales Order.

     

    That and I will be having several million rows in the cube, a hundred thousand rows added daily.

    Tuesday, October 9, 2007 6:09 PM
  • Hello John. Are you using SSAS2005 and MOLAP? If so, the cube will have its own structures for collecting data. It has no connection to the relational(SQL) structure after it is processed.

     

    If you have a relational, SQL, data mart or run SSAS2005 in ROLAP mode, you have a point.

     

    HTH

    Thomas Ivarsson

     

    Tuesday, October 9, 2007 7:50 PM
    Moderator
  • I am running SSAS 2005, probably not in ROLAP, due to speed issues, though it remains a possibility.  I am trying to set up my OLAP star schema.

     

    Are you saying I don't need a primary key on my fact table?

     

    It seems to me I need the following rows in my fact table:

     

    S/O#     LocationCode    .....other columns.....

    1                1

    1                2

    1                8

    2                8

    2                57

    2                112

    3                1

     

    If I use S/O# as the Primary key, this is not allowed. 

     

    Are you saying I should NOT use a primary key and let SSAS identify the rows internally?  This is foreign to my relational training which is why I'm having a hard time conceptualizing.

     

     

    Thanks.

    Tuesday, October 9, 2007 8:15 PM
  • This is from memory so let me work through it.

     

    I think that a Kimball design principal is that each dimension (dimension table) can be realted to muliple facts (in your fact table) and each fact is realted to one and only dimensional member.  In your circumstance, you now have a situation where your fact can be related to multiple dimensional members and vice versa (many to many relationship).  A Kimball design solution for this problem is to create a relational junction table or bridge table that goes between your fact table and the location dimension table.  You bridge table will contain a "group" key that groups together all locations invovlved in creating your product which is related to your fact table and the location key which is related to the dimension table.  There will also be a column for "weighting" that when summed across a "group" should equal 1. Kimball explains it better than me Smile so you might want to refer to one of his dimensional design books.  I dont think this has anything to do with a primary key on your fact table.

     

    I am am somewhat new to implentations of dimensional model using SSAS/SQL server so I am not sure how you would configure the tool to use this design.

    Tuesday, October 9, 2007 9:40 PM
  • Thanks Tim.  I must have bumped my head.  I started out thinking it was a many to many but then I had an aneurysm or something.  Anyway I think you are correct in what I need; which should also take care of my key field problem......or at least my relational need to have a primary key.

    Wednesday, October 10, 2007 12:06 PM
  •  

    Hello John!  I have missed the business part of your problem. I have discussed the general design of starschemas.,

     

    Mostly you have one to many relation and foreign keys from dimensions to the fact table. A fact table primary key will not have any large effect on what happens in the cube after you have processed it.

     

    If you are concerned about duplicates you can create a unique index on all the foreign keys in the fact table.

     

    Both a primary key and a unique constraint/index can slow down the ETL loading of facts.

     

    HTH

    Thomas Ivarsson

     

    Wednesday, October 10, 2007 4:03 PM
    Moderator