locked
Conceptually, what would this data warehouse look like? RRS feed

  • Question

  • Let's say I have a relational database of all the cars in the US. For simplicity's sake, the database has the following information:

    * Year, make, model, color, price of every car available since 2000.
    * Every single car sold since 2000.
    * State where car was sold.
    * Date when car was sold.
    

    There's 1TB of data, so I end up implementing a BI solution.

    Generally-speaking, what would the dim and fact tables look like and what data would they include?

    Thanks.


    VM

    Thursday, March 21, 2013 8:33 PM

Answers

  • Hi,

    I think your question is a little bit to vague to give a single answer.

    Generally speaking, then the Fact table(s) contains all the data you want to use for measures (sum, count etc.) where the the Dimensions contains the data your want to "group" your measures.

    In your case, I think you would have dimensions for e.g. Date, State, Model and Color, but it all comes down to what you want to look at. You'll then have a fact table that contains every car sold with the price of the car. This fact table will then have link (Keys) to the dimension tables, so you e.g. can look at how many cars where sold of a certain model in a certain state in a certain year.

    I'll suggest that you google the subject, because that will give you a lot of good information. You can also try to look at this one - http://www.jamesserra.com/archive/2012/03/data-warehouse-architecture-kimball-and-inmon-methodologies/ which will give you a brief overwiev of the two major Datawarehouse methodologies defined by Ralph Kimball and Bill Inmon.


    Steen Schlüter Persson (DK)

    • Proposed as answer by Kalman Toth Friday, March 22, 2013 1:38 PM
    • Marked as answer by rbhatup Tuesday, March 26, 2013 12:41 PM
    Friday, March 22, 2013 7:33 AM

All replies

  • Hi,

    I think your question is a little bit to vague to give a single answer.

    Generally speaking, then the Fact table(s) contains all the data you want to use for measures (sum, count etc.) where the the Dimensions contains the data your want to "group" your measures.

    In your case, I think you would have dimensions for e.g. Date, State, Model and Color, but it all comes down to what you want to look at. You'll then have a fact table that contains every car sold with the price of the car. This fact table will then have link (Keys) to the dimension tables, so you e.g. can look at how many cars where sold of a certain model in a certain state in a certain year.

    I'll suggest that you google the subject, because that will give you a lot of good information. You can also try to look at this one - http://www.jamesserra.com/archive/2012/03/data-warehouse-architecture-kimball-and-inmon-methodologies/ which will give you a brief overwiev of the two major Datawarehouse methodologies defined by Ralph Kimball and Bill Inmon.


    Steen Schlüter Persson (DK)

    • Proposed as answer by Kalman Toth Friday, March 22, 2013 1:38 PM
    • Marked as answer by rbhatup Tuesday, March 26, 2013 12:41 PM
    Friday, March 22, 2013 7:33 AM
  • Thanks for the information.

    I know it's a rather vague question, but I posted it to get an idea of how the datawarehouse would look like.

    In this case, wouldn't the fact table have some kind of grouping instead of having all the raw data? The resulting datawarehouse would be similar in structure to the original database, since the fact table would be the same as the table in the original database.


    VM

    • Marked as answer by rbhatup Tuesday, March 26, 2013 12:41 PM
    • Unmarked as answer by rbhatup Tuesday, March 26, 2013 12:41 PM
    Friday, March 22, 2013 2:55 PM
  • Hi,

    You could group your fact data, but if you do this, you already start to do some assumptions regaring your data and the Dimensions you want to use.

    Let's say that  you e.g. have a sales tables that only contains the number of units sold and a Product Number. If you only want to look at how many units have been sold pr. day, you could group on date and Product no, which will then give you one record pr. date and Product. That will be fine for now, but you will never be able to go down on a lower level to examine the data.

    Now one of you sales managers also want to look at the sale pr. customer and not only date and product. This means that your grouping doesn't work anymore because you now also want to group on customers. This means that you'll most likely have to re-load your full fact table with the new groupings. Now next time a user ask for a new "grouping" you'll have to do the same exercise.

    If you instead load your fact table with the level as the source data, you don't have to re-group is everytime a new Dimension (grouping) is needed. Then you just have to add the Key column for the new dimension and then update this key with the values and you are done.

    You can also take a look at this article from Kimball - http://www.kimballgroup.com/2003/01/01/fact-tables-and-dimension-tables/

    That describes the concepts of Fact and Dimension tables from the Kimball point-of-view. There are tons of other articles out there, so I can again only suggest that you google it and start reading...:-). 


    Steen Schlüter Persson (DK)

    Tuesday, March 26, 2013 8:26 AM