locked
Data Warehouse Understanding RRS feed

  • Question

  • User1610713445 posted

    Hi guys,

    Could you please confirm my understanding of creating a database warehouse for the purpose of reporting.

    I have done a lot of reading, and it seems there are 2 main directions I need to chose from. Snowflake or Star, and it seems Star is more a normalised version of Snowflake. So, which ever I chose - they both have benefits from each other.

    Instead of the usualy 'Sales', Region', 'Salesman' type example, I'd like to have input on my real world example.

    So, lets say I have this in my current OLTP database. I have a system that stores information about housing aquisitions, by a large company. Houses are aquired by this department for either sale or demolition for different projects. A Project can have 1 or many Properties linked to it. A property has a sales price, an asking price and a valuation amount. A property has an area (in which the property falls), and a project has a Status. And, a project has a Project Manager.

    A project can have many 'Consultants' working on it. There are different types of consultants.  Zero to many consultants per project. A consultant has a cost value. A project has a Project Officer, and that officer can manage multiple projects. And a project has a status (Closed, rejected, Open...).

    So we have:

    Project table, has many Properties. Project table also has many consultants, which has a consultant type. 

    So, by what I am saying, 'Projects' is the central table, driving the system.

    The database is well normalised, and working well.

    I now have a request. At the moment, they are reporting off the OLTP database, but the data volume is increasing rapidly, and slowly, reports may slow down. What we want is a data warehouse, from which reports can be derived. Also, and this is a major point, things change in the aquisition world, and sometimes, changes are made to the database to reflect this. (Maybe not the database, but may a Spreadsheet that we import from).

    So, am I right in saying a Data Warehouse is the right 'fix'. (Or maybe a Data Mart??).

    My idea is this:

    1. See what is required in the reports. See what the user wants reported. Look at current reports and see what they want to keep. From this info, design a Data Schema for the warehouse.

    Based on what I have seen, property values is one thing reported on a lot.  Also, costs associated to a project is reported on, based on the property area. Something else that is reported on is a Project Officers performance (maybe). We want to see how many projects an officer has (sorted or filtered maybe by Status, and grouped by Area?) and the amount they're delivering (Sales price less the cost of all the associated conultant costs).

    2. Develop the database based on the requirement findings.

    We're going to develop the fact and dimension tables now, right? So, what I'd do maybe is this:

    Create the fact table, called 'ProjectFact'.

    That will have a INDITITY ID (Auto Increment), .. and now I am a bit stuck. Not sure what else should go in the fact table. I assume costs, but ... not sure which ones. Maybe the Sales Price, the Asking Price and a Valuation amount? Need help here. I think this table would need a Project Start Date and a Project End Date? Hope you can assist.

    I will create the Dimension tables as follows:

    ConsultantDimension (Contains all the data about a consultant, related to a project). So, an ID, auto incriment, a ProjectFactID, referencing the fact table, ConsultantName, ConsultantType (taken from the Consultant Type Table), Date work was complete, Amount.

    I'd have a ProjectOfficer Dimension, contacining all the details about a project office, RELEVENT to what may be reported on. Eg, OfficerName (I'd concatinate it as Uppercase Lastname, Surname. For example, 'LISTER, Craig', as that is the way their current reports report.

    I'd have an AreaDimension, which is basically an id and an Area name. No - actually, that is normalising. We don't want that, right? The Area Name will be in the 'ProjectDimension' table?

    So, we have a ProjectDimension, which has all the info about a project, gathered from the related tables. So, something like:

    ID (Auto Incriment)

    ProjectFactID (Related to the accosiated ProjectFact.ID)

    ProjectNumber (From OLTP.Project table)

    ProjectOfficerName (From OLTP.ProjectOfficer table) (Which means the ProjectOfficeDimension table above is removed)

    ProjectAreaName (From OLTP. Areas)

     

    So, we have 3 tables now. 

    ProjectFact

    ConsultantDimension

    ProjectDimension

     Wait, 4 tables. I'll need a TimeDimension. So, will my Fact table have an ID related to the timedimensionID?

    All tables (Except the TimeDimension) will have a ProjectFactID.

     

    So far, am I at ALL on the right track? Looks like I am heading towards a clean Star design. I'm not (yet) asking about how I'm getting the data to the DW. I'm really asking about design, and my initial approach. An advanced thing I need to cater for as well is the fact that Area Boundried do change, to make sales figures look better. I think that may have a major effect on my design, so maybe I need to know about that now. From what I have read, it looks like there would be some form of 'DateActiveStart, and DateActiveEnd' type thing on the Areas table. But not too sure.

    Hopefully you can assist and push me in the right direction. If all is good, I have a sample/test/learning database I can practise with, and I'll put this into something usable. But I'm really asking for comment on my current understanding. 

    Sorry, it's a long post, and if you got this far, thanks! Hope you can assist.

     

     

    Thursday, March 26, 2009 8:40 PM

All replies

  • User-359936451 posted

    I know this is an old post but I have seen similar questions still being posted by users unfamiliar with database concepts.

    What you have described is typically refered to as a relaional database. They typically have a good number of tables each with keyIds that may relate to other tables that there is some sort of relationship with.

    For ex, each house entered into your DB table will have  houseID, and you may have a projectMgr table with the name of project mangers. This table will also have a keyID, either or both tables may have a field that relates to the Id of the other, meaning the house table may have a field on each record that stores the projectManger key of the PM who is handling that house. If each PM only handled one house than the projectMgr table may also have a field that stores the keyId of the house table that he/she is managing. But more likely a third table would be created where each time a PM and houseID relationship is created both IDS are inserted as one record so that a single query can retieve the full list.

    Table1 HOUSES

    ID      Address       PM_ID

    1        123 my st       3

    2        456 your st     2

    3       789  out St       3

    Table2 PMs

    ID    PM_Name

    1      me

    2      them

    3      you

    Table3 PMs_and_Houses

    ID     House_ID    PM_ID

    1        2                    3

    2        3                    1

    3        1                    3


    Table 3 provides a direct list of houses and assigned pms. 

    There is a great deal on the web about this. Good luck.

    Tuesday, March 18, 2014 12:55 PM