Thursday, February 19, 2009 2:50 AMHi All,
Im using SQL Server 2008 enterprise and am wondering the best way to structure my data warehouse
I have 3 source databases , 2 are SQL server databases and the other is a sybase database, Each database pertains to a different business stream.
In building my database I would like to keep the business streams separate and maybe separate the business stream further by subject matters ie sales/ marketting but allow them to share common dimensions amongst all business streams
What would be the best way to build the datawarehouse using SQL 2008 ?
i.e. Would it be good practise to build a database for each different business stream, This I built 3 different databases each with its own staging and dimenision tables and fact tables , how would I share common dimensions.
Or is it better to have 1 large database using naming conventions for staging, fact and use views?
Im just wondering how do you separate business streams and best technical practises whilst keeping common dimensions accessible to all ?
Thursday, February 19, 2009 5:44 AMI am little confuse between your words, you said you want to create three different databases( i guess you mean tables/dimension),Correct me if i am wrong.I would say create a single database and since you got three different sources and they are all different business entities so what i would say make your tables from three different sources or prefer to make them into dimension and fact if there is any common factor is there (granularity like day,month,hours etc)
So Now you know your database tables and now you have to make decision based on that.
Biggest question is why you want to do this?
Dibyant S U padhyay
Thursday, March 12, 2009 10:31 AM
Sorry for the long thread.. I find this discussion to be very interesting from architecture point of view as well. This is more of a discussion here -
As per my experience, creating centralized data warehouse works better from long-term perspective than creating separate data warehouses for each Business Process/stream. One of the major reason being correlating data across the business stream (data from different business process of an organization usually have some or the other reasons/context where they need to be used/analyzed together by some or the other users) and using common dimensionality (conformed dimensions like Geography, Time Dimension etc). Also, in a process of building a data warehouse, one would want to create single version of the truth for users from different process but from the same org.
Having single data warehouse doesn’t mean that you don’t have separate processing stream for each source system. You could create separate data flow through SSIS for each of the data sources. This is not very uncommon because, for example, you may want to load data from different sources on different schedules or frequencies.
Here’s an approach I can suggest –
Create separate extract databases for each source (else you would need to use specific naming for tables and use only relevant tables in ETL for each data sources – keeping separate extract would help making it simpler and easier).
Creating one staging database for processing all – there are usually more than one reason to do this. Examples are you want to (in future if not now) combine data from two tables from different sources into one dimension table –specific example would be: you get marketing team member details from marketing dept and some personal info of those members from HR dept database (different data source). Combining these info to make an Employee dimension (with Role as Marketing Executive etc) as a unified view used by different transactional data in the data warehouse will make it richer and more robust system for analysis (very basic and easy example - suppose you want to compare marketing executives efforts by number of years experience in your org based on DOB info coming from HR database).
Create a single data warehouse to store data from all the three data sources. While designing your data warehouse (dimensional model) consider making it an enterprise data warehouse rather than data source specific – I mean make the data model for users rather than based on where the data comes from. You may want to refer Data Warehouse Toolkit for examples and concepts on transforming data captured in business processes (business streams that you have mentioned) into a generic data warehouse (rather than building database for specific reporting/analytical application needs – build a data warehouse on which you could build many such applications now or later).
In the data warehouse, when you consider data granularity and dimension relationship, mostly the fact records (transactions/events) fall into different fact tables and so most of the data from different business streams are still separate but in the same database.
ETL: Extract procedure for getting relevant data from source systems into your extract and even the databases (staging tables relevant to corresponding sources) can be kept totally separate. ETL that processes data from staging and loads it into data warehouse can also be source/business stream specific (except for those tables in the data warehouse that need data processed from more than one source).
Saturday, April 04, 2009 4:44 PMBased on the scenario mentioned, to me it looks like a datawarehouse scenario.
First thing that need to be clarified is that for using the data available in all three databases, what are the integration business rules? How is that data can be merged together and put in one place? Frame the set of business rules and keep them documented with at most clarity.
Based on the above details, Have ETL logic in place . Ensure that you will be in a position to identify each extract of data that is processed and loaded into data warehouse. Build a single datawarehouse which caters your business needs. Reason to have extract logic in place is that there could be various possibilities when the data is processed from different data sources and situations might arise demanding for reprocessing the data specific to an extract. In such cases, you might need to clean up the data which is loaded based on the extract identifier and reload the data accordingly.
Hope I am clear and not confusing. Let me know if you need any further details on the same.
Thursday, April 23, 2009 1:21 AM
I agree with hegde basically.
Another advantage to reduce number of relational data warehous database is when slowly changed dimension is included. Use the employee example, people move in and out organization units. Some people ware more than one hat sometimes. Employee data from HR may be the base of different teames say sales, marketing, and engineering. Each team may has their own goal and other fact review rules. Employ, position, goals, and rules changes iteratively. Keep all the related dimension states across time in less places will make rollup process accurate and easier.
Hope this will help.