I would appreciate any views and opinions on the design of implementing configuration data in a warehouse.
This example is one which is currently under discussion at my work place.
We have a warehouse which is the data source for a number of reporting channels.... cubes, reports etc.
The reporting channels all handle currency conversion and therefore have a Reporting Currency dimension (the currency to 'convert to') and a set of Exchange Rate facts. The business fact records are all tagged with an original currency so we can do the 'from - to' calculation based on which reporting currency is selected.
The Reporting Currencies are subject to change and can vary between different reporting channels.
The Exchange Rate table stores different sets of Exchange Rates identified by a Profile Code. ie. there may be a Year End Rate profile code which stores a full set of exchange rates... also a Current Month Rate which stores another full set of exchange rates.
Both tables are currently Type 1. The data is always just a current representation of currencies and the different sets of exchange rates.
The different reporting channels need to source different reporting currencies AND different exchange rates. Both of these are subject to change on an unspecified frequency.
ie. Cube X may implement 'GBP', 'USD', and 'EUR' as reporting currencies and use the 'YEAR END RATE' exchange rates. Cube Y may implement 'GBP', 'USD', 'EUR', and 'JPY' as reporting currencies and use a different set of exchange rates.
The options we are discussing to support the are:
i. In the Reporting Currency dimension table have flag columns for each reporting channel to identify the reporting currencies and similar columns in the Exchange Rate table.
Have config for each reporting channel held somewhere.. database table or xml file.. which is picked up by the daily ETL and it is the ETL processes which maintains the status of the flags in the tables.
The reporting channels data sourcing layer then queries against the relevant flags and returns the appropriate data.
i. Have a 'genric config' table in the warehouse, or supporting database, which stores generic key/pair type config.
ii. Implement a table valued UDF in the warehouse which can be queried by the reporting channels using an appropriate parameter to identify the reporting channel which then returns the appropriate rowset. ie. udfGetReportingCurrencies ('Cube X') would return the appropriate reporting currencies for Cube X. The UDF logic is responsible for reading the config table and identifying the right currencies.
Option ii is slightly more complex than I have explained here...
Firstly there is the generic key/value pair table and a generic UDF to return the config iyems for a given key.
Secondly there is the specific UDF's to source different data from the warehouse to the reporting channels. ie. udfGetReportingCurrencies (ChannelID) and udfGetExchageRates (ChannelID). These functions return data from the appropriate tables but also use the generic 'get config' elements to return the correct data for the reporting channel.
The arguments for and against so far...
Option 1. Maintains a true DWH model. ie. the dimension tables implement attributes which identify the config requirements. These flags are maintained via ETL. However, the design implements a column for each reporting channel. Obviously, this is subject to change. Adding further reporting channels would require further attribute columns. Also, we currently identify the reporting channels by the physical application... ie. Cube X, Cube Y... this could feasibly change to be by Business Area.. ie. Finance, HR.. etc. Changes would be reuired to the ETL and warehouse schema to accomodate these changes.
Option 2. Breaks the dimensional model as the 'config' elements could be viewed simply as attributes of the dimensions that should be maintained by the ETL processes. However, all changes are accomodated just by ammending the config data. No schema or ETL changes are required.
OK.. that's it. Not sure how well I've managed to describe the issue but hopefully there'll be someone out there with the right design experience who might be able to offer a few pointers.
I might take more time to investigate the designing options. However, about currency conversation in cube, the following blog is to describe how to do it http://blogs.msdn.com/b/sqlforum/archive/2011/01/04/walkthrough-implement-many-to-many-currency-conversion-in-an-analysis-services-solution.aspx for your first reference.
Thanks for the reply. The method of currency conversion documented in the blog link is pretty much exactly how we've implemented it, using a many-to-many relationship.
Reading back through my original post I'm beggining to think I could have made it more clear what my problem is.
Essentially it's wondering how best to implement application level config in the reporting channels.
We have a number of cubes that each have similar, but varying, requirments. For example, Cube A requires a different set of reporting currencies than Cube B.
We currently have these application specific requirements hardcoded into the named queries in the DSV. In much the same way as the AdventureWorks sample in the blog does for ReportingCurrencies.
We have a number of examples of things like this... date boundaries for example... and also dependencies on the same hardcoded values across different named queries... eg. if we restrict members of dimension X to be only members with a date value between certain boundaries, then these date boundaries also need to be reflected in the fact queries so we don't retrieve orphaned facts.
What I'd like to do is have have the filter criteria in the named queries to be driven from config rather than hardcoded so we could simply change some config data to control what data is loaded into the cube rather than having to change the DSV queries and redeploy the cube.
I was wondering if there was any precedent for this sort of design and if anybody had any experience implementing something like this.