control sparsity with base table and ancillary tables?
-
lunedì 30 luglio 2012 21:10
Greetings,
My extract is coming from essentially 1 source table, but I find it has a number of very sparse columns whose population is attributable to particular values in more 'core' columns.
I'm used to the functionality of MS Access where I could do a multi-table insert by binding the appropriate multi-table query to the UI (just as an example of single row INSERT).
- is it sensible to construe my DW as a main table with foreign keys to ancillary tables?
- if so, is this type of INSERT operation possible in SS 2008 R2 and how is it done (so far it yells at me: "'testview' is not updatable because the modification affects multiple base tables")?
- if not, should I employ the SPARSE keyword for the relevant columns?
Much obliged for any suggestions or advice.
John
Tutte le risposte
-
lunedì 6 agosto 2012 02:17
Sounds like you have a typical Fact table and a number of Dimension tables. Typically you'd insert your new Dimensions first (i.e. insert new records into the dimension or "ancillary" tables). Then insert the new records in the main or fact table.
Maybe you could give us an example of what you're trying to do with a couple of dummy records from the main and ancillary tables.
-
lunedì 6 agosto 2012 15:49
Hi Stephen,
Thanks for your response. First just to clarify my issue, I was referring to the RDBMS storage rather than the load into the Analysis Services dimensional space. So when I say DW, I mean the "staging" tables/views in the relational db. I don't know if there's consensus around the current best-practice with respect to retaining a "staging" layer between the source extract and the dimensional data in AS, but my plan was to use that layer to keep the most granular detail data there and drill-through to it from AS on an as-needed basis.
And specifically, I was suggesting that the fact table itself (in the RDBMS) be broken up into multiple tables bearing 1-1 / 1-Null relationships to the "core" fact table - essentially I was thinking to normalize it - so I wasn't referring to the dimensions or "lookup" tables. Rather, the "ancillary" tables must retain a FK to the PK on the main fact table. So here's an illustration of the original fact table, and my thought to normalize that into smaller chunks:
Hope this helps!
-
martedì 7 agosto 2012 00:14
Typically you'd load your source systems data into a dedicated data warehouse database. This way your DW isn't dependent on the schema of your source systems. The cubes in SSAS get created from the DW db rather than directly from your source systems.
Generally you'll want to de-normalise tables rather than normalise them in a DW - the less table joins the better. A popular data warehouse "model" is the one proposed by Kimball (http://www.kimballgroup.com/). There are lots of good books out there on the subject.
- Proposto come risposta Eileen ZhaoMicrosoft Contingent Staff, Moderator martedì 7 agosto 2012 09:24
-
martedì 7 agosto 2012 18:36
After briefly looking up Kimball's methodology and reading through his design tips, I get the sense that in many respects dimensional model schemas lean toward, or anticipate, a column-store bus architecture. By that token, it seems to me that the traditional row-oriented storage architecture is not an ideal candidate for implementing a fully dimensional schema. In other words, the dimensional model kind of introduces a new paradigm into the mix that an RDBMS wasn't particularly designed to handle.
But practically speaking, since an optimal storage footprint for the fact table is a critical design consideration in addition to efficient table scanning, I have to question the advisability of creating a bunch of overhead for storage of nulls, i.e., retaining sparse fact columns (not the measures but the attributes) in a single, wide, fact table.
Perhaps one could argue that fact tables in general are already "normalized" relations, and the empty space I propose to eliminate is simply a form of good housekeeping or quasi-normalization rather than true normalization?
So if I render the dimensional fact table at the logical level through a view, Kimball suggests that this will cost me server cycles, but I suspect it might actually be a performance gain if my tables are narrow and dense and the joins to the few supplemental fk-related fact tables I proposed above are optimized..
In any case, my concerns are moot until I can come back with some real performance comparisons. I appreciate the reference sources!

