Answered Database Table Design

  • Tuesday, June 16, 2009 3:22 PM
     
     
    Hi all,

    I need to know for the following scenario what is the best approach to design a table and field

    Say I'm having a field called "XStatus" in a "Status" table, i have defined some Master values for this field as:

    0 - Correct
    1 - Error1
    2 - Error2
    3 - Error3
    4 - Warning1

    However the value in this field is not atomic, for example the status can be 1 and 2, (i.e) for this field I'm can have bothe above mentioned errors.

    So If I'm having a master table for he available values how can i store a value in "XStatus" column which denotes the this field contains both 1 and 2.

    I want to store only a single value in "XStatus"


    rajesh

All Replies

  • Tuesday, June 16, 2009 4:11 PM
     
     

    Can you change your "master values"?

    The typical way to solve this problem is to use bit arithmetic.  If you redefine your list be expressed in powers of 2, like this:

    0 - Correct
    1 - Error1
    2 - Error2
    4 - Error3
    8 - Warning1

    Then you can express an "Error1" and "Error2" status as 3.  "Warning1" and "Error2" would be 10.  You can't do that kind of thing with the value list you currently have.

    In order to "detect" if one of your statuses is set on that field, you can use an expression like this:

    ([XStatus] & 4)

    The above will evaluate to true if XStatus contains an Error3 condition.

    In order to "set" an error condition on your status field, you would set the field using an expression like this (in a Derived Column component):

    [XStatus] | 4

    The above will "add" the Error3 condition to the column, or leave it unchanged if the Error3 condition is already set.


    Todd McDermid's Blog
  • Wednesday, June 17, 2009 10:59 AM
     
     
    Here's what I came up with.

    In your Status table make the column xStatus as UNIQUE. Create a new table (lets call it ErrorMapping) with two columns xStatus and errorNum. Add foreign key constraint on both these columns. xStatus references to xStatus from Status table ans errorNum references  master error table errorNum column.

    So when you add a record in Status table, insert some unique value in Status table and then insert any errors in ErrorMapping table against this value. You can have combined primary key on both columns of errormapping table which will mean that one type of error can happen only once with one type of record or transaction...

    Effectively you are storing only one status per record and then mapping this status to any number of errors in mapping table. If this is going to be your fact table and you have an error dimension, then you can build a view on top of this fact table and then build measures and all and anlyze by errors.


    Hope this will be useful to you.


  • Thursday, June 18, 2009 6:25 AM
     
     
    Hi guys,

    This is what we have done based on your replies, in fact we have combined both of the approaches suggexted by you.

    We are storing the list in Powers of 2,

    0 - Correct
    1 - Error1
    2 - Error2
    4 - Error3


    and tehn we are haing a mapping table, similar to following:

    0 - 0
    1 - 1
    2 - 2
    4 - 4
    3 - 1
    3 - 2
    5 - 1
    5 - 4
    6 - 2
    6 - 4
    7 - 1
    7 - 2
    7 - 4

    you could have guessed what we are trying to do, we are just haivn a master mapping table for all the combinations, so that in our Xstatus we shall sup up the integers and store, when i want to retrieve the values i need to Join from Status -> Mapping -> Error list, which will give me multiple rows.

    Whether this sounds to be a good design approach?



    rajesh
  • Thursday, June 18, 2009 9:48 AM
     
     
    I think this will work as long as number of entries in error table are not too many around 20-25.
  • Thursday, June 18, 2009 3:02 PM
     
     

    Your design of using a lookup table is really not necessary.

    Every combination of statuses is easily extractable using SSIS or T-SQL by using the bitwise operators.  Adding in a lookup table is adding another layer of complexity that is simply not required, and will lead to other problems.

    If you use bitwise flags (as I suggested), you can use up to 31 conditions in one INT (Int32) column, or up to 63 in a BIGINT (Int64) column.


    Todd McDermid's Blog
  • Tuesday, June 23, 2009 2:00 AM
     
     
    i think rajesh give a very good choice, here is another choice:
    0 - Correct
    1 - Error1
    2 - Error2
    3 - Error3
    4 - Warning1

    Table States(use nvarchat to store the error, use split char to separate the error)

    0
    1
    1,2
    0,1,2,3


  • Tuesday, June 23, 2009 3:48 AM
     
     Answered
    Unfortunately, that's not a great choice either.  With that particular design, you're storing a string in the fact table which is something to avoid - it takes too much space compared to a numeric value.  Second, you need to store multiple characters - but how many? One for each possible status (plus a "separator" character)?  On top of that - T-SQL doesn't have a "split" function.  The equivalent is CHARINDEX, and I believe MDX has a similar query function.  But you can't query that field (as specified) for "does '2' appear in there" if your list of states is longer than ten possibilities.  "1,12" would return "true" for that query, when it's obviously not true.  You could respecify the pattern to be ",1,12," (each item surrounded by a delimiter) so that an "index of" function would correctly locate a substring of ",2,".

    Bitwise arithmetic on a four-byte integer is far superior to any alternative listed to date.  (I hope I don't sound too arrogant saying that.)

    * It only takes 4 bytes to hold any combination of up 32 status attributes.  (Compared to 88 bytes for a string representation - 22x higher information density)
    * It's easily queryable by T-SQL and MDX into a boolean value indicating if the attribute is true or false.
    * No joins are required to resolve the attribute.

    Most important is the space consideration.  Take a billion row fact table - that's really not too large.  A fact table with a bit-masked status would need 4GB to store the status, as opposed to 88GB for a string-stored status - and that's on top of the fact data you've already got in the table.  The space isn't always relevant for storage (disk is "cheap"), but for queries.  You can have a lot more rows on one page if the row is smaller, and that means faster queries.
    Todd McDermid's Blog
    • Marked As Answer by Rajesh_vellore Wednesday, October 07, 2009 4:43 AM
    •  
  • Wednesday, July 01, 2009 12:29 AM
     
     
    rajesh,

    If you really have to have the column itself denote all the values it represents then you want to use bitmasking.




    Your other option is to take the primary key from the status table and put it in a lookup table with the value of the statuses.

    Derek SQLServerPlanet.com