locked
what is the risk of using sparse columns instead of normal relational tables? RRS feed

  • Question

  • I have three table that related to one table,in my new design I want to replace the relationship by using sparse columns. you Can see diagram below.

    I know I would have Redundancy.but i want to know what is the other risk of this way of design? by this new way my queries would be very easy and fast.what is your idea about it?sorry for my bad english,and thanks.


    M.Bagheri

    Tuesday, October 23, 2012 12:13 PM

Answers

All replies

  • Let me make sure that I understand the upper relationships:

    • You have WfDoc as the parent table
    • Attendeance has a 0 or 1 to 1 relationship to WfDoc
    • Credit has a 0 or 1 to 1 relationship to WfDoc
    • ReplaceDetail has a 0 or 1 to 1 relationship to WfDoc
    • All non-key columns are nullable

    Is this correct? 

    Under these circumstances how could you possibly have redundancy?

    What is the purpose of the FID column in the lower table?

    How do you resolve the two "name" columns in the upper design to the lower design?

    To me the upper design looks over normalized and the lower design looks more appropriate -- but a "name" column issue might need to be resolved.  Also, the lack of any other unique column concerns me somewhat as does the lack of any non-null columns other than key columns; to me this looks like putting together "tables" without much analysis.


    Tuesday, October 23, 2012 12:35 PM
  • I find it difficult to believe that all "real" columns in either example can be null.  In my opinion, I think you should take another try at analyzing your data.  In any event, your first attempt appears to be inherently incorrect.  For a given ID in wfDoc, the absence of a startdate and an enddate is represented by the complete absence of an associated row in Attendance - but you allow nulls for both.  This same idea applies to all of the child tables.  In addition, I am skeptical that a given row would have an EndDate without a StartDate.  Things to consider, but only you know the data.

    I will say that the first approach using multiple child tables in a 1:1 relationship can be considered more consistent with relational theory.  Typically, this is done to avoid the use of NULL values - something that your model is obviously not doing.  That is why I suggest you revisit the first model.  Your second model - using a single table with nullable columns - is a more typical design.  Since you raise the idea of sparse columns, have you taken the time to estimate the reduction in space to determine if there is a benefit?  Presumably you have also taken the time to thoroughly read about sparse columns in BOL; there is a brief mention that they "reduce the space requirements for null values at the cost of more overhead to retrieve nonnull values".  You may want to search for information on this "cost" before you decide (or do both and perform your own evaluation).

    In your second model, FID is not nullable while it is in the first.  Therefore, your models are not completely consistent with each other.  Given the name, perhaps FID should also be unique? As Kent has already mentioned, nullability is not redundancy.  Your second model does not appear to introduce any redundancy.

    In your last paragaph, you state "my queries would be very easy and fast".  First, your data model should not accomodate laziness.  Too many programmers make too many shortcuts due to extreme laziness.  With the tools currently available, there is no reason to be concerned with "easy" since there are many producticity-enhancing features already available.  Neither should the joining tables in a query be considered the opposite of "easy".  Second, performance of your database is something you should consider after you thoroughly understand your system and have modelled it correctly.  Performance is also measured as a whole, across a range of activities with a range of volume - optmizing one particular query can reduce perfomance in another area.  At this point, I don't think performance should not be one of your priorities. 

    Lastly, one should never use reserved words as identifiers - Desc should presumably be Description, Date should be something (which is not represented in your sparse example). 

    Tuesday, October 23, 2012 5:59 PM
  • "@kent Waldrop" thanks for replay.

    but Actually

    •  WfDoc as the parent table
    • Attendance has a 1 to n relationship to WfDoc
    • Credit has a 1 to n relationship to WfDoc
    • ReplaceDetail has a 1 to n relationship to WfDoc
    • All non-key columns are nullable

    In my knew design.If the "WfDoc.Type" field was "ReplaceDetail" .( => means that convert )

    • "ReplaceDetail.ID" => "WfDoc.FID"
    • "ReplaceDetail.Name" => "WfDoc.FName"
    • "ReplaceDetail.Desc"  => "WfDoc.FDesc"

    ,or if the Type was "Credit"

    • "Credit.ID" => "WfDoc.FID"
    • "Credit.Name" => "WfDoc.FName"
    • "Credit.Date"  => "WfDoc.FStartDate"

    ,or if the Type was "Attendance"

    • "Attendance.ID" => "WfDoc.FID"
    • "Attendance.StartDate" => "WfDoc.FStartDate"
    • "Attendance.EndDate"  => "WfDoc.FEndDate"

    what do you think about it?is there any mistake?


    M.Bagheri

    Wednesday, October 24, 2012 6:01 AM
  • I was wandering if you were aiming at perhaps some variation of an EAV table.  You might want to read up on EAV tables before you go using them.  Here is some information related to the use of "EAV tables":

    Table Design, EAV DATA:

       http://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
          Wikipedia Article

       http://weblogs.sqlteam.com/davidm/articles/12117.aspx
       http://codemagento.com/2011/03/joining-an-eav-table-to-flat-table/
       http://www.learn.geekinterview.com/it/data-modeling/entity-attribute-value-eav-/1.html

       http://sqlblog.com/blogs/aaron_bertrand/archive/2009/11/19/what-is-so-bad-about-eav-anyway.aspx
          Aaron Bertrand

       http://pratchev.blogspot.com/2010/07/refactoring-entity-attribute-value.html
          Plamen Ratchev

       http://social.msdn.microsoft.com/Forums/en-US/databasedesign/thread/087db982-2d9b-4d2a-903c-9e8fe5548f21
          Karen Lopez
          Aalam Rangi


    Table design Problems -- EAV and OTLT design problems

       http://social.msdn.microsoft.com/forums/en-US/transactsql/thread/dc6f8f29-de93-427f-981b-858245f01a69/
          Arnie Rowland
          Alejandro Mesa
          Kent Waldrop

       http://www.simple-talk.com/sql/t-sql-programming/avoiding-the-eav-of-destruction/
          Article by Joe Celko

       http://weblogs.sqlteam.com/davidm/articles/12117.aspx
          David M

       http://social.technet.microsoft.com/Forums/en-US/transactsql/thread/29362835-0b49-4dd2-b5ae-33a60ebdc08e
          Kent Waldrop

       http://social.technet.microsoft.com/Forums/en-US/sqlgetstarted/thread/73b994ff-755b-40ae-9320-c7d21620e526
          Kent Waldrop
          George Neville
          David Stewart

       http://social.technet.microsoft.com/Forums/en-US/transactsql/thread/4b544f43-2d46-4ddd-8cb0-4647bf5ad303
          Kent Waldrop
          Naomi Nosonovsky

       http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/0dec83f4-f2d2-459b-8be6-7a1be73a0bfe/#aadd543a-bd0f-4a31-971f-754b95d99bd9
          Joe Celko
          Kent Waldrop
          Deepak Munigela
          Naomi Nosonovsky

    Now for the more important question:

    Why are you considering moving away from a relational design to an EAV design?  Also, are you a rookie or intermediate level designer?

    My knee-jerk reaction to what I have seen so far is stick with what you have as the first design and do not use the second design.


    In particular, give a look at Joe Celko's article, "Avoiding the EAV of Descruction".


    Wednesday, October 24, 2012 3:51 PM
  • thanks Kent Waldrop.I will report this to our DBA department.this idea just was a suggestion in a meeting and I couldn't prove ,it is wrong as well as you do. you help me very much and I'm thankful.

    B.Ghadami



    Sunday, October 28, 2012 8:35 AM