none
Technical Wording help (not code)

    Domanda

  • I need help putting this into technical terminology:

    Our database "admin team" designed database to house survey questions. The problem is they gave every single question its own column and every single answer its own column. Essential an entire survey for one person is stored in a single data row.

    I've seen survey databases before and each row represents a question and its response. I remember learning its better to have many rows rather than too many columns. I also know that the way they set this up is bad database design practice, but I don't know the technical terminology to describe this.

    Can anyone help me out? Thanks

    -Nothing to see. Move along.

    giovedì 1 marzo 2012 03:42

Risposte

  • Hi,

    In database design we always consider Normalization if it possible your database must contains no Null values No duplicates if your database achieve these conditions then it is better design than others that contains nulls and duplicates. (the above is very very small amount of description) check this link.

    http://en.wikipedia.org/wiki/Database_normalization

    http://databases.about.com/od/specificproducts/a/normalization.htm

    I hope this is helpful.

    Elmozamil Elamir

    MyBlog


    Please Mark it as Answered if it answered your question
    OR mark it as Helpful if it help you to solve your problem
    Elmozamil Elamir Hamid
    http://elmozamil.blogspot.com

    • Contrassegnato come risposta Stephanie Lv giovedì 8 marzo 2012 07:36
    giovedì 1 marzo 2012 06:07
  • This is a classic "Entity-Attribute-Value" SQL anti-pattern. This works well in some NoSQL databases, but not relational database. If this is the type of data and design pattern they want to use, they should not be using SQL Server but a database like hadoop, BigTable, etc.

    This does not mean that SQL server is not the right database for your project, just not the right choice for this design pattern. I would prefer to use MS SQL  over any of these NoSQL any day. I would suggest your database "admin team" review the first rule of Normalization as suggested by Elmozamil.

    The key relational faults with this design are:

    1) Difficult to rely on Question values.

    * If you need to correct a typo, or rephrase the question, you either have to change all the historical data, or change the question going forward. This will mean that reporting on results will need to know all versions of the question.

    * If it is a multiple choice question, it will be very Difficult to enforce referential integrity for the answers.

    *  It will be very Difficult to enforce any constraints:

    - not null for mandatory answers.

    - Data types like date time, integer, money vales, etc. 

    2) Indexing will be very inefficient.

    3) Reporting SQL will be a nightmare to write or tune. (Think about what a self join would look like)

    If you Google the  "Entity-Attribute-Value" anti-pattern, you will find all kinds of issues and suggestions on how to change this.



    William F. Kinsley

    • Contrassegnato come risposta Stephanie Lv giovedì 8 marzo 2012 07:36
    giovedì 1 marzo 2012 19:17

Tutte le risposte

  • Hi,

    In database design we always consider Normalization if it possible your database must contains no Null values No duplicates if your database achieve these conditions then it is better design than others that contains nulls and duplicates. (the above is very very small amount of description) check this link.

    http://en.wikipedia.org/wiki/Database_normalization

    http://databases.about.com/od/specificproducts/a/normalization.htm

    I hope this is helpful.

    Elmozamil Elamir

    MyBlog


    Please Mark it as Answered if it answered your question
    OR mark it as Helpful if it help you to solve your problem
    Elmozamil Elamir Hamid
    http://elmozamil.blogspot.com

    • Contrassegnato come risposta Stephanie Lv giovedì 8 marzo 2012 07:36
    giovedì 1 marzo 2012 06:07
  • It depends. "Columns"design advanteges are:

    - every answer column may have specific datatype (datetime, number, lookup table reference, etc),

    - fast queries (no need for pivoting, no data conversion, "natural" indexing).

    Disadvantages:

    - need DDL to add/remove a question. Which is usually restricted to admins only.  And DDL blocks the hole table, stopping regular processing,

    - more data space required.

    Dynamic SQL will be inevitable in both cases if the list of questions  is supposed to change. So it can't be listed as advantage/disadvantage.

    As a rule, the questionary DB is not very large, so the speed and memory hardly are the issues. That's why "row" design prevails.


    Serg

    giovedì 1 marzo 2012 07:14
  • The design itself works for the problem at hand, so the model works. However, it is a lack of foresight for future expansion.

    The word i would use is "limited".

    giovedì 1 marzo 2012 13:56
  • This is a classic "Entity-Attribute-Value" SQL anti-pattern. This works well in some NoSQL databases, but not relational database. If this is the type of data and design pattern they want to use, they should not be using SQL Server but a database like hadoop, BigTable, etc.

    This does not mean that SQL server is not the right database for your project, just not the right choice for this design pattern. I would prefer to use MS SQL  over any of these NoSQL any day. I would suggest your database "admin team" review the first rule of Normalization as suggested by Elmozamil.

    The key relational faults with this design are:

    1) Difficult to rely on Question values.

    * If you need to correct a typo, or rephrase the question, you either have to change all the historical data, or change the question going forward. This will mean that reporting on results will need to know all versions of the question.

    * If it is a multiple choice question, it will be very Difficult to enforce referential integrity for the answers.

    *  It will be very Difficult to enforce any constraints:

    - not null for mandatory answers.

    - Data types like date time, integer, money vales, etc. 

    2) Indexing will be very inefficient.

    3) Reporting SQL will be a nightmare to write or tune. (Think about what a self join would look like)

    If you Google the  "Entity-Attribute-Value" anti-pattern, you will find all kinds of issues and suggestions on how to change this.



    William F. Kinsley

    • Contrassegnato come risposta Stephanie Lv giovedì 8 marzo 2012 07:36
    giovedì 1 marzo 2012 19:17
  • There is an additional impact if you anticipate having large volumes of data. In a nutshell, SQL Server performance is sensitive to the data width of a row, as SQL Server controls caching at the 8K page level. Designs like this mean that less data will fit on a page - perhaps no more than 1 row - and therefore values must be swapped in and out of cache more frequently. Performance degrades as volumes increase (depending on your usage).

    You may be able to get away with this at low volumes, but over time the problem becomes increasingly apparent.

    Put all the questions in one table, one row per question. Now the text is only stored once.

    The answers go in a separate table. Each answer is a row consisting of a unique identifier, a Foriegn Key to the Question's identifier, and the answer itself - and some identifier of who the answerer is, if needed.


    duncan davenport . data engineer and architect

    lunedì 5 marzo 2012 21:44
  • It's not EAV - it's a denormalized record. 

    Entity-Attribute-Value would look like this:

    ID   Data               Value

    1    Question         What color are apples?

    1    AnswerOption Red

    1    AnswerOption Blue

    1    AnswerOption Green

    1    AnswerOption Pink

    1    UserAnswer    Red

    (Heck, I can't even wrap my head around doing this right...)

    Basically EAV is completely metadata-driven, which provides huge flexibility. It also makes it nightmarish to make changes, report on, or write queries. It's the epitome of "write once, read never." 

    What BlackSailbot is talking about is denormalization, the opposite of EAV, really:

    Question1   Question1Answer         Question2       Question2Answer      Question3        Question3Answer

    Honestly, for a low-usage database this isn't that bad. It's easy for less technical users to work with, it's easy to create forms against, it's easy to report on, etc. If you stick a version number column on the exam table and on the response table, you can even change questions and still have valid old data. 

    However, what the OP was asking about was the midpoint, which is somewhat metadata driven:

    QuestionNbr       Question                               AnswerType         Answer

    1                         Who discovered America?    FillIn                      Columbus

    2                         Who walked on the Moon?  FillIn                      Armstrong 

    etc.

    The first benefit of this that comes to mind is that you can shuffle questions on a test and have multiple versions to avoid cheating. You can also easily move to a "test bank" system where you have hundreds of questions and can generate a quiz at any time. In other words, it's extensible. 

    tl;dr:

    Single row of data: easy to work with, hard to change. 

    Normalized system: a bit more complex, still somewhat easy to work with, fairly easy to change within reason

    EAV: Extremely complex, hard to maintain, flexible enough to allow for just about any change

    Philo


    Philo Janus, MCP Bridging business & Technology: @ http://www.saintchad.org/ Author: Pro InfoPath 2007 & Pro InfoPath 2010 Pro PerformancePoint 2007 Pro SQL Server Analysis Services 2008 Building Integrated Business Intelligence Solutions with SQL Server 2008 R2 & Office 2010

    martedì 6 marzo 2012 12:03
  • My experience has shown me a lot of large, high usage databases have evolved from what were once small “low-usage databases” and the original design patterns continue to be carried forward. If you are using a RDBMS, following a good relational database design based on the rules of normalization is a best practice for any size of database.

    I think a lot designers underestimate how long an database’s life cycle will live on and how complex their database can evolve over time. 



    William F. Kinsley

    martedì 6 marzo 2012 15:55
  • I agree with the issue that applications *always* live beyond their envisioned lifespan. However, many applications also get retired, die on the vine, etc. Good design patterns are great, but having an application that serves the user's needs is better. Ideally there is a balance between strong industrial enterprise design and "just getting the thing out the door."

    But once an application is in service, then I quickly drop to "don't fix it if it isn't broken." If the application is going to grow, redesign it then, instead of stopping everything to redesign it (without requirements) now.


    Philo Janus, MCP Bridging business & Technology: @ http://www.saintchad.org/ Author: Pro InfoPath 2007 & Pro InfoPath 2010 Pro PerformancePoint 2007 Pro SQL Server Analysis Services 2008 Building Integrated Business Intelligence Solutions with SQL Server 2008 R2 & Office 2010

    martedì 6 marzo 2012 16:15
  • A little additional investment up front for a a good relational design will allow for expansion and growth. Rarely is there ever time available to "redesign" a deployed system. Redesigning after the fact cost more in the long term, then the initial investment up front ever will.

    Refereeing back to the original question by blacksaibot, he believes its a bad design and he is looking for help in expressing it (and I agree with him).

     

    William F. Kinsley

    martedì 6 marzo 2012 17:48