locked
Two table designs but any practical difference....? RRS feed

  • Question

  • We have a table where the first column is the primary key and the "name" column has a lot of  duplicate data:

    I wanted to split the tables into two tables:

    Other than introducing yet another table is there any practical difference between the two approaches? Advantages? Disadvantages?

    TIA,

    edm2

    Tuesday, May 14, 2013 3:22 AM

Answers

  • As Kalman mentioned, they provide different cardinality of the relationships, but effectively can represent the same data and relationships.  Sticking with the mantra of "Normalize until it hurts, denormalize until it works", I'd go with the two table approach as it is normalized "enough", but also provides a fairly easy structure to write your queries / stored procedures, etc. for CRUD operations. 

    Thanks,
    Sam Lester (MSFT)


    http://blogs.msdn.com/b/samlester

    This posting is provided "AS IS" with no warranties, and confers no rights. Please remember to click "Mark as Answer" and "Vote as Helpful" on posts that help you. This can be beneficial to other community members reading the thread.

    • Proposed as answer by Kalman Toth Friday, May 24, 2013 7:25 PM
    • Marked as answer by Allen Li - MSFT Monday, May 27, 2013 10:18 AM
    Tuesday, May 14, 2013 7:29 AM
  • Hi edm2,

    I agree with Samuel but I would like to explain more in this regards:

    Normalization is the process of efficiently organizing data in a database. There are 2 goals of the normalization process: eliminating redundant data and ensuring data dependencies make sense. The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms & are numbered from 1 to 5 (the lowest form of normalization referred as 1NF).

    First normal form (1NF) sets the very basic rules for an organized database

    -Eliminate duplicative columns from the same table.
    -Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

    Second normal form (2NF) states that it should meet all the rules for 1NF and there must be no partial dependences of any of the columns on the primary key, example:

    CREATE TABLE CUSTOMERS(
           CUSTOMER_ID    INT              NOT NULL,
           CUSTOMER_NAME VARCHAR (20)      NOT NULL,
           ORDER_ID   INT              NOT NULL,
           ORDER_DETAIL VARCHAR (20)  NOT NULL,
           ORDERSALE_DATE  DATETIME,
           PRIMARY KEY (CUSTOMER_ID, ORDER_ID)
    );

    To make this table comply with second normal form 2NF, you need to separate the columns into three tables.

    CREATE TABLE CUSTOMERS(
           CUSTOMER_ID    INT              NOT NULL,
           CUST_NAME VARCHAR (20)      NOT NULL,
           PRIMARY KEY (CUSTOMER_ID)
    );

    CREATE TABLE ORDERS(
           ORDER_ID   INT              NOT NULL,
           ORDER_DETAIL VARCHAR (20)  NOT NULL,
           PRIMARY KEY (ORDER_ID)
    );

    CREATE TABLE CUSTMERORDERS(
           CUSTOMER_ID    INT              NOT NULL,
           ORDER_ID   INT              NOT NULL,
           ORDERSALE_DATE  DATETIME,
           PRIMARY KEY (CUSTOMER_ID, ORDER_ID)
    );


    Regards,

    Tarek Ghazali

     

     

    • Proposed as answer by Kalman Toth Friday, May 24, 2013 7:25 PM
    • Marked as answer by Allen Li - MSFT Monday, May 27, 2013 10:18 AM
    Tuesday, May 14, 2013 11:58 AM

All replies

  • The first one is for one-to-many relationship.

    The second one is for many-to-many relationship.

    Decide the cardinality of the relationship to choose the appropriate table design.


    Kalman Toth Database & OLAP Architect sqlusa.com
    New Book / Kindle: Pass SQL Exam 70-461 & Job Interview: Programming SQL Server 2012


    Tuesday, May 14, 2013 7:12 AM
  • As Kalman mentioned, they provide different cardinality of the relationships, but effectively can represent the same data and relationships.  Sticking with the mantra of "Normalize until it hurts, denormalize until it works", I'd go with the two table approach as it is normalized "enough", but also provides a fairly easy structure to write your queries / stored procedures, etc. for CRUD operations. 

    Thanks,
    Sam Lester (MSFT)


    http://blogs.msdn.com/b/samlester

    This posting is provided "AS IS" with no warranties, and confers no rights. Please remember to click "Mark as Answer" and "Vote as Helpful" on posts that help you. This can be beneficial to other community members reading the thread.

    • Proposed as answer by Kalman Toth Friday, May 24, 2013 7:25 PM
    • Marked as answer by Allen Li - MSFT Monday, May 27, 2013 10:18 AM
    Tuesday, May 14, 2013 7:29 AM
  • Hi edm2,

    I agree with Samuel but I would like to explain more in this regards:

    Normalization is the process of efficiently organizing data in a database. There are 2 goals of the normalization process: eliminating redundant data and ensuring data dependencies make sense. The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms & are numbered from 1 to 5 (the lowest form of normalization referred as 1NF).

    First normal form (1NF) sets the very basic rules for an organized database

    -Eliminate duplicative columns from the same table.
    -Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

    Second normal form (2NF) states that it should meet all the rules for 1NF and there must be no partial dependences of any of the columns on the primary key, example:

    CREATE TABLE CUSTOMERS(
           CUSTOMER_ID    INT              NOT NULL,
           CUSTOMER_NAME VARCHAR (20)      NOT NULL,
           ORDER_ID   INT              NOT NULL,
           ORDER_DETAIL VARCHAR (20)  NOT NULL,
           ORDERSALE_DATE  DATETIME,
           PRIMARY KEY (CUSTOMER_ID, ORDER_ID)
    );

    To make this table comply with second normal form 2NF, you need to separate the columns into three tables.

    CREATE TABLE CUSTOMERS(
           CUSTOMER_ID    INT              NOT NULL,
           CUST_NAME VARCHAR (20)      NOT NULL,
           PRIMARY KEY (CUSTOMER_ID)
    );

    CREATE TABLE ORDERS(
           ORDER_ID   INT              NOT NULL,
           ORDER_DETAIL VARCHAR (20)  NOT NULL,
           PRIMARY KEY (ORDER_ID)
    );

    CREATE TABLE CUSTMERORDERS(
           CUSTOMER_ID    INT              NOT NULL,
           ORDER_ID   INT              NOT NULL,
           ORDERSALE_DATE  DATETIME,
           PRIMARY KEY (CUSTOMER_ID, ORDER_ID)
    );


    Regards,

    Tarek Ghazali

     

     

    • Proposed as answer by Kalman Toth Friday, May 24, 2013 7:25 PM
    • Marked as answer by Allen Li - MSFT Monday, May 27, 2013 10:18 AM
    Tuesday, May 14, 2013 11:58 AM
  • Back up before you run over yourself.  It is not enough to say that you have tables.  A table has a purpose and is intended to model something in real life.  Without knowing what your schema is actually modelling, any comments you receive should be lookup upon as educated (if you are lucky) guesses.  

    It is highly likely that your posted tables are not properly normalized - but there are valid (though generally rare) reasons to denormalize.  The mapping approach introduces a level of complexity that is likely undesireable and is based on an illusion.  Though multiple persons may weigh the same on any given date, they do not, in reality, "share" that weight.  The mapping table is based on this concept of "sharing".  In this situation, the mapping table provides no advantage, does not reflect reality, and introduces complexity - in my opinion. 

    I will also add that your first proposal is incorrect with respect to the 2nd (weight) table.  That table does not need an artificial key.  It has a natural key - the column that is the foreign key to the first table along with the date.  You should be hesitant to simply assume that the primary key of every table must be an identity (or similar) column.

    • Proposed as answer by Kalman Toth Friday, May 24, 2013 7:25 PM
    Tuesday, May 14, 2013 12:54 PM
  • The second approach looks fine and satisfies the three NFs.

    As no info about the table data is provided , this is my assumption -The first table is a list of employees/customers/someone.  The second table contains the weight randomly taken on some days.

    Both the tables satisfies 1NF, there is no redundant data and data is scalar value.

    Also satisfy 2 NF, All the Non-key attributes should be dependent on key attributes.In table 1 , Name is dependent on PK1. In table2 the weight and date are dependent on PK2 and FK1(composite key).

    There is no transitive dependency(no need to verify in table1 as it is having single column Key), so they satisfy 3 NF as well.

    With 3rd approach we end up with storing duplicate date of PK2 in two tables.


    Thanks
    Sarat

    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    • Proposed as answer by Kalman Toth Friday, May 24, 2013 7:25 PM
    Tuesday, May 14, 2013 5:21 PM
  • So, the fact is, there is a lot of decent information in this thread, especially the need for normalization, but let's keep it simple. Figure out what "things" you want to model. I see, considering no requirements to speak of, two independent entities. 1. A person. 2. Their weight on a given day. I will use surrogates in my quick design because I don't know that you completely have fleshed out the attributes that make up a key, but I expect two, just like you have (with some naming changes to hopefully be an example :):

    Person
    =====
    PersonId int NOT NULL Primary Key,
    Name   varchar(30) NOT NULL Unique

    PersonWeight
    =========
    PersonWeightId int NOT NULL Primary Key,
    PersonId int NOT NULL REFERENCES Person(PersonId),
    WeightValue numeric(5,2) NOT NULL
    CaptureDate date NOT NULL,
    UNIQUE (PersonId, WeightValue, CaptureDate)

    Sample Data

    Person
    ====
    PersonId      Name
    -------------  -----------------------
    1                 JoeS

    PersonWeight
    =========
    PersonWeightId   PersonId WeightValue  CaptureDate
    ------------------- ---------- --------------- ---------------
    1                        1            193               2013-07-14
    2                        1            199               2013-07-16
    3                        1            200               2013-07-18

    Note that naming things really is a great way to get to know and understand them. Pk1, pk2 are meaning-free, but PersonId is clear that it represents a person (and that there are some attributes in the Person table that identify the person that can be substituted for the surrogate. In other words, where you see personId = 1, you can logically replace that with JoeS... something that would be more interesting if there was a child table to the PersonWeight table with 3 columns in the key, rather than just the one)


    Louis

    Without good requirements, my advice is only guesses. Please don't hold it against me if my answer answers my interpretation of your questions.

    • Proposed as answer by Kalman Toth Friday, May 24, 2013 7:25 PM
    Tuesday, May 14, 2013 6:06 PM
  • Samuel,

    >>>> "Normalize until it hurts, denormalize until it works

    Never heard that before. Thanks!

    Tuesday, May 14, 2013 11:49 PM
  • I learned a lot from everyone's answers! Especially how a simple-seeming question may have far more subtle than I realized. (And the points raised about normalization were all great.)

    edm2


    • Edited by edm2 Tuesday, May 14, 2013 11:54 PM
    Tuesday, May 14, 2013 11:53 PM
  • I really hate that saying because it lays blame on normalization. Most problems with normalization are related to over engineering the solution. ... My preferred way of thinking about it is "normalize until it works, and if it doesn't work, make sure you understood the requirements. Denormalize if you have tried every other possible solution and have consulted this forum and said, is there a better way"

    Even autocorrect hates the term denormalize :)


    Louis

    Without good requirements, my advice is only guesses. Please don't hold it against me if my answer answers my interpretation of your questions.

    Wednesday, May 15, 2013 5:05 PM