locked
Database Normalization best practices help RRS feed

  • Question

  • When creating a database with tables, and attempting to normalize a database I am curious about best practice or preferred methods of doing this; specifically in the example below.

     

    Let’s say you have a customers table, would you want to include the customer phone number as an attribute to the table, given the customer could have multiple phone numbers?  If not I can see 2 possible alternatives that could work and want to know which would be better from a design standpoint.  Would you want to create a customer phone number table with the customer primary key that also contains the phone number and type of phone number, or create a lookup table that links a customer table to a phone number table (using an intermediate table to handle the linking of phone numbers to a specific customer), as the phone number table would prevent duplicate phone numbers from being entered into the system and knowing that a person could change there phone number and an already existing phone number can then be reused (phone numbers get recycled all the time) if it already exists and re-linked to a specific customer using a lookup table.  Is there a best method for this and why?

    • Changed type a100011 Friday, September 10, 2010 9:32 PM Sounds more like a discussion.
    • Changed type a100011 Saturday, September 11, 2010 4:27 AM maybe there is the best answer
    Friday, September 10, 2010 9:25 PM

Answers

  • You need to maintain lookup table, one more table participate in JOIN .... See if that acceptable for you. Again , it is hard to suggest something for database design because for one client it could be perfect while for another client it could be disaster


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Monday, September 13, 2010 5:56 AM

All replies

  • I'd say option 1. Don't forget that there's a possibility of the same phone number applying to more than one customer (think of, say, husband and wife or father and daughter), or that the same phone number can do multiple duty (think of voice lines that can detect when the incoming call is a fax)

    --
    Doug Steele, Microsoft Access MVP
    http://www.AccessMVP.com/djsteele
    Co-author: "Access 2010 Solutions", published by Wiley
    (no e-mails, please!)
     

     

    When creating a database with tables, and attempting to normalize a database I am curious about best practice or preferred methods of doing this; specifically in the example below.

     

    Let�??s say you have a customers table, would you want to include the customer phone number as an attribute to the table, given the customer could have multiple phone numbers?  If not I can see 2 possible alternatives that could work and want to know which would be better from a design standpoint.  Would you want to create a customer phone number table with the customer primary key that also contains the phone number and type of phone number, or create a lookup table that links a customer table to a phone number table (using an intermediate table to handle the linking of phone numbers to a specific customer), as the phone number table would prevent duplicate phone numbers from being entered into the system and knowing that a person could change there phone number and an already existing phone number can then be reused (phone numbers get recycled all the time) if it already exists and re-linked to a specific customer using a lookup table.  Is there a best method for this and why?


    Doug Steele, Microsoft Access MVP http://www.AccessMVP.com/djsteele (no e-mails, please!)
    Friday, September 10, 2010 11:17 PM
  • So having duplicate phone numbers in a database is a better design than creating a linking table to link phone numbers to a specific table (in this case customers).  I guess I'm ok with that but lets say this is a real world project and the specifications have changed and now you are also keeping track of customers, employees, companies, and, some other entity that all contain phone numbers.  To make things even more real lets say some of the employees are also customers or have a company or another entity.  With option 1 you would have to create several more phone number tables which would produce even more duplicate phone numbers and when John Do's company changes phone numbers (it is a work at home company), you would have to change his phone number in multiple tables with option 1, while in option 2 you would add the new number to the system and re-link his company and customer information to the new phone number.  Wouldn't it be best to just have a single linking table with an extra atribute which would indicate where the phone number links to?  Example Linking table: Phone_Lookup(TableKeyID, PhoneKeyID, TableRef).  Now all phone numbers are in a single table, all referential linking is through a single table.  This would eliminate the need for a Company Phone, Employee Phone, and other entity phone number tables which are needed to link multiple phone numbers to a single customer, employee, company, or other entity.  Depending on the number of entities, you could reduce the number of tables from N * 2  to (N + 2).  Then if the specifications ever change again, all you need to do is add the new entity, and crate a new unique Phone_Lookkup.TableRef to link the phone number to the new table.  Not only would this prevent the need to redesign the database whenever an entity is added and to use already built functions to access and update the date, where in option 1 you would have to duplicate all this stuff for the new entity.  Kind of a cork but to me Option 2 sounds like a better solution.  In this scenario would option 1 still be ideal? and if not then why not build the relationship like option 2 from scratch and prevent the need to redesign the database when new entities are added?

    Thanks for your feedback and discussion as it seems there are multiple ways to do the same thing but which would be better given the new information?

    • Edited by a100011 Monday, September 13, 2010 3:19 PM Typo: changed N^2 to N * 2
    Saturday, September 11, 2010 1:14 AM
  • Logical design may be different from the physical. With logical design the option one is the best, however,  if you want to reduce number of JOIN  tables (denormalization especially in web app) you can add phone1,phone2 to the customer table as well


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Sunday, September 12, 2010 6:53 AM
  • You can easily have 2 customers with the same number or 1 customer with
    multiple phone numbers. What you should not have is duplicates of a row with
    CustomerID and PhoneNumber. In oeder to avoid that, use a compound unique
    index, either separately, or as the Primary Key.
    --
    Arvin Meyer, MCP, MVP
    http://www.datastrat.com
    http://www.accessmvp.com
    http://www.mvps.org/access
    Co-author: "Access Solutions", published by Wiley
     
     
    "a100011" <=?utf-8?B?YTEwMDAxMQ==?=> wrote in message
    news:c4ba0625-3cbb-41e0-82ea-06c2e03b170c...
    > So having duplicate phone numbers in a database is a better design than
    > creating a linking table to link phone numbers to a specific table (in
    > this case customers). I guess I'm ok with that but lets say this is a
    > real world project and the specifications have changed and now you are
    > also keeping track of customers, employees, companies, and, some other
    > entity that all contain phone numbers. To make things even more real lets
    > say some of the employees are also customers or have a company or another
    > entity. With option 1 you would have to create several more phone number
    > tables which would produce even more duplicate phone numbers and when John
    > Do's company changes phone numbers (it is a work at home company), you
    > would have to change his phone number in multiple tables with option 1,
    > while in option 2 you would add the new number to the system and re-link
    > his company and customer information to the new phone number. Wouldn't it
    > be best to just have a single linking table with an extra atribute which
    > would indicate where the phone number links to? Example Linking table:
    > Phone_Lookup(TableKeyID, PhoneKeyID, TableRef). Now all phone numbers are
    > in a single table, all referential linking is through a single table.
    > This would eliminate the need for a Company Phone, Employee Phone, and
    > other entity phone number tables which are needed to link multiple phone
    > numbers to a single customer, employee, company, or other entity.
    > Depending on the number of entities, you could reduce the number of tables
    > from N^2 to (N + 2). Then if the specifications ever change again, all
    > you need to do is add the new entity, and crate a new unique
    > Phone_Lookkup.TableRef to link the phone number to the new table. Not
    > only would this prevent the need to redesign the database whenever an
    > entity is added and to use already built functions to access and update
    > the date, where in option 1 you would have to duplicate all this stuff for
    > the new entity. Kind of a cork but to me Option 2 sounds like a better
    > solution. In this scenario would option 1 still be ideal? and if not then
    > why not build the relationship like option 2 from scratch and prevent the
    > need to redesign the database when new entities are added?
    >
    > Thanks for your feedback and discussion as it seems there are multiple
    > ways to do the same thing but which would be better given the new
    > information?
    >
     
     
     

    Arvin Meyer MCP, MVP MS-Access
    Sunday, September 12, 2010 6:26 PM
  • So if option 1 is best, what is wrong with option 2?  It reduces duplication as well as the number of tables.  I dont understand why option 1 would be logicly better than number two given the new entities that need to reference phone numbers.  why wouldnt option 2 be  better if multiple tables need to reference the phone number field (like a company table, employee, and customer).  Whats the advantage of miltiple referencial tables for each entity mentioned above when you can do the same thing with two tables instead of 3 or more needed with option 1.  Dosnt option 2 allow for dynamic growth if more tables are added that reference a phone number?  It seems logical to reduce the number of duplications (phone numbers) and referential tables.  why not merge multiple referential tables into a single lookup table?

     

    Thank u for the discussion and help.

    Monday, September 13, 2010 1:44 AM
  • You need to maintain lookup table, one more table participate in JOIN .... See if that acceptable for you. Again , it is hard to suggest something for database design because for one client it could be perfect while for another client it could be disaster


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Monday, September 13, 2010 5:55 AM
  • You need to maintain lookup table, one more table participate in JOIN .... See if that acceptable for you. Again , it is hard to suggest something for database design because for one client it could be perfect while for another client it could be disaster


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Monday, September 13, 2010 5:56 AM
  • Would you want to create a customer phone number table with the customer primary key that also contains the phone number and type of phone number,

     

    Answer: Yes, obviously this is best approach as per database designer point of view.

     

    Reason:

     

    1.      Only PK -- > FK relationship is enough to manage the data as per your requirement

    2.      One customer can have one or multiple Phone number which can be manage easily.

    3.      Even one phone number belongs to multiple customer also can be accommodate with in this design

     

    Ex: CustID, Phone Number, Type, Status...etc

     

    or create a lookup table that links a customer table to a phone number table (using an intermediate table to handle the linking of phone numbers to a specific customer), as the phone number table would prevent duplicate phone numbers from being entered into the system and knowing that a person could change there phone number and an already existing phone number can then be reused (phone numbers get recycled all the time) if it already exists and re-linked to a specific customer using a lookup table.  Is there a best method for this and why?

     

    Answer: This is not the best approach as per database designer point of view.

     

    Reason:

     

    1.      Only PK -- > FK relationship exists and Not Many-to-Many relationship present

    2.      The customer data is key data where phone number is dependent data to customer data

     

    The reason behind the look-up table is if there are two master/Key tables with many-to-many relationships then only we can go for a look-up table.

     

    If you clearly analysis your data, the phone number is not the key data, which means it depends on the customer specific, which is not predefined. It might change tomorrow and it may not work tomorrow.

     

    If I consider your design, just imagine how many phone number you will enter to your phone number table…Is all the number from your phone dictionary which seems to be impossible or you need to enter and make a link…why so.

     

    And more over you need only phone details for the customer those are present in your systems, so why you want a separate phone table and then links. You can easily achieve with 2 tables rather than 3 tables which does not make sense.

     

    You can only create a table “CustomerPhone” and manage your data easily.

    Please don’t think about data duplicate as it’s very minor in your case.

     

    Hope this help you to understand the approach to follow your design..

     

    thanks


    - MS (We are what we repeatedly do, Excellence, then, is not an act, but a habit.)
    Monday, September 13, 2010 9:44 AM
  • I see your point Uri Dimant as I think designing for database growth is an important aspect to database design.

     

    In SQL an int storage size is 4 bytes while a varchar (phone number is at least 12 bytes long (10 digit number + 2 bytes) http://msdn.microsoft.com/en-us/library/ms176089.aspx   Using a lookup table and not duplicating phone numbers would prevent the need to duplicate 12 bytes for a duplicate phone number in the system.  If you have a million phone numbers or more and assume duplications occur about X% of the time; then you would have a larger database.  In option 2 if two users have the same phone number then you would be duplicating an extra four bytes instead of twelve.

     In the example I gave there are multiple master tables (customers, companies, employees, and other entities) that use the phone numbers as a FK relationship.  You say I can achieve with two tables rather than 3, but I think in the example you would need at least 6 tables using the first method (PK -> FK):

    1.)    Customer -> Customer Phone #s

    2.)    Companies -> companies Phone #s

    3.)    Employees -> employees Phone #s

    4.)    Entity X -> Entity X Phone #s

    If other entities become needed and also depend on phone number then in case 1 you would need 2 new entities (Entity X and Entity X Phone).

     

    But in option 2 you would only need 5 tables as Phone Lookup and Phone Listing could be used for all entities:

    1.)    Customers -> Phone Lookup -> Phone Listing

    2.)    Companies -> Phone Lookup -> Phone Listing

    3.)    Employees -> Phone Lookup -> Phone Listing

    4.)    Entity X -> Phone Lookup -> Phone Listing

    If other entities become needed and also depend on phone number then in case 2 you would only need 1 new entity (Entity X), and could more easily be incorporated into the already developed “database design”.

     

    So in Option 1 the number of tables needed to maintain the same relation as option 2 is more not less (N * 2) instead of (N + 2) where N is the number of entities with a FK of phone number.  Database design option 2 is a smaller database with less tables and can handle growth more easily with less changes to the database design (only need to add 1 table instead of 2 when a new phone number entity exists), while option 1 is not as compact as it contains more tables.

     

    Although the size difference might be minimal in a small database, it could easily grow rapidly in a larger database, I think dealing with fewer tables makes the database design better, and if you use the same logic as option 2 for other entities in the database like (address, email addresses, state codes), you could reduce the db size and number of entities even more using the same logic as above, wouldn’t you?  From a text book point of view I can see how the relationship of PK -> FK needs to be maintained, but in some cases, it seems a lookup table would reduce the need for extra tables.  If you assume the database is small then I could see how option 1 would be just as efficient as option 2 but if the database starts accumulating lots of records, it seems there could be a tipping point as to where option 2 would be better. 

     

    At what point does duplicate data (IE: extra bytes being stored and tables being used) become important?  If this was a tracking system that monitored db changes than the number of entities would increase even further.  If a company grows; than at some point the impact will be apparent won’t it?  Aren’t you better off designing a database for growth in mind rather than ignoring the impact of duplicate data and table growth?  Shouldn’t db growth be considered in a database design?

     

    Monday, September 13, 2010 2:50 PM
  • So tossing my 3 cents in here (and I see this is a fairly old thread that was just bumped by a spam bot, but it was interesting!) Forget about storage to start with. If your business rules allow for > 1 phone number (or any attribute!) then a seperate table is necessary. Avoid the term "lookup" table here because you need a table that stores some business data. (It is really implementing an array of numbers, but in relational databases, we don't have arrays, we have tables and columns and they should mean/store one thing.)

    So the question of whether to store the data in a customerPhoneNumber table with a logical key of CustomerNumber, PhoneNumber, or to have a table that stores PhoneNumbers seperately as (PhoneId, PhoneNumber) and then have CustomerNumber,PhoneId really comes down to how you value the data. Do you have a concept of a PhoneNumber that stands alone from it being used as a customer? Particularly important is how you think of this question:

    Does a phone number change? Or does the identifier that I contact Customer X change? (And in some cases, will my system/user want to care that if Customer X changes from 555-1234 to 555-1235 that Customer Y who was using the same phone number change?

    Usually a phone number is not elevated to its own entity in *most* systems because it is just thought of as an array attribute of a customer. The fact that it gets used by > 1 customer is just a natural concequence of the real world we have to deal with. Like Uri stated, see what is acceptable for you, but also be careful that you design your databases to serve the user and application that will actually be used. You can model anything to a very deep level and then never end up using any feature (other than extra processing cycles!) So determine the concepts as YOUR system will use them and stop there. Then you have a properly normalized system.


    Louis

    Without good requirements, my advice is only guesses. Please don't hold it against me if my answer answers my interpretation of your questions.

    • Proposed as answer by Kalman Toth Friday, January 18, 2013 4:45 PM
    Thursday, January 17, 2013 10:11 PM