locked
Modelisation recommendations RRS feed

  • Question

  • Hi All !

    I need some advice, like best practices for a data base modelisation.

    I am on a project, i have to improve a web application with some new functionalities, so data base modifications...

    Context :

    - Max record on few tables : 50 000 rows of 10 columns (it's not big), other tables are very few filled.

    - The relational Databse will be SQL Server 2008 R2

    First, a basic question :

    Table A1, Table A2 and Table B, i have many to many relationship between Table A1 and Table B so i have the relation table A1_B, but now i need to add the same relation between Table A2 and B, is it better to reuse the A1_B and rename A_B and store A1 and A2 ids in the same column and label with another column like A_source or is it better to have 2 distinct tables A1_B and A2_B ? I think it's a common question, i don't want to impact the A1_B table (this table is used for something else) so i prefer create A2_B table, but if there is a best practice on it, i will follow it and i love follow best practices.;)

    My second question is about some boolean columns, i discovered that it is handled by a nvarchar(1) (crazy boyz...), i need to add some new boolean and i wanted to follow a same structure modelisation, is there an performance impact on let nvarchar(1) for boolean ?

    My third question : ids are nvarchar(50) so is there inprovments if i add indexes on these columns ?

    Thx for your advices...


    Thursday, August 30, 2012 12:53 PM

Answers

  • Re Question 1.  From a purely relational point of view, the proper approach is to use a separate table.  Keep in mind that database design is part art and part science.  It is difficult to say that there are best practices in design since each system is unique.  The best approach, IMO, is to properly normalize your tables first and then adjust the design (when needed) based on the requirements of the system.

    Re Question 2. You can choose to handle boolean values in a number of different ways.  If you want to use a character-based value, then I suggest you use char(1) since there is no need for variable size and there seems to be no reason to support any unicode character.   There is a slight performance impact on using variable size columns.  Perhaps the more important issue is to be consistent since that will minimize errors, mis-understandings, and potentially maintenance efforts.  If the database already has boolean values defined in a particular manner, then the best/safest approach is to continue using the same usage. 

    Re Question 3.  Indexing is a task best left for later and should be designed to optimize performance.  Given the size of your database, it is unlikely that indexing will have any measurable impact.  Your use of the phrase "ids are nvarchar(50)" concerns me from a design point of view.  One of the first steps to table design is to identify the natural key (or keys) and select one as the primary key.  We may choose to implement an artificial key - but that is a decision that should be based on need and not based on some rote design standard (i.e., every table will have an identity column as PK).   Again you have a variable-sized column which raises the same issues as I mentioned earlier.  You can find a lot of information on selecting a primary key, as well as selecting the columns for a clustered index, by searching.  Do not equate the primary key with the clustered index - the primary key is the clustered index in sqlserver simply because it is the default action if you do not specify otherwise.  Choose the clustered index carefully as you only have one per table. 

    One last comment.  You ask about "improvements".  If you do not measure or specify, then improvement is in the eye of the beholder.  Therefore, what specifically do you want to "improve" with respect to your "ids"?  That is a rhetorical question at the moment - one that you need to think about carefully.  With a working, stable system it may be better (certainly less risky) to minimize the natural tendency to "improve" things too much.  This is true even if the database is poorly designed, inelegant, inconsistent, etc.

    • Marked as answer by Maggie Luo Thursday, September 6, 2012 3:23 PM
    Thursday, August 30, 2012 2:05 PM

All replies

  • Re Question 1.  From a purely relational point of view, the proper approach is to use a separate table.  Keep in mind that database design is part art and part science.  It is difficult to say that there are best practices in design since each system is unique.  The best approach, IMO, is to properly normalize your tables first and then adjust the design (when needed) based on the requirements of the system.

    Re Question 2. You can choose to handle boolean values in a number of different ways.  If you want to use a character-based value, then I suggest you use char(1) since there is no need for variable size and there seems to be no reason to support any unicode character.   There is a slight performance impact on using variable size columns.  Perhaps the more important issue is to be consistent since that will minimize errors, mis-understandings, and potentially maintenance efforts.  If the database already has boolean values defined in a particular manner, then the best/safest approach is to continue using the same usage. 

    Re Question 3.  Indexing is a task best left for later and should be designed to optimize performance.  Given the size of your database, it is unlikely that indexing will have any measurable impact.  Your use of the phrase "ids are nvarchar(50)" concerns me from a design point of view.  One of the first steps to table design is to identify the natural key (or keys) and select one as the primary key.  We may choose to implement an artificial key - but that is a decision that should be based on need and not based on some rote design standard (i.e., every table will have an identity column as PK).   Again you have a variable-sized column which raises the same issues as I mentioned earlier.  You can find a lot of information on selecting a primary key, as well as selecting the columns for a clustered index, by searching.  Do not equate the primary key with the clustered index - the primary key is the clustered index in sqlserver simply because it is the default action if you do not specify otherwise.  Choose the clustered index carefully as you only have one per table. 

    One last comment.  You ask about "improvements".  If you do not measure or specify, then improvement is in the eye of the beholder.  Therefore, what specifically do you want to "improve" with respect to your "ids"?  That is a rhetorical question at the moment - one that you need to think about carefully.  With a working, stable system it may be better (certainly less risky) to minimize the natural tendency to "improve" things too much.  This is true even if the database is poorly designed, inelegant, inconsistent, etc.

    • Marked as answer by Maggie Luo Thursday, September 6, 2012 3:23 PM
    Thursday, August 30, 2012 2:05 PM
  •  2)

    You can use BIT data type for boolean  but make sure that columns does not allow NULLs (if it is acceptable)

    3) 

    Try keep index column as small as possible (smallest data type). What is a purpose to have ID as NVARCHAR(50)????


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Blog: Large scale of database and data cleansing
    MS SQL Consultants: Improves MS SQL Database Performance

    Tuesday, September 4, 2012 5:20 AM
  • Ok, thx Uri
    Thursday, September 13, 2012 8:23 AM
  • I agree with you Scott, tanks.
    Thursday, September 13, 2012 8:24 AM