What would be a better design? Several questions need your thoughts and suggestions. Thanks!

  • Question

  • I just inherited 2 existing tables. They are for saving task info, which has many properties. To make it flexible for future changes, it was designed like this:

    • table task: 35 columns, with ID as the PK; most columns are nvarchar, and 9 of them are nvarchar(1024). The nvarchar(1024) columns are not used now; they are just reserved for possible future use;
    • table taskMoreInfo: 6 columns: ID as the PK, TID as an FK to task.ID, then property name, property value, insert date, and modify date;

    Use:

    • when inserting a task record, both tables are written: 1 row into task and 3 rows into taskMoreInfo;
    • when querying, mostly only the task table is used; that may be why the designer laid the tables out this way.

    My questions are:

    1. After reading SQL Server Internals, I really doubt this is a good design, and I am concerned about what happens when the table gets big. So what would be a better design?
    2. With this design, what performance issues will there be once it has millions of rows? In your experience, at roughly how many rows do the problems start?
    3. For the 9 nvarchar(1024) columns, I feel they are a really big overhead. If there is data in them, the row spills onto row-overflow pages, so a single row takes more than one page, right? Will that become a bottleneck?
    4. The tasks are English-only, so I wonder whether varchar would be enough. If I use varchar instead of 2-byte Unicode nvarchar, it will save half the space, and performance will be much better because of reduced IO, right?

    Looking forward to your thoughts,

    Thank you very much!


    Mark as Answer if it helps. This posting is provided "AS IS" with no warranties, confers no rights.

    • Moved by Dan Guzman, MVP, Editor, Thursday, May 3, 2012 12:36 PM: moved to the appropriate forum (From: SQL Server Database Engine)
    Thursday, May 3, 2012 12:21 PM

Answers

  • Hi Dannol,

    There is a single answer to all 4 of your questions: "It depends". We need to know more about the system before we can suggest design changes. I will try to give you some ideas to think about, though.

    Row size and number of rows matter, of course, although they matter mainly when you scan the data. When you select just 1 (or a few) rows by ID or via a non-clustered index, the effect on the system is smaller. Yes, accessing data stored on row-overflow pages introduces extra IO, but again, if you are only touching a few rows, it will not be noticeable. If you need to scan the table (or a large number of rows), however, that obviously affects performance: both row size and row count increase the number of data pages, and as a result SQL Server must perform more IO to do the scan. So in the end a lot depends on how well optimized your queries are (there are a few other factors to consider, but let's skip them for simplicity's sake).
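    If it helps, one way to check how much of the table already lives on row-overflow pages (assuming the table is named dbo.task, as in the question) is the physical-stats DMV:

    ```sql
    -- Page counts per allocation-unit type for dbo.task (name assumed).
    -- ROW_OVERFLOW_DATA pages are the extra pages read when a row's
    -- nvarchar values no longer fit on its in-row (IN_ROW_DATA) page.
    SELECT ips.index_id,
           ips.alloc_unit_type_desc,   -- IN_ROW_DATA / ROW_OVERFLOW_DATA / LOB_DATA
           ips.page_count,
           ips.record_count
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID(N'dbo.task'), NULL, NULL, 'DETAILED') AS ips;
    ```

    Comparing page_count across the allocation-unit types gives a rough feel for how much extra IO a scan would pay for the overflow data.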

    That design is a kind of Entity-Attribute-Value (EAV) pattern. I blogged about it some time ago (I still owe the post with performance tests, though). As for the 9 nvarchar columns you mentioned: I suspect the original intention was to keep the most common attributes in the main table to avoid extra queries against taskMoreInfo. While NULL nvarchar columns do not introduce a lot of overhead (beyond their entries in the variable-width column offset array), they are not used either. I would consider moving them to taskMoreInfo unless application logic depends on them.
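    To illustrate that move (the column and table names here are assumptions, since the actual schema wasn't posted), shifting one unused wide column into the EAV table might look like:

    ```sql
    -- Hypothetical names; adjust to the real schema.
    -- 1. Preserve any values already stored in the spare column.
    INSERT INTO dbo.taskMoreInfo (TID, PropertyName, PropertyValue, InsertDate, ModifyDate)
    SELECT t.ID, N'SpareCol1', t.SpareCol1, GETUTCDATE(), GETUTCDATE()
    FROM dbo.task AS t
    WHERE t.SpareCol1 IS NOT NULL;

    -- 2. Drop the now-redundant nvarchar(1024) column from the main table.
    ALTER TABLE dbo.task DROP COLUMN SpareCol1;
    ```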

    Nvarchar vs. varchar is another "it depends" question. Of course, you save on row size. On the other hand, if there is even a remote possibility of storing Unicode data in the future, you would end up with a lot of refactoring work. Also, what data type are you using for the property value? Perhaps sql_variant would be better than nvarchar (it has a limitation of 8,000 bytes as the maximum size).
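    A tiny sketch of why sql_variant can be attractive for a generic property-value column: unlike nvarchar, it remembers the underlying type of each stored value.

    ```sql
    -- sql_variant preserves the base type of whatever is stored in it.
    DECLARE @v sql_variant = CAST(42 AS int);

    SELECT SQL_VARIANT_PROPERTY(@v, 'BaseType')  AS BaseType,   -- 'int'
           SQL_VARIANT_PROPERTY(@v, 'MaxLength') AS MaxLength;  -- 4
    ```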


    Thank you!

    My blog: http://aboutsqlserver.com

    Thursday, May 3, 2012 3:57 PM
  • also, the PK of taskMoreInfo is taskID + taskProperty, where taskProperty is nvarchar(256); is that good practice for a PK (clustered index)?

    It depends. Usually you want to make the clustered index key as narrow as possible in order to reduce the number of levels in the index B-tree and minimize the size of non-clustered index keys. You need to think about the average length of the taskProperty values (the actual size is 2 * Len + 2 bytes). Of course, if you don't have non-clustered indexes on that table, it is less of an issue.
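    One quick way to get those numbers from the existing data (assuming the column is named taskProperty): DATALENGTH returns the stored size in bytes, i.e. 2 * Len for nvarchar, before the 2-byte variable-length overhead.

    ```sql
    -- Average and worst-case key width contributed by taskProperty.
    -- Add 2 bytes per value for the variable-length column overhead.
    SELECT AVG(DATALENGTH(taskProperty)) AS avg_bytes,
           MAX(DATALENGTH(taskProperty)) AS max_bytes
    FROM dbo.taskMoreInfo;
    ```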

    On the other hand, if I expected that table to reach millions or billions of rows, I'd probably create an additional table, TaskPropertyMetadata(tpmID smallint, Name, ...), and replace taskProperty in taskMoreInfo with tpmID. It introduces more complexity but saves space (think about IO, backups, etc.). Data compression could be another option (it also has pros and cons).
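    A sketch of that lookup approach (all names are assumptions based on the description above):

    ```sql
    -- Narrow lookup table: each distinct property name is stored once.
    CREATE TABLE dbo.TaskPropertyMetadata
    (
        tpmID smallint IDENTITY(1,1) NOT NULL
            CONSTRAINT PK_TaskPropertyMetadata PRIMARY KEY,
        Name  nvarchar(256) NOT NULL
            CONSTRAINT UQ_TaskPropertyMetadata_Name UNIQUE
    );

    -- taskMoreInfo then carries a 2-byte tpmID instead of the
    -- nvarchar(256) property name, so the clustered key becomes
    -- (taskID, tpmID) instead of (taskID, taskProperty).
    ```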


    Thank you!

    My blog: http://aboutsqlserver.com

    Monday, May 7, 2012 3:10 PM

All replies

  • It would be better if you could re-post this in the appropriate forum.


    Regards,
    Ahmed Ibrahim
    SQL Server Setup Team
    This posting is provided "AS IS" with no warranties, and confers no rights. Please remember to click "Mark as Answer" and "Vote as Helpful" on posts that help you.
    This can be beneficial to other community members reading the thread.

    Thursday, May 3, 2012 12:31 PM
  • I'll move this thread to the Database Design forum, which is more appropriate for your question.


    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/

    Thursday, May 3, 2012 12:35 PM
    Answerer
  • Thanks Dmitri. The task table is for managing incoming tasks. Other tools call web service methods to access this DB and update the task table; if a task has not been updated for, say, 1 day, a mail is sent to the operator.

    Also, the PK of taskMoreInfo is taskID + taskProperty, where taskProperty is nvarchar(256). Is that good practice for a PK (clustered index)?


    Mark as Answer if it helps. This posting is provided "AS IS" with no warranties, confers no rights.

    Monday, May 7, 2012 11:32 AM
  • Hi Dannol,

    My take on this: the fact that you've got a bunch of unused fields of the same size suggests there should actually be a separate table somewhere with a SINGLE field of that size and a foreign key back to the original table. There are still a lot of developers who champion these kinds of "future flexible" designs because they haven't got the memo about modern DBMSes and how efficient joins are compared with scanning lots of text in a wide table.

    For the record, if I were going to set these fields up to allow overflow from one record into the next, I'd just hive them off into a single varchar(max) field. An even better alternative would be to declare the field type as XML, so people can at least mark up the contents of the field in a useful manner, apply an XSD to it, and run XPath queries against the contents.
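    For example (the table, column, and element names here are made up for illustration), that XML alternative could look like:

    ```sql
    -- Hypothetical side table: one XML column instead of nine
    -- spare nvarchar(1024) columns on the main task table.
    CREATE TABLE dbo.taskExtra
    (
        TID   int NOT NULL,   -- FK to task.ID (type assumed)
        Extra xml NULL
    );

    -- XQuery against the contents at query time, e.g. pulling
    -- a <priority> element out of the stored document:
    SELECT TID,
           Extra.value('(/props/priority)[1]', 'int') AS Priority
    FROM dbo.taskExtra;
    ```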

    In short: normalization is good. Denormalization for the sake of "future flexibility", when doing so pushes you past the maximum row width of your table, is bad logical design and even worse physical design. Most data architects and data modellers I know would tear strips off anyone who came to them with that kind of design for a new product.


    Jeremy Huppatz
    Managing Consultant
    Solitaire Systems

    If you found this post helpful, please "Vote as Helpful". If it actually answered your question, please remember to "Mark as Answer". This will help other people find answers to their problems more quickly.

    Monday, May 7, 2012 3:33 PM