locked
Table design question RRS feed

  • Question

  • I need to store closely related set of text data.  Right now I have a Table with 30 Varchar(1000) columns.  A given row may have zero or more characters in any of the 30 columns, and in fact it is likely that an average row may only have data in 5-8 of these columns.  My other options are to create a parent table and 30 child tables each with a varchar(1000) column, or one parent table and a single child table with a varchar(1000) column and a column to differentiate the type of data stored in that row.

    My question are: 1. Which is more performant for data access, 2. Which is more efficient from the space standpoint. 3. Is any of the options just dumb?

    Thanks

    Jav

    Sunday, November 14, 2010 3:27 AM

Answers

  • The third option (Parent and a Child with a column indicating a type) sounds the best from the storage point of view and from the data access point of the view.

    I would use the third option.

    Also, in SQL 2005 and up you can use varchar/nvarchar(max) column to store unlimited information and in SQL 2008 you may want to consider using SPARSE columns. In the latest case it will make sense to use just one table, but declare these 30 columns as SPARSE.


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    • Marked as answer by Jav Sunday, November 14, 2010 4:33 AM
    Sunday, November 14, 2010 3:42 AM
    Answerer
  • Hi Jav,

    Your reply confirms what I already thought - your table design is not
    properly normalized.

    I don't know what the primary key of your table is, but let's assume
    that is is PamphletName. (Yes, I did get the part where you said it's
    not like a pamphlet, but you didn't tell me what else it is.

    Your current design is like:

    CREATE TABLE Pamphlets
        (PamphletID int NOT NULL,
         Chapter1 varchar(1000) NULL,
         Chapter2 varchar(1000) NULL,
    --  (...)
         Chapter30 varchar(1000) NULL,
    --   Other pamphlet-related columns
         PRIMARY KEY (PamphletID)
        );

    In a normalized design, you'd use a seperate table for the 30 chapter texts:

    CREATE TABLE Pamphlets
        (PamphletID int NOT NULL,
    --   Other pamphlet-related columns
         PRIMARY KEY (PamphletID)
        );
    CREATE TABLE Chapters
        (PamphletID int NOT NULL,
         ChapterNumber tinyint NOT NULL,
         ChapterText varchar(1000) NOT NULL,
         PRIMARY KEY (PamphletID, ChapterNumber),
         FOREIGN KEY (PamphletID) REFERENCES Pamphlets
                ON DELETE CASCADE ON UPDATE CASCADE,
         CHECK (ChapterNumber) BETWEEN 1 AND 30
        );

    For each pamphlet, there can be from 0 to 30 rows in the Chapters
    table, one for each chapter that has text.


    Hugo Kornelis, SQL Server MVP
    My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
    • Marked as answer by Jav Sunday, November 14, 2010 7:59 PM
    Sunday, November 14, 2010 4:17 PM

All replies

  • The third option (Parent and a Child with a column indicating a type) sounds the best from the storage point of view and from the data access point of the view.

    I would use the third option.

    Also, in SQL 2005 and up you can use varchar/nvarchar(max) column to store unlimited information and in SQL 2008 you may want to consider using SPARSE columns. In the latest case it will make sense to use just one table, but declare these 30 columns as SPARSE.


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    • Marked as answer by Jav Sunday, November 14, 2010 4:33 AM
    Sunday, November 14, 2010 3:42 AM
    Answerer
  • Hi Jav,

    I'll be frank with you - the idea of 30 varchar(1000) columns gives me
    the shivers. You don't say anything about what you are modeling in
    this way, so I can't judge whether you made a design error, or you
    have some extremely unusual requirement where this is indeed the
    correct design. But in most cases, a large number of columns with the
    same data type indicates a normalization error. For instance, the
    table MonthlySalesPerDept (_DeptID_, SalesJan, SalesFeb, SalesMar,
    SalesApr, SalesMay, SalesJun, SalesJul, SalesAug, SalesSep, SalesOct,
    SalesNov, SalesDec) should be normalized to a table with a two-column
    primary key: MonthlySalesPerDept (_DeptID, Month_, Sales)

    But again, without knowing your requirements, I have no way to assess
    whether or not you made that error.

    To answer your questions:

    1: Depends on your typical data access patterns. If you usually need
    to read just one of the columns, the multi-table design is better; if
    you read more than one, the single-column design is better.

    2: The difference, if any, will be small enough to be negligable.

    3: If what I write above is correct, then both options are incorrect,
    and the normalized design will be superior to both.


    Hugo Kornelis, SQL Server MVP
    My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
    Sunday, November 14, 2010 11:16 AM
  • Hugo,

    Thank you so much for your answer. 

    It is not a book or a pamphlet, but is like one.  There are 30 chapters, all empty to begin with.  The user needs all chapters to be available when she begins.  She can write in any or all chapters, up to 1000 characters each.  She needs the ability to move back and forth into any chapter at will. She may open and work with the 'book' for a day or two. When she is completely done, she says so and no further modifications can be made to any of the chapters.

    Just as I was typing this, a thought occurred.  May be I can use a "temporary and reusable" table with 30 columns for this.  But when the user says she is done, I can arrange all of the entered chapters in a logical order and place them in a single varchar(max) column in a permanent table.  The only problem is, how would I prevent it from being one huge paragraph with no separation between the chapters, and no differentiation of chapter headings from chapter contents.  Also, if those of the 30 columns that are null take only a byte or two, would not the end result be the same as far storage is concerned.

    Thanks

    Jav

    Sunday, November 14, 2010 3:23 PM
  • Hi Jav,

    Your reply confirms what I already thought - your table design is not
    properly normalized.

    I don't know what the primary key of your table is, but let's assume
    that is is PamphletName. (Yes, I did get the part where you said it's
    not like a pamphlet, but you didn't tell me what else it is.

    Your current design is like:

    CREATE TABLE Pamphlets
        (PamphletID int NOT NULL,
         Chapter1 varchar(1000) NULL,
         Chapter2 varchar(1000) NULL,
    --  (...)
         Chapter30 varchar(1000) NULL,
    --   Other pamphlet-related columns
         PRIMARY KEY (PamphletID)
        );

    In a normalized design, you'd use a seperate table for the 30 chapter texts:

    CREATE TABLE Pamphlets
        (PamphletID int NOT NULL,
    --   Other pamphlet-related columns
         PRIMARY KEY (PamphletID)
        );
    CREATE TABLE Chapters
        (PamphletID int NOT NULL,
         ChapterNumber tinyint NOT NULL,
         ChapterText varchar(1000) NOT NULL,
         PRIMARY KEY (PamphletID, ChapterNumber),
         FOREIGN KEY (PamphletID) REFERENCES Pamphlets
                ON DELETE CASCADE ON UPDATE CASCADE,
         CHECK (ChapterNumber) BETWEEN 1 AND 30
        );

    For each pamphlet, there can be from 0 to 30 rows in the Chapters
    table, one for each chapter that has text.


    Hugo Kornelis, SQL Server MVP
    My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
    • Marked as answer by Jav Sunday, November 14, 2010 7:59 PM
    Sunday, November 14, 2010 4:17 PM
  • Hugo,

    Thank you.  You are correct of course.  I needed someone to tell me that - the idea of a table with 30 varchar(1000) columns was giving me the shivers too.

    Jav

    Sunday, November 14, 2010 7:58 PM