locked
Need help for the physical design of a big table with rows populated in 3 stages RRS feed

  • Question

  • Greetings,

    I am designing a special SQL Server table with 1M rows added daily. The table is populated in 3 phases. That means the entity lifecycle has 4 states:

    • State 1: Inserts a new row and populates columns 1-15. The rest of columns stay NULL.

    • State 2: Updates (populates) columns 16-25

    • State 3: Updates (populates) columns 26-40

    • State 4: The row can be processed for reporting.

    Above requirements imposes the following inefficiencies:

    • All of the columns 16-40 must be nullable since they are populated after each row is created.
    • Since there are at least 2 sets of NULL columns are pupated (update) after each row is created, there will be many fragmentations – I assume right?

    I was thinking to break this entity to 3 tables, however, if I do that then I need to keep join the 3 tables during the reporting. Is there any technique or pattern that help me to make this table more efficient?

    Any help would be appreciated.

    Tuesday, July 5, 2016 6:16 PM

Answers

  • If you are doing this as 3 independent processes, I would recommend 3 separate tables and a view to join them together.

    Tuesday, July 5, 2016 6:36 PM
    Answerer
  • Yes, you will get page splits and/or fragmentation, if many of the fields are like varchars that only take up space when non-null.  If they are all fixed-size like int or char, then you may not get splits or fragmentation.

    There's also more logging when all the fields are in rows like that. 

    Are any of the fields large, varchar(max) or binary?  That is also a factor.

    Joining isn't too bad if the tables all are 1:1.

    I think the various answers are pretty close, chose whichever you're most comfortable with.

    Josh

    Tuesday, July 5, 2016 6:42 PM
  • >Are you referring to vertical or horizontal partitioning?

    Partitioned Tables And Indexes might separate the newly loaded data from the older data.

    >Do you recommend Clustered Column store Indexes for non-warehouse and OLTP workloads?

    In SQL 2016 Clustered Columnstore indexes can have primary key and secondary indexes, making them more broadly useful.

    >By rebuilding a table, do you mean copy the table data into a new table, delete the old one and rename the new table?

    No.  Just rebuilding the clustered index (or perhaps one partition of it).  This can remove any fragmentation introduced in the data load, if you have the time to do it.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Wednesday, July 6, 2016 4:33 AM

All replies

  • If you are doing this as 3 independent processes, I would recommend 3 separate tables and a view to join them together.

    Tuesday, July 5, 2016 6:36 PM
    Answerer
  • Yes, you will get page splits and/or fragmentation, if many of the fields are like varchars that only take up space when non-null.  If they are all fixed-size like int or char, then you may not get splits or fragmentation.

    There's also more logging when all the fields are in rows like that. 

    Are any of the fields large, varchar(max) or binary?  That is also a factor.

    Joining isn't too bad if the tables all are 1:1.

    I think the various answers are pretty close, chose whichever you're most comfortable with.

    Josh

    Tuesday, July 5, 2016 6:42 PM
  • Hi Tom,

    The table splits could cost during read. With splinting this table into three table, there will be at least 3 page reads.

    Also most of the columns in the table are fixed size. 

    Do you still think splinting the table would make this entity more efficient?

    Thank  you for help.

    Tuesday, July 5, 2016 8:38 PM
  • In general, try rather hard not to twist the logical database design to optimize performance. 

    What version and edition of SQL Server are you using?

    Can you use Partitioning?  Clustered Columnstore Indexes?

    How much time do you have to load data?  Can you rebuild the table/partition after load?

    David


    David http://blogs.msdn.com/b/dbrowne/


    Tuesday, July 5, 2016 8:58 PM
  • Hi David,

    Thank you for help.

    I am using SQL 2014. There will be ~500,000 rows added to this table daily, with the row life-cycle i explained in my post. 

    Are you referring to vertical or horizontal partitioning?

    Do you recommend Clustered Column store Indexes for non-warehouse and OLTP workloads?

    By rebuilding a table, do you mean copy the table data into a new table, delete the old one and rename the new table?

    Thank you,

     

    Tuesday, July 5, 2016 10:57 PM
  • >Are you referring to vertical or horizontal partitioning?

    Partitioned Tables And Indexes might separate the newly loaded data from the older data.

    >Do you recommend Clustered Column store Indexes for non-warehouse and OLTP workloads?

    In SQL 2016 Clustered Columnstore indexes can have primary key and secondary indexes, making them more broadly useful.

    >By rebuilding a table, do you mean copy the table data into a new table, delete the old one and rename the new table?

    No.  Just rebuilding the clustered index (or perhaps one partition of it).  This can remove any fragmentation introduced in the data load, if you have the time to do it.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Wednesday, July 6, 2016 4:33 AM