Incrementally compress a table online?

  • Question

  • Does anybody know of a strategy to compress a table gradually online with minimal available space?

    I am stuck with very little log, data, and tempdb space available, plus requirements to keep the table available and highly available (full recovery mode with AlwaysOn in synchronous commit).

    What would be a sound strategy to avoid exploding the logs or taking the table offline?

    Right now, I am thinking

    1. Rebuilding the table into multiple partitions online with SORT_IN_TEMPDB.
    2. Then rebuilding the table partitions individually with compression, again online with SORT_IN_TEMPDB (sketched below).
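
    Concretely, I am picturing something like this (all object names are placeholders, and step 2 assumes SQL Server 2014 or later so a single-partition rebuild can run online):

    ```sql
    -- Step 1: move the table onto a partition scheme by rebuilding the
    -- clustered index online (the big, non-splittable operation).
    -- dbo.BigTable, cix_BigTable, ps_BigTable and InsertDate are made up.
    CREATE CLUSTERED INDEX cix_BigTable
        ON dbo.BigTable (InsertDate)
        WITH (DROP_EXISTING = ON, ONLINE = ON, SORT_IN_TEMPDB = ON)
        ON ps_BigTable (InsertDate);

    -- Step 2: compress one partition at a time, online.
    ALTER INDEX cix_BigTable ON dbo.BigTable
        REBUILD PARTITION = 1
        WITH (DATA_COMPRESSION = PAGE, ONLINE = ON, SORT_IN_TEMPDB = ON);
    ```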

    But I think the initial rebuild(s) necessary to split the data across multiple filegroups will kill my hopes of generating little log and tempdb usage.

    Is there a way to rebuild the table online gradually while partitioning it, or will I end up with one big operation that cannot be split, no matter what I do, until the table is finally partitioned?

    Alternative solutions are welcome.

    Thursday, December 10, 2015 2:55 AM

All replies

  • What do you mean by 'rebuild the table online gradually'?

    Is space the only reason for compression?

    


    Thursday, December 10, 2015 7:18 AM
    Moderator
  • One solution is to get enough disk space to perform the operation in a simple manner, or at least enough to make the problem less urgent.

    Another is to pay for man hours to avoid buying more hardware.

    Now, this equation is likely to depend on which country you are in. In countries with low labour costs, the operation I describe below could possibly be warranted. Where I am, it would be utter madness, not least when taking the risk into consideration.

    From what you say, it seems the table today is uncompressed and unpartitioned.

    To avoid log explosion, you need to trickle rows from the existing table to a new table that has the performance and compression characteristics you want. That is, you copy rows in reasonably sized batches, maybe 1000 rows at a time, to the new table. Once all rows have been copied, you run a transaction where you drop the old table and rename the new one.

    This solution has a number of challenges:

    1) Keep track of which rows you have copied.
    2) How to deal with updates that occur while the operation is running.
    3) Do you have disk space to have both tables?

    The first is relatively simple, as you could keep a high-water mark in a small table. (Everything you do needs to be persisted, so that the operation can be interrupted and continue after a server restart.)
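
    For instance, something along these lines; dbo.BigTable, dbo.BigTable_New, dbo.CopyProgress and the key column Id are all made-up names, and the new table is assumed to already exist with the compression you want:

    ```sql
    -- Persisted high-water mark so the copy survives interruptions.
    CREATE TABLE dbo.CopyProgress (LastId bigint NOT NULL);
    INSERT INTO dbo.CopyProgress (LastId) VALUES (0);

    DECLARE @LastId bigint, @RowCnt int = 1;

    WHILE @RowCnt > 0
    BEGIN
        SELECT @LastId = LastId FROM dbo.CopyProgress;

        BEGIN TRANSACTION;

        -- Copy the next batch of ~1000 rows in key order.
        INSERT INTO dbo.BigTable_New (Id, Col1, Col2)
        SELECT TOP (1000) Id, Col1, Col2
        FROM   dbo.BigTable
        WHERE  Id > @LastId
        ORDER BY Id;

        SET @RowCnt = @@ROWCOUNT;

        -- Advance the high-water mark in the same transaction.
        UPDATE dbo.CopyProgress
        SET    LastId = ISNULL((SELECT MAX(Id) FROM dbo.BigTable_New), LastId);

        COMMIT TRANSACTION;

        -- Brief pause so log backups can keep the log from growing.
        WAITFOR DELAY '00:00:01';
    END;
    ```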

    The second could be handled by Change Tracking.
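
    For instance (database and table names are made up, and Change Tracking requires the table to have a primary key):

    ```sql
    ALTER DATABASE MyDb
        SET CHANGE_TRACKING = ON
        (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

    ALTER TABLE dbo.BigTable ENABLE CHANGE_TRACKING;

    -- Later, replay whatever changed since the last sync point:
    DECLARE @last_sync bigint = 0;   -- persist this value between runs

    SELECT ct.Id, ct.SYS_CHANGE_OPERATION
    FROM   CHANGETABLE(CHANGES dbo.BigTable, @last_sync) AS ct;

    SELECT CHANGE_TRACKING_CURRENT_VERSION();  -- the new sync point to save
    ```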

    The last is the most difficult one. One solution is to rename the existing table as well, and define a view which is a UNION ALL of both tables. Then, as you copy rows, you delete them from the old table. You implement INSTEAD OF triggers to support INSERT, DELETE and UPDATE operations. (And this would make the first two points moot.)
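
    A minimal sketch of that arrangement; the names are made up, and a real solution needs INSTEAD OF UPDATE and DELETE triggers as well:

    ```sql
    -- The view takes over the original table name, so applications
    -- keep working while rows migrate from _Old to _New.
    CREATE VIEW dbo.BigTable
    AS
    SELECT Id, Col1, Col2 FROM dbo.BigTable_Old
    UNION ALL
    SELECT Id, Col1, Col2 FROM dbo.BigTable_New;
    GO

    CREATE TRIGGER dbo.BigTable_ioi ON dbo.BigTable
    INSTEAD OF INSERT
    AS
    BEGIN
        SET NOCOUNT ON;
        -- New rows go straight into the new, compressed table.
        INSERT INTO dbo.BigTable_New (Id, Col1, Col2)
        SELECT Id, Col1, Col2 FROM inserted;
    END;
    ```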

    I think you realise that this is not a walk in the park, but something which requires careful testing. And that testing is non-trivial to achieve in a lab without a real workload. It will certainly keep you busy for a fortnight.

    • Marked as answer by Antoine F Saturday, December 12, 2015 12:37 AM
    Thursday, December 10, 2015 2:07 PM
  • It is quite a challenge and will keep me busy, I have no doubt.

    The view would let me present the two tables as one while splitting the data. It would, however, require removing constraints, since a foreign key cannot reference two different tables at once; unless, of course, I persist the view, which would only increase my pain by exploding the data space requirement and possibly hurting write performance.

    With regard to performance, the table is subject to bulk inserts. That is an additional reason I am considering partitioning: I would keep one partition uncompressed to avoid degrading insert performance.

    Space is my number one issue. I'd like to cut the growth in space requirements by half. I will of course be held responsible if I choke the application, so I would not go as far as saying it is the only issue...

    What I believe will be my obstacle is redefining my partition scheme. Either I alter my partition function and scheme and end up with major locking, or I occasionally rebuild my table online, which requires major space and log. I also doubt I can rebuild only one partition while applying a new partition scheme...

    I am still at the planning phase, elaborating a strategy.  Any good reads would also be appreciated.


    • Edited by Antoine F Friday, December 11, 2015 1:35 AM
    Friday, December 11, 2015 1:33 AM
  • Are there foreign keys referencing the table? I guess you then want to perform the same operations on the child tables as well?

    One option is to drop the FK during the transition phase and implement RI through triggers. Or simply pray and hope that data stays good during the transition and clean up any mess once you are done.

    Most of the time when you do partitioning, you want the existing partitions to remain the same; you only add new empty partitions and drop old ones by means of switching (sketched below). Redefining the partitioning in a way that requires data movement would make this trickling affair a recurring event. Not recommended.
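
    For illustration, the usual sliding-window moves look something like this (all names and the boundary value are made up; the staging table must match the partition's structure and sit on the same filegroup):

    ```sql
    -- Retire the oldest partition by switching it out (metadata-only).
    ALTER TABLE dbo.BigTable
        SWITCH PARTITION 1 TO dbo.BigTable_Archive;

    -- The emptied boundary can then be merged away.
    ALTER PARTITION FUNCTION pf_BigTable()
        MERGE RANGE ('2015-01-01');
    ```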

    • Marked as answer by Antoine F Saturday, December 12, 2015 12:37 AM
    Friday, December 11, 2015 11:04 PM
  • Awesome, the trigger is a great idea. No need to persist the view; just handle the exceptions in the trigger on the view.

    My issue just got simplified massively. I will have the possibility of moving part of the data to a different SAN where I won't be limited by space as much. I will just partition my tables ahead of time, bulk insert into the newer partitions, and compress the older ones.

    This will also allow me to implement TechNet's best practice of keeping an empty partition ahead of time, to avoid data movement within partitions and the heavy locking it causes.

    All I will have to do is schedule creation of additional partitions ahead of time and compression of older partitions on a regular basis.
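
    A sketch of what that recurring job might run; the function, scheme, filegroup and boundary values are placeholders, and the online single-partition rebuild needs SQL Server 2014 or later:

    ```sql
    -- 1. Add a new empty partition ahead of the data (no data movement,
    --    because the partition being split holds no rows yet).
    ALTER PARTITION SCHEME ps_BigTable NEXT USED [FG_2016_Q2];
    ALTER PARTITION FUNCTION pf_BigTable() SPLIT RANGE ('2016-04-01');

    -- 2. Compress an older, now read-mostly partition online.
    ALTER INDEX cix_BigTable ON dbo.BigTable
        REBUILD PARTITION = 3
        WITH (DATA_COMPRESSION = PAGE, ONLINE = ON, SORT_IN_TEMPDB = ON);
    ```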

    Thanks for the help.

    p.s. I love your blog. Your article on arrays and lists helped me progress tremendously.



    • Edited by Antoine F Saturday, December 12, 2015 12:39 AM
    Saturday, December 12, 2015 12:37 AM