I need to change an INT to a BIGINT datatype in a large table with little downtime.

    Question

  • I have a column that is an INT data type that needs to be changed to a BIGINT data type.  The column is an identity column that is reaching its upper limit and is part of the PK.  The table has 1.5 billion records, and I have very little downtime available.  I was thinking of replicating the table to reduce the downtime needed to change the data type.  I have a two-hour maintenance window each month to make changes.

    I need help or suggestions on how to accomplish this in a two-hour window.

    • Moved by Tom Phillips (Moderator) Monday, November 23, 2009 5:43 PM Possibly better answer from TSQL forum (From: SQL Server Database Engine)
    Monday, November 23, 2009 2:00 PM


All replies

  • http://sql-server-performance.com/Community/forums/t/25772.aspx

    HTH
    Vinay

    Vinay Thakur http://vinay-thakur.spaces.live.com/ http://twitter.com/ThakurVinay
    Monday, November 23, 2009 4:55 PM
  • While providing this link does provide good information, this post doesn't really qualify as an attempt to help the user. It would have been more useful to provide some explanation, and then perhaps suggest the link.

    Since most questions have been asked and answered somewhere, this would become an impersonal and lifeless forum if we all followed this model.
    You may be only one person in the world, but you may also be the world to one person.
    Saturday, November 28, 2009 3:58 AM
  • INT can hold roughly 4.3 billion distinct values.
    Why can't you just issue a

    DBCC CHECKIDENT('MySchema.MyTableName', RESEED, -2147483648)
    

    To have the table "start over" at negative 2.1 billion. That will give you 2.1 billion more records, with NO downtime!
    And when the value is increasing and getting near zero, issue a

    DBCC CHECKIDENT('MySchema.MyTableName', RESEED,  1550000000)
    

    to continue at your old value (and have 600 million more records). How long has it taken to fill up your present 1.5 billion records?

    Do remember that if the identity column is also the clustered key, you will get performance issues! But if you have a clustered index on other column(s), you will not notice that the identity value starts over from negative 2.1 billion.
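    A minimal pre-flight sketch for this idea, reusing the hypothetical table name from the statements above (IDColumn stands in for the identity column, which is not named in this thread):

    -- Check the current seed and confirm nothing already occupies the negative range
    SELECT IDENT_CURRENT('MySchema.MyTableName') AS current_identity_value;

    SELECT COUNT(*) AS negative_ids_in_use
    FROM MySchema.MyTableName
    WHERE IDColumn < 0;

    -- Only then restart the identity at the bottom of the INT range
    DBCC CHECKIDENT('MySchema.MyTableName', RESEED, -2147483648)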

    //Peter

    Saturday, November 28, 2009 8:15 PM
  • To have the table "start over" at negative 2.1 billion. That will give you 2.1 billion more records, with NO downtime!



    I would not recommend that. The INT identity(1,1) surrogate primary key is commonly expected to stay in the positive range. You may get unusual issues extending into the negative range, even business issues: some people may think that id -1000 is the same as id 1000. And when you run out of the negative range, you have to go to BIGINT anyhow.

    I vote for BIGINT.

    You can minimize down time by testing out the fastest conversion in advance.



    Kalman Toth SQL SERVER 2012 & BI TRAINING
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012


    Sunday, November 29, 2009 1:02 AM
  • I am sorry, but I don't see how my response was inaccurate or disrespectful, to the extent that you thought it was a "hijack"?

    I told the OP of another method that he might not have considered. Maybe there is no need to change to BIGINT at all? If all other approaches have been explored, that's fine; change the column to BIGINT.
    Changing to a BIGINT column, in my opinion, has more impact than the DBCC approach. Changing the column to BIGINT requires the architects and developers to fully test the application(s) again, to see that all the changes are good. And that is AFTER the change to BIGINT has been done. Or maybe he has a test environment where he has restored a backup, changed the column type there, and also made the necessary modifications to the application(s)?
    We don't know, because the OP hasn't told us.

    The previous answer is split here for continuation
    http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/3b7215ae-e65b-4e7d-a262-75cc3fe5ad1a
    • Edited by SwePeso (MVP) Sunday, November 29, 2009 6:53 AM
    Sunday, November 29, 2009 6:51 AM
  • Peso,

    Did you ever try it? Starting out at: -2147483648?

    Does anyone have any experience with
    int identity( -2147483648,1) Primary Key
    ?



    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Sunday, November 29, 2009 8:08 AM
  • Yes, I have. I wouldn't be responsible if I gave advice I have not tested myself, would I?
    Sunday, November 29, 2009 8:16 AM
  • I need help or suggestions on how to accomplish this in a two-hour window.


    Build a new table like:

    SELECT ID = CONVERT(BIGINT, ID), .... INTO newTableAlpha FROM TableAlpha
    (You have to start this step before the two-hour maintenance window, e.g. 10 hours before, depending on your test run results.)

    Apply indexes/constraints

    ( two hour window starts)

    Rename TableAlpha to zzzTableAlpha

    Rename newTableAlpha to TableAlpha

    Compare TableAlpha to zzzTableAlpha

    Apply updates/inserts to TableAlpha based on the delta
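    A sketch of the two renames at the start of the window, assuming everything lives in the dbo schema (sp_rename takes the current name and the new name):

    EXEC sp_rename 'dbo.TableAlpha', 'zzzTableAlpha';
    EXEC sp_rename 'dbo.newTableAlpha', 'TableAlpha';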


    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Sunday, November 29, 2009 8:22 AM
  • Yes, I have. I wouldn't be responsible if I gave advice I have not tested myself, would I?

    That is still different though from the current case whereby the positive int range is already allocated.

    In your experience, any pros & cons?



    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Sunday, November 29, 2009 8:37 AM
  • Only good experiences, "pros".
    Sunday, November 29, 2009 9:02 AM
  • Build a new table like:

    SELECT ID = CONVERT(BIGINT, ID), .... INTO newTableAlpha FROM TableAlpha
    (You have to start this step before the two-hour maintenance window, e.g. 10 hours before, depending on your test run results.)


    I am interested to find out which table locks are applied with the suggestion above. Does a "SELECT * INTO" take a shared lock, or a table lock?
    And if the suggestion takes a table lock, how is the business affected?
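    One way to see for yourself during a test run is to query the sys.dm_tran_locks DMV from a second session while the copy is executing (a sketch; the session id is a placeholder):

    SELECT resource_type, request_mode, request_status, COUNT(*) AS lock_count
    FROM sys.dm_tran_locks
    WHERE request_session_id = 53   -- SPID of the session running the SELECT ... INTO
    GROUP BY resource_type, request_mode, request_status;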

    Sunday, November 29, 2009 9:04 AM
  • Peso,

    I am just recommending an approach. You can build the staging table (newTableAlpha) from a different database (last night's backup restored, for example).

    The point is, once it is built, you just have to compare it to the live table to apply the delta inserts/deletes/updates. Hopefully the 2-hour maintenance window is sufficient for that, but that is only a hope.

    If the HW platform is very fast, it may even be possible to do the entire conversion within the 2 hour window.

    TheBrick,

    Can you take measurements on a copy of the database? How long does the SELECT INTO take, for example?  Can you share the results with us?

    Thanks.
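    A minimal way to time that test run on the restored copy (a sketch; the column list is elided just as in the earlier example):

    DECLARE @start DATETIME;
    SET @start = GETDATE();

    SELECT ID = CONVERT(BIGINT, ID) /* , remaining columns */
    INTO newTableAlpha
    FROM TableAlpha;

    SELECT DATEDIFF(MINUTE, @start, GETDATE()) AS elapsed_minutes;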


    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Sunday, November 29, 2009 12:01 PM
  • The SELECT INTO took over 10 hours to complete for the one table.  I don't have a good way to capture the delta, because the table was not designed with appropriate change-tracking columns, such as a datetime.  The other issue is that data can be permanently deleted from this table.

    I am also concerned about the overhead and locking on this table during the copy.

    Do you guys have a good way to track delta?

    Monday, November 30, 2009 3:02 PM
  • The destination table for the copy had no PK, nonclustered indexes, or FKs.  I did it this way for speed.  I don't believe changing the recovery model will make a difference, because SELECT ... INTO is treated as one transaction, correct?
    Monday, November 30, 2009 3:06 PM
  • The destination table for the copy had no PK, nonclustered indexes, or FKs.  I did it this way for speed.  I don't believe changing the recovery model will make a difference, because SELECT ... INTO is treated as one transaction, correct?

    Yes, and the source table(s) are locked with a table lock too, as you noticed.
    The delta would need change tracking, which requires SQL Server 2008. If you don't have 2008 with change tracking enabled, tracking the delta would also take 10 hours, because of the updates, deletes and inserts. I think this approach is a dead end for you if you want this to happen within the 2-hour maintenance window.
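    If SQL Server 2008 change tracking is an option, a minimal sketch of enabling it looks like this (the database and schema names and the IDCOL key column are placeholders; the table needs a primary key, which it has here):

    ALTER DATABASE MyDatabase
        SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 3 DAYS, AUTO_CLEANUP = ON);

    ALTER TABLE dbo.SourceTable ENABLE CHANGE_TRACKING;

    -- Remember the version when the bulk copy starts...
    DECLARE @copy_start_version BIGINT;
    SET @copy_start_version = CHANGE_TRACKING_CURRENT_VERSION();

    -- ...and later read only the keys that changed since then (I = insert, U = update, D = delete)
    SELECT ct.IDCOL, ct.SYS_CHANGE_OPERATION
    FROM CHANGETABLE(CHANGES dbo.SourceTable, @copy_start_version) AS ct;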

    Have you considered the other approach? It takes about 0.01 seconds to complete.

    Which version and edition of SQL Server are you using?
    Monday, November 30, 2009 3:23 PM
  • There is another way and that is to implement a trigger to track the delta.

    I have used this only once in the past. 

    1) "SELECT TOP 0 * INTO DupTable FROM SourceTable"
    2) Edit the DupTable and a) Remove identity property and b) change column to bigint.
    3) Create a delete trigger to remove the matching records in DupTable (see the sketch after this list)
    4) Create an insert trigger to insert new records into new DupTable
    5) Create an update trigger to update matching records in DupTable
    6) Issue 

    DECLARE @ID INT
    
    SELECT @ID = 0
    
    WHILE @ID <= 1550000000
        BEGIN
            -- Copy the next 1-million-row slice of the key range
            INSERT DupTable
            SELECT *
            FROM SourceTable
            WHERE IDCOL >= @ID
                AND IDCOL < @ID + 1000000
    
            SET @ID = @ID + 1000000
        END
    

    to "transfer" 1 million records at a time from the old SourceTable to the new DupTable with minimal logging. This must be done "off-hours" so that the triggers doesn't hit a records which isn't transfered yet.

    7) Change all code to handle BIGINT
    8) Here is the tricky part:
    9) Drop all indexes and all constraints against the SourceTable.
    10) Rename SourceTable to OldTable
    11) Rename DupTable to SourceTable
    12) Apply all constraints and indexes to the table.
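    A sketch of the delete trigger from step 3 above (the trigger name is a placeholder; IDCOL is the key column used in the batch; the insert and update triggers follow the same pattern using the inserted pseudo-table):

    CREATE TRIGGER trg_SourceTable_Delete
    ON SourceTable
    AFTER DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Remove the matching rows from the duplicate table
        DELETE d
        FROM DupTable AS d
        INNER JOIN deleted AS del
            ON d.IDCOL = del.IDCOL;
    END;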

    But still, I don't believe it is possible to make this happen (the tricky part) within the 2-hour maintenance window.
    I have dealt with VLDBs for many years now, and that's why I suggested the alternative approach: I know these things take time, and it is near impossible to accomplish this in the 2-hour maintenance window. I believe any experienced DBA will agree.

    One last resort might be this:
    BCP all records out (with a batch size of 1 million records), truncate the source table, drop the constraints and indexes over the ID column, and change the column to bigint. Then BCP all the records back in with identity insert enabled. Last, recreate the indexes and constraints over the ID column.

    Try to BCP all records out and report back here the time it took to export them. If it takes less than 30 minutes, you might have a chance with this approach, because BCP in will take some more time. And recreating indexes and constraints will also take some time.

    Monday, November 30, 2009 3:52 PM
  •  I think this approach is a dead end for you if you want this to happen within the 2-hour maintenance window.

    Have you considered the other approach? It takes about 0.01 seconds to complete.

    Which version and edition of SQL Server are you using?

    I agree with Peso; let's forget about the 2-hour maintenance window, given the 10-hour SELECT INTO.

    You can go with the negative-range INT IDENTITY as Peso suggested, if that is a good fit.

    If you go BIGINT, here are some thoughts:

    1. On Saturday at 1 AM, you back up the production db and restore it to a copy of the production db
    2. In the next 30 hours, you build StagingTableAlpha in production, using the production db copy as the source
    3. Rename TableAlpha to zzzTableAlpha
    4. Rename StagingTableAlpha to TableAlpha
    5. During the next 24 hours, you apply the deltas (which happened since the backup) in small batches on a business-priority basis
    6. By Monday morning the new table should be synced and fully functional


    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Monday, November 30, 2009 3:52 PM
  • If you go BIGINT, here are some thoughts:

    1. On Saturday at 1 AM, you back up the production db and restore it to a copy of the production db
    2. In the next 30 hours, you build StagingTableAlpha in production, using the production db copy as the source
    3. Rename TableAlpha to zzzTableAlpha
    4. Rename StagingTableAlpha to TableAlpha
    5. During the next 24 hours, you apply the deltas (which happened since the backup) on a business-priority basis
    6. By Monday morning the new table should be synced and fully functional


    Kalman Toth
    I assume you also mean to include (for completeness)
    2a) No constraints or indexes are present on StagingTableAlpha. You can't change the datatype to BIGINT otherwise.
    2b) Change the datatype to BIGINT (see the sketch below)
    4a) Reapply the constraints
    4b) Reapply the indexes (which can be done online if using Enterprise Edition)
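    A sketch of 2b, assuming the key column is named ID and nothing references it yet in the staging table:

    ALTER TABLE StagingTableAlpha
        ALTER COLUMN ID BIGINT NOT NULL;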

    The hard part with this approach is finding out what the delta is. And without change tracking, that would take another 10 hours.
    Meanwhile, users will have access to data which isn't clean, i.e. records are still present which should have been deleted, and records have been updated for any reason, in any column, with any new value.
    Monday, November 30, 2009 4:06 PM

  • The only reason I haven't considered the other approach is that it takes two maintenance windows: one 2-hour window to do the transfer, and another window to swap the tables, change the column to identity, and create the FKs.  I am also concerned about the overhead of the triggers on the source table.

    I do like your approach, and I am going to run a test to gather data points on the time needed to complete each step. It may be worth the overhead.
    Monday, November 30, 2009 4:31 PM
  • When we are moving entire data centers, we often have a very short 'switchover' window. Perhaps using a similar process could be used for this problem.

    1. Create a new, duplicate table, with the bigint identity column.
    2. Create a duplicate 'holding table' for changes that will occur during the change over. Add 'Action' column.
    3. Create INSERT/UPDATE/DELETE triggers for the existing table to copy all changes (including action) to the holding table.
    4. Copy the data from the existing table to the new table (filter out the holding table data). Use reasonable batches to reduce TLog pressure. Make frequent TLog backups to reduce TLog size pressure.
    5. Create the appropriate indexes on the new table
    6. Using Merge, move 'holding table' data to the new table, deleting the moved set from the holding table.
    7. Force Users out of the db, lock the db
    8. Drop all foreign keys referencing the old table
    9. Drop (or rename) the old table
    10. Rename the new table to the old table name
    11. Create the foreign keys referencing the new table
    12. Make the final move from the holding table to the 'new' and now renamed table
    13. Open the db to users.
    14. Backup the db

    In this scenario, it really doesn't matter how long it takes to do 99% of the task. What you are concerned about is minimizing user impact.

    This 'should' be accomplished with less than 15 minutes total downtime. By carefully scripting all of the steps, I have accomplished similar actions on much larger databases with less than 5 minutes downtime.
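    A rough sketch of steps 6 and 12 (MERGE needs SQL Server 2008; NewTable, HoldingTable, ID, Action and SomeColumn are placeholder names, and the holding table is assumed to keep one row per ID with the latest action):

    SET IDENTITY_INSERT dbo.NewTable ON;

    MERGE dbo.NewTable AS tgt
    USING dbo.HoldingTable AS src
        ON tgt.ID = src.ID
    WHEN MATCHED AND src.Action = 'D'
        THEN DELETE
    WHEN MATCHED
        THEN UPDATE SET tgt.SomeColumn = src.SomeColumn
    WHEN NOT MATCHED BY TARGET AND src.Action <> 'D'
        THEN INSERT (ID, SomeColumn) VALUES (src.ID, src.SomeColumn);

    SET IDENTITY_INSERT dbo.NewTable OFF;

    -- Clear the moved set
    DELETE FROM dbo.HoldingTable;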
    "You cannot do a kindness too soon because you never know how soon it will be too late." -Ralph Waldo Emerson
    • Marked as answer by TheBrick Monday, November 30, 2009 4:50 PM
    Monday, November 30, 2009 4:35 PM
  • I like this approach and will give it a try in test to see how it goes.
    Monday, November 30, 2009 4:51 PM
  • Be sure to set the IDENTITY_INSERT ON option for the new table, and then reset the new table's max IDENTITY value using DBCC CHECKIDENT RESEED after the last move from the holding table.
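    For example (a sketch; the table name is a placeholder):

    SET IDENTITY_INSERT dbo.TableAlpha OFF;      -- once the final holding-table move is done
    DBCC CHECKIDENT('dbo.TableAlpha', RESEED);   -- reseeds from the current maximum identity value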
    "You cannot do a kindness too soon because you never know how soon it will be too late." -Ralph Waldo Emerson
    Monday, November 30, 2009 5:01 PM

  • The only reason I haven't considered the other approach is that it takes two maintenance windows: one 2-hour window to do the transfer, and another window to swap the tables, change the column to identity, and create the FKs.  I am also concerned about the overhead of the triggers on the source table.

    I do like your approach, and I am going to run a test to gather data points on the time needed to complete each step. It may be worth the overhead.

    Oh, by my "other approach" I meant the simple DBCC CHECKIDENT approach.
    Maybe it can buy you some time until a longer maintenance window, such as a long holiday like the 4th of July, where you can get a longer window (24+ hours) to complete a full switch.
    Monday, November 30, 2009 5:24 PM
  • Superb switchover plan Arnie!
    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Monday, November 30, 2009 8:40 PM
  • I just found a blog post on this topic http://blogs.lessthandot.com/index.php/DataMgmt/DBAdmin/MSSQLServerAdmin/what-to-do-when-your-identity-column-max


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
  • I also had the same issue this morning. As a quick fix we went with

    DBCC CHECKIDENT('MySchema.MyTableName', RESEED, -2147483648)
    For the long run, BIGINT looks promising, because we get around 50 million records per day (at that rate, the reclaimed negative range of roughly 2.1 billion values lasts only about 43 days).
    Thursday, June 02, 2011 4:25 PM