none
Changin Collation - -q option?

    Question

  • Fellow SQLers.

    I have uncovered a SQL 2012 instance that was created with the wrong collation in dev environment. It is already in use.

    So, instance and system db's have the wrong collation, a user db has the same wrong collation, and some other db's have the desired collation. One of my fellow dba's uncovered an article about using the -q option with starting sqlserver. I have NEVER seen this and it appears to be undocumented. We did a test and it seem to work but of course I have no way of knowing if the data (table/columns) is correupted after it did a converson. The only way I can think of validating it is to compare tables/columns from orig db to post db conversion.

    I am use to the older longer way of rebuilding the master, etc etc.

    1 - Does anyone have any suggestins about using the -q option? I for one am leery of using it.

    https://www.mssqltips.com/sqlservertip/3519/changing-sql-server-collation-after-installation/

    Thx

    MG

    Wednesday, May 30, 2018 2:44 PM

Answers

  • Since this option is not well documented, I went ahead and documented it here:

    Changing the Collation of the Instance, the Databases, and All Columns in All User Databases: What Could Possibly Go Wrong?

    It was too much info to post here, but I can say that there are definite advantages to using this undocumented "sqlservr -q" option (note: what is described below is just some highlights, but certainly not all of the relevant info contained in the post linked above):

    1. It bypasses several restrictions imposed by the documented method (check constraints and computed columns using the Database's Collation are ok, TVFs returning string columns using the DB Collation are fine, objects created WITH SCHEMABINDING that use the Database's Collation are fine, etc)
    2. It does not drop existing DBs, so no need to re-add DBs and other Instance-level objects, SQL Agent jobs, etc.
    3. It handles dropping and recreating indexes for you
    4. It uses less Tran Log space when old and new Collations use different Code Pages and there is non-Unicode data, since the data in the columns is left as is (i.e. not converted).

    The disadvantages appear to be:

    1. It is undocumented / unsupported
    2. For non-Unicode string types ( CHAR / VARCHAR / TEXT ) it can lead to data loss ( ONLY IF switching Code Pages!), differently than the documented method. The documented method can definitely lead to data loss for non-Unicode strings, but it does a Code Page conversion, so you would retain characters that exist in both Code Pages with different underlying values. A Code Page conversion also uses "Best Fit" mappings to convert to similar looking characters if any are defined for that particular conversion. This undocumented method merely changes the Collation within the meta-data, so characters are not converted if they exist in the new Code Page with a different underlying value. Instead, the "character" is whatever that value happens to be in the new Code Page.
      PLEASE NOTE: There is no potential for data loss if the old and new Code Pages are the same!
    3. Some data can be left in an invalid state:
      a) NTEXT columns will reject being set to a Collation that supports Supplementary Characters, if the new Collation supports them. NTEXT columns cannot use such Collations, so no new rows could be added to tables where this condition exists.
      b) Foreign Key rows can lose their reference to the PK if the reference relied upon an insensitivity (e.g. case-insensitive) and the new Collation is either binary or sensitive (in the same aspect that was insensitive before -- case-sensitive in this example).
    4. ALL string columns in ALL User DBs will be changed to the new Collation, even if you wanted to keep one or more at a specific Collation.

    Overall, if your non-Unicode data is all of the same Code Page, and you do not have and FK rows relying upon insensitivities and moving to a sensitive or binary Collation, then I would opt for this undocumented method and save the time and hassle of re-adding DBs and Instance-level objects, SQL Agent Jobs, etc.

    Hope that helps. Take care, Solomon..




    • Marked as answer by mg101 Tuesday, June 12, 2018 6:37 PM
    • Edited by Solomon Rutzky Thursday, June 14, 2018 5:29 PM Added clarification of Tran Log usage
    Tuesday, June 12, 2018 6:07 PM

All replies

  • I have never seen the -q option, and like you I'm not to keen on using something which is undocumented.

    I would guess that in the case of the user databases, it only changes the collation of the system tables, but leaves the user tables with the old collation. (And that is typically the most difficult thing to fix.)

    Wednesday, May 30, 2018 9:22 PM
  • I did it once but the instance was not in use. Worked just fine

    setup.exe /qb INSTANCENAME=MSSQLSERVER REINSTALL=SQL_Engine REBUILDDATABASE=1 SAPWD=SA_Password SQLCOLLATION=SQL_Latin1_General_CP1_CI_AS


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, May 31, 2018 7:29 AM
  • HI Erland

    Thanks for the input. A test was done before actually doing the "real' instance. This converted user dbs too.

    MG

    Thursday, May 31, 2018 10:29 AM
  • HI Uri,

    Thanks for this info. You are showing the proper resetting of collation. What I was asking about is the approach in the link in the question paragraph. it is using an option most of us have never seen. A -q with sqlserver.

    sqlservr -m -T4022 -T3659 -s"SQLEXP2014" -q"SQL_Latin1_General_CP1_CI_AI"

    MG



    • Edited by mg101 Thursday, May 31, 2018 11:01 AM rewording.
    Thursday, May 31, 2018 10:30 AM
  • Looks like undocumented feature why using in your case? 

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, May 31, 2018 11:03 AM
  • Hi Uri,

    That was my question. if anyone knows what "truly" the -q option does. What is the downside.  The approach in the LINK in the main paragraph of this posting shows an approach I have never seen.  It does not require scripting db's, reloading or anything.  I have always used the official MS approach on their website for changing server collation.

    I was just asking if anyone truly knows what -q option does:

    sqlservr -m -T4022 -T3659 -s"SQLEXP2014" -q"SQL_Latin1_General_CP1_CI_AI"

    Thanks,

    MG

    Thursday, May 31, 2018 11:34 AM
  • 2> Thanks for the input. A test was done before actually doing the "real' instance. This converted user dbs too.

    Yesh, the system tables. Did it really change the collation of all user tables? That is an immensly difficult operation, given how many things that could go wrong. (Values unique before is maybe not unique after the collation change, FK relations may blow up etc.)

    Thursday, May 31, 2018 9:31 PM
  • What is the downside.

    The downside is that if you later run into problems call support, they may ask you to restore from you last known good backup before the change. Undocumented features should be used in production only under the guidance of MS support.


    Dan Guzman, Data Platform MVP, http://www.dbdelta.com


    Thursday, May 31, 2018 10:16 PM
  • HI Erland,

    Yes, it changed the collation AT THE COLUMN level on USER db's.

    MG

    Friday, June 1, 2018 7:32 PM
  • Thanks Dan. Yes, this was ONLY a Test DB on a VM. Any collation fixes I would do would follow the official MS recommended approach.

    MG

    Friday, June 1, 2018 7:33 PM
  • I could not resist testing this. I took a snapshot of a VM where I have an SQL Server instance.

    I found that it gives up on first error, so you may end up with a mix of the new collation and the old collation.

    An interesting question is how much the log explodes of a database of any size. I have two 1GB databases on these instances, but I was not able to apply the collation on these databases. To wit, both have filtered indexes. And when you work with filtered indexes (as well as XML indexes, spatial indexes, indexes on views and computed columns), the settings QUOTED_IDENTIFIER, ANSI_NULLS, ANSI_WARNINGS, ANSI_PADDING and CONCACT_NULL_YIELDS_NULL must all be ON. But since this executed by a system process, all are off. So the operation fails if such indexes are present...

    Saturday, June 2, 2018 9:54 PM
  • Perhaps easier and less painful  would be adding COLLATE database_default  hint?

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Sunday, June 3, 2018 6:57 AM
  • Perhaps easier and less painful  would be adding COLLATE database_default  hint?

    For changing the collation of an incorrectly installed server?

    Sunday, June 3, 2018 8:36 AM
  • HI Erland,

    Excellent info. So you are saying that it failed and did not rollback the entire operation at the instance level?

    MG

    Sunday, June 3, 2018 6:02 PM
  • Excellent info. So you are saying that it failed and did not rollback the entire operation at the instance level?

    Yep. And I am quite happy with that. A instance-wide transaction over all databases with terabytes of data is quite scary. What I was not able to safely conclude is how it works on database level. That is, is it a single transaction for all user tables, or could it stop and leave the database half-converted?

    I think that it would only be sensible to run this on an instance that runs on a VM and take a snapshot or backup the VM that you call roll back to in case of a failure. You don't really want to restore all database manually.

    Another thing that prevents the operation is if you have a database with read-only file, which includes offline databases and database snapshots. This is detected directly, and the collation change never starts.

    Sunday, June 3, 2018 7:32 PM
  • We had on client that did exactly same thing and it was about 40 places where we added the hint  and it worked just fine

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Monday, June 4, 2018 6:40 AM
  • Thanks Erland,

    MG

    Monday, June 4, 2018 2:55 PM
  • HI Uri,

    Maybe I missed something. What hint are you writing about? The test we are talking about is using the SQLServer.exe command in this link https://www.mssqltips.com/sqlservertip/3519/changing-sql-server-collation-after-installation/

    thx MG.


    • Edited by mg101 Monday, June 4, 2018 2:58 PM wrong name
    Monday, June 4, 2018 2:58 PM
  • I mean not using -q at all to rebuild collation as this is active prod server

    Alter query which failed with collation conflict with a hint d

    collate database_default such as

    select <cols> from t1 join #t1 on #t1.col=t1.col collate database_default 


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Monday, June 4, 2018 3:02 PM
  • Thanks for clarification Uri.

    MG

    Monday, June 4, 2018 3:26 PM
  • Since this option is not well documented, I went ahead and documented it here:

    Changing the Collation of the Instance, the Databases, and All Columns in All User Databases: What Could Possibly Go Wrong?

    It was too much info to post here, but I can say that there are definite advantages to using this undocumented "sqlservr -q" option (note: what is described below is just some highlights, but certainly not all of the relevant info contained in the post linked above):

    1. It bypasses several restrictions imposed by the documented method (check constraints and computed columns using the Database's Collation are ok, TVFs returning string columns using the DB Collation are fine, objects created WITH SCHEMABINDING that use the Database's Collation are fine, etc)
    2. It does not drop existing DBs, so no need to re-add DBs and other Instance-level objects, SQL Agent jobs, etc.
    3. It handles dropping and recreating indexes for you
    4. It uses less Tran Log space when old and new Collations use different Code Pages and there is non-Unicode data, since the data in the columns is left as is (i.e. not converted).

    The disadvantages appear to be:

    1. It is undocumented / unsupported
    2. For non-Unicode string types ( CHAR / VARCHAR / TEXT ) it can lead to data loss ( ONLY IF switching Code Pages!), differently than the documented method. The documented method can definitely lead to data loss for non-Unicode strings, but it does a Code Page conversion, so you would retain characters that exist in both Code Pages with different underlying values. A Code Page conversion also uses "Best Fit" mappings to convert to similar looking characters if any are defined for that particular conversion. This undocumented method merely changes the Collation within the meta-data, so characters are not converted if they exist in the new Code Page with a different underlying value. Instead, the "character" is whatever that value happens to be in the new Code Page.
      PLEASE NOTE: There is no potential for data loss if the old and new Code Pages are the same!
    3. Some data can be left in an invalid state:
      a) NTEXT columns will reject being set to a Collation that supports Supplementary Characters, if the new Collation supports them. NTEXT columns cannot use such Collations, so no new rows could be added to tables where this condition exists.
      b) Foreign Key rows can lose their reference to the PK if the reference relied upon an insensitivity (e.g. case-insensitive) and the new Collation is either binary or sensitive (in the same aspect that was insensitive before -- case-sensitive in this example).
    4. ALL string columns in ALL User DBs will be changed to the new Collation, even if you wanted to keep one or more at a specific Collation.

    Overall, if your non-Unicode data is all of the same Code Page, and you do not have and FK rows relying upon insensitivities and moving to a sensitive or binary Collation, then I would opt for this undocumented method and save the time and hassle of re-adding DBs and Instance-level objects, SQL Agent Jobs, etc.

    Hope that helps. Take care, Solomon..




    • Marked as answer by mg101 Tuesday, June 12, 2018 6:37 PM
    • Edited by Solomon Rutzky Thursday, June 14, 2018 5:29 PM Added clarification of Tran Log usage
    Tuesday, June 12, 2018 6:07 PM
  • Thanks Solomon,

    Outstanding Research.

    MG

    Tuesday, June 12, 2018 6:37 PM
  • 3. It does not add too much to the Transaction Logs (because it does not do any actual string conversions)

    One would expect index rebuilds to produce quite an amount of logs.

    Tuesday, June 12, 2018 9:40 PM
  • 3. It does not add too much to the Transaction Logs (because it does not do any actual string conversions)

    One would expect index rebuilds to produce quite an amount of logs.

    Yes, I had expected it but didn't see much evidence of it. However, I had spent quite a bit of time testing and documenting and was a bit tired of it all. But, I went back and found that the DB was in SIMPLE recovery. I changed to FULL, took a backup, and re-ran the test a few times. At that point it did show activity with each iteration. I have updated my answer accordingly. Thanks for questioning this statement as I probably would not have looked more closely at this supposed behavior otherwise.
    Friday, June 22, 2018 5:11 PM