none
[Forum FAQ] How to fix the Error “The column XX cannot be processed because more than one code page (65001 and 1252) are specified for it” in SSIS?

    General discussion

  • Introduction

    When you create a SSIS package to load data from a UTF-8 encoding data source to a SQL Server table, and the SQL Server Collation is SQL_Latin1_General_CP1_CA_AS, you may encounter this error “The column XX cannot be processed because more than one code page (65001 and 1252) are specified for it”.

    Solution

    SQL Server uses USC-2 and UTF-16 encoding schemas (USC-2 is a subset of UTF-16) to store Unicode in nchar/nvarchar/ntext type columns. SQL Server uses code page to perform conversions between non-Unicode data and Unicode data, and code page is controlled by the SQL Server Collation. For example, a SQL Server instance whose collation is SQL_Latin1_General_CP1_CA_AS will use the 1252 (ANSI-Latin 1) code page, therefore, the non-Unicode columns created in this SQL Server databases natively use code page 1252.

    When you extract data from a UTF-8 encoding data source, SSIS automatically chooses the exact code page 65001(UTF-8). By default, a Data Flow Task (DFT) Source component keeps the non-Unicode nature of the source data as well as the code page, and outputs non-Unicode data by using [DT_STR] type output columns. If you attempt to load the non-Unicode data to a SQL Server table that uses code page 1252 to store non-Unicode data, there will be conflicts between the input columns (identified by code page 65001) and the destination columns (identified by code page 1252).

    Currently, it is not supported to convert non-Unicode data to Unicode data based on code page A, and then convert the Unicode to non-Unicode data based on code page B (A<>B) in a SSIS package. So, you have to convert the source data to Unicode data and store the data in Unicode columns in the SQL Server table. To do the data conversion, there are two approaches:

    • Open the Advanced Editor of the DFT Source Component, switch to the “Input and Output Properties” tab, and modify the data type of each Output columns from [DT_STR] to [DT_WSTR].
    • Add a Data Conversion transform to the package, and set the data type of the Output columns to [DT_WSTR].

    In addition, for the source data that uses a specific encoding schema that maps to a code page supported by SQL Server (for example, ANSI/OEM – Simplified Chinese GBK encoding schema maps to code page 936), you can alternatively create a database or columns with the Chinese_PRC_CS_AS collation, and directly load the source data to non-Unicode columns. There will be no code page conflicts between the source columns and the destination columns in this condition. Here are the examples to specify collation for a database and a column of a SQL Server table:

    CREATE DATABASE TestDB COLLATE Chinese_PRC_CS_AS;
     
    USE TestDB;
     
    CREATE TABLE dbo.MyTable
       (ProductID   int PRIMARY KEY,
       SalesTerritory  varchar(10) COLLATE Chinese_PRC_CS_AS NOT NULL
     
     );
     GO

    More Information

    Code Page Architecture
    http://technet.microsoft.com/en-us/library/ms186356(v=sql.105).aspx

    UTF-16 Encoding and SQL Server
    http://blogs.msdn.com/b/qingsongyao/archive/2009/02/01/utf-16-encoding-and-sql-server-1.aspx

    SQL Server and UTF-8 Encoding (1) -True or False
    http://blogs.msdn.com/b/qingsongyao/archive/2009/04/10/sql-server-and-utf-8-encoding-1-true-or-false.aspx

    Applies to

    Microsoft SQL Server 2005
    Microsoft SQL Server 2008
    Microsoft SQL Server 2008 R2
    Microsoft SQL Server 2012

    Monday, January 13, 2014 5:57 AM