When you create a SSIS package to load data from a UTF-8 encoding data source to a SQL Server table, and the SQL Server Collation
is SQL_Latin1_General_CP1_CA_AS, you may encounter this error “The column XX cannot be processed because more than one code page (65001 and 1252) are specified for it”.
SQL Server uses USC-2 and UTF-16 encoding schemas (USC-2 is a subset of UTF-16) to store Unicode in nchar/nvarchar/ntext type columns. SQL Server
uses code page to perform conversions between non-Unicode data and Unicode data, and
code page is controlled by the SQL Server Collation. For example, a SQL Server instance whose collation is SQL_Latin1_General_CP1_CA_AS will use the 1252 (ANSI-Latin 1) code page, therefore, the non-Unicode columns created in this
SQL Server databases natively use code page 1252.
When you extract data from a UTF-8 encoding data source, SSIS automatically chooses the exact code page 65001(UTF-8). By default,
a Data Flow Task (DFT) Source component keeps the non-Unicode nature of the source data as well as the code page, and outputs non-Unicode data by using [DT_STR] type output columns. If you attempt to load the non-Unicode data to a SQL Server table that uses
code page 1252 to store non-Unicode data, there will be conflicts between the input columns (identified by code page 65001) and the destination columns (identified by code page 1252).
Currently, it is not supported to convert non-Unicode data to Unicode data based on code page A, and then convert the Unicode to
non-Unicode data based on code page B (A<>B) in a SSIS package. So, you have to convert the source data to Unicode data and store the data in Unicode columns in the SQL Server table. To do the data conversion, there are two approaches:
Open the Advanced Editor of the DFT Source Component, switch to the “Input and Output Properties” tab, and modify the data type of each Output columns from [DT_STR] to [DT_WSTR].
Add a Data Conversion transform to the package, and set the data type of the Output columns to [DT_WSTR].
In addition, for the source data that uses a specific encoding schema that maps to a code page supported by SQL Server (for example,
ANSI/OEM – Simplified Chinese GBK encoding schema maps to code page 936), you can alternatively create a database or columns with the Chinese_PRC_CS_AS collation, and directly load the source data to non-Unicode columns. There will be no code page conflicts
between the source columns and the destination columns in this condition. Here are the examples to specify collation for a database and a column of a SQL Server table:
CREATE DATABASE TestDB COLLATE Chinese_PRC_CS_AS;
CREATE TABLE dbo.MyTable
(ProductID int PRIMARY KEY,
SalesTerritory varchar(10) COLLATE Chinese_PRC_CS_AS NOT NULL