locked
Access Blobs importing as Unicode RRS feed

  • Question

  • Hi

    I have an old Access application that I am converting to SQL Server/DotNet.

     

    One table in Access has an ‘OLE Object’ field containing BLOBs of imported files (using code from here: http://support.microsoft.com/kb/103257/en-gb )

     

    When I import the table to SQL Server using:

     

    Select * into myTable

    From OpenRowSet('Microsoft.Jet.OLEDB.4.0','C:\DataFile.MDB';'admin';'', BlobData)

     

    the BLOBs are encoded in Unicode (at least I think that’s what it is, please correct me if i'm wrong).

     

    My question is: how can I prevent the conversion to Unicode or convert it back from Unicode into the original sequence of bytes?

     

    Thanks in advance.

    Paul.
    Friday, May 28, 2010 11:39 AM

Answers

  • SQL Server used three different data types for BLOB storage:

     Text - The Text data type can accommodate up to 2GB of non-Unicode text data.
     
     nText - The nText data type can accommodate up to 1GB of Unicode text data.
     
     Image - The Image data type can store up to 2GB of binary data, which also enables it to store standard text data. 

    You no need to bouther it will stored in unicode. when you retrive it will show you normally.


    accessing blob objects see the below article
    http://www.akadia.com/services/dotnet_load_blob.html

    If you what to do encode and decode below methods will help you

    decoding unicode data

    1.CREATE FUNCTION DecodeUnicodeData(@DATA NVARCHAR(4000), @PREFIX VARCHAR(100), @Suffix VARCHAR(100))
    2.RETURNS NVARCHAR(4000)
    3.AS
    4.BEGIN
    5.  DECLARE @START INT
    6.  DECLARE @END INT
    7.
    8.  WHILE CHARINDEX(@PREFIX, @DATA) > 0
    9.    BEGIN
    10.      SET @START = CHARINDEX(@PREFIX, @DATA)
    11.      SET @END = CHARINDEX(@Suffix, @DATA, @START)
    12.
    13.      SET @DATA = REPLACE(@DATA, SUBSTRING(@DATA, @START, @END -@START + LEN(@Suffix)),NCHAR(SUBSTRING(@DATA, @START+ LEN(@PREFIX),@END -@START - LEN(@PREFIX))))
    14.    END
    15. 
    16.  RETURN @DATA
    17.
    18.END


    SELECT dbo.DecodeUnicodeData('Ωmega', '&#', ';')


    encoding unicode data


    1.CREATE FUNCTION EncodeUnicodeData(@DATA NVARCHAR(4000), @PREFIX VARCHAR(20), @Suffix VARCHAR(20))
    2.RETURNS VARCHAR(8000)
    3.AS
    4.BEGIN
    5.  DECLARE @i INT
    6.  DECLARE @OUTPUT VARCHAR(8000)
    7.
    8.  SET @i = 1
    9.  SET @OUTPUT = ''
    10.
    11.  WHILE @i <= LEN(@DATA)   
    12.    BEGIN
    13.      IF UNICODE(SUBSTRING(@DATA, @i, 1)) > 255
    14.        SET @OUTPUT = @OUTPUT + @PREFIX + CONVERT(VARCHAR(5),UNICODE(SUBSTRING(@DATA, @i, 1))) + @Suffix
    15.      ELSE
    16.        SET @OUTPUT = @OUTPUT + SUBSTRING(@DATA, @i, 1)
    17.
    18.      SET @i = @i + 1
    19.    END
    20.
    21.  RETURN @OUTPUT
    22.END


    SELECT dbo.EncodeUnicodeData(N'&#937;mega', '&#', ';')

    Friday, May 28, 2010 12:56 PM

All replies

  • SQL Server used three different data types for BLOB storage:

     Text - The Text data type can accommodate up to 2GB of non-Unicode text data.
     
     nText - The nText data type can accommodate up to 1GB of Unicode text data.
     
     Image - The Image data type can store up to 2GB of binary data, which also enables it to store standard text data. 

    You no need to bouther it will stored in unicode. when you retrive it will show you normally.


    accessing blob objects see the below article
    http://www.akadia.com/services/dotnet_load_blob.html

    If you what to do encode and decode below methods will help you

    decoding unicode data

    1.CREATE FUNCTION DecodeUnicodeData(@DATA NVARCHAR(4000), @PREFIX VARCHAR(100), @Suffix VARCHAR(100))
    2.RETURNS NVARCHAR(4000)
    3.AS
    4.BEGIN
    5.  DECLARE @START INT
    6.  DECLARE @END INT
    7.
    8.  WHILE CHARINDEX(@PREFIX, @DATA) > 0
    9.    BEGIN
    10.      SET @START = CHARINDEX(@PREFIX, @DATA)
    11.      SET @END = CHARINDEX(@Suffix, @DATA, @START)
    12.
    13.      SET @DATA = REPLACE(@DATA, SUBSTRING(@DATA, @START, @END -@START + LEN(@Suffix)),NCHAR(SUBSTRING(@DATA, @START+ LEN(@PREFIX),@END -@START - LEN(@PREFIX))))
    14.    END
    15. 
    16.  RETURN @DATA
    17.
    18.END


    SELECT dbo.DecodeUnicodeData('&#937;mega', '&#', ';')


    encoding unicode data


    1.CREATE FUNCTION EncodeUnicodeData(@DATA NVARCHAR(4000), @PREFIX VARCHAR(20), @Suffix VARCHAR(20))
    2.RETURNS VARCHAR(8000)
    3.AS
    4.BEGIN
    5.  DECLARE @i INT
    6.  DECLARE @OUTPUT VARCHAR(8000)
    7.
    8.  SET @i = 1
    9.  SET @OUTPUT = ''
    10.
    11.  WHILE @i <= LEN(@DATA)   
    12.    BEGIN
    13.      IF UNICODE(SUBSTRING(@DATA, @i, 1)) > 255
    14.        SET @OUTPUT = @OUTPUT + @PREFIX + CONVERT(VARCHAR(5),UNICODE(SUBSTRING(@DATA, @i, 1))) + @Suffix
    15.      ELSE
    16.        SET @OUTPUT = @OUTPUT + SUBSTRING(@DATA, @i, 1)
    17.
    18.      SET @i = @i + 1
    19.    END
    20.
    21.  RETURN @OUTPUT
    22.END


    SELECT dbo.EncodeUnicodeData(N'&#937;mega', '&#', ';')

    Friday, May 28, 2010 12:56 PM
  • My apologies

    After further research it turns out the data isn’t being converted to Unicode it seems to be a codepage conversion.

    Each byte is being converted to its corresponding 2-byte number (this is what made me think it was Unicode) with bytes between 0x80 and 0x9F having a more complex conversion.

    Thus

    0x01 becomes 0x0001

    0x02 becomes 0x0002

    but

    0x80 becomes 0xAC20

    0x9F becomes 0x7801

     

    Knowing this I can write a function to convert each 2-byte code back into its original single byte code.

     

    However it would be worth knowing if is possible to turn off this code page conversion and to force SQL Server to just read in the original data, or if there is an easier way to convert binary data between code pages?

     

    If not, then please consider this question answered.

     

    Cheers

    Paul.
    Friday, May 28, 2010 3:38 PM
  • This will help you to resolve your issue, try with this

    Wednesday, June 2, 2010 12:35 PM