none
How to convert xml utf-16 to utf-8?

    Question

  • Hi!

     

    I have the following problem and need urgent solution:

    - I have a tabel what I export with bcp to xml file with following command:

     

    bcp "select   * from mydb.dbo.myview FOR XML PATH ('Product'), ROOT ('Products'), ELEMENTS" queryout c:\myfolder\myxml.xml -T -r -t -w

     

    The batch is running in order, but the file what I get is encoded in UTF-16. How can I export to be in UTF-8.

    Thx for any help.

     

     

    Tuesday, November 25, 2008 7:30 PM

Answers

  • Finally I fond the solution. Not that elegant as I wanted, but works.

    I read a lot of forums, and documents, but nor of them I could use it. I wanted to solve the problem without changing the encoding in database, I wanted to have the export file in utf-8. One solution was to usethe "save as" in notepad, but I neede an automatic solution.

    After a lot of search I found a small application what I can use from command prompt: charsc.exe developed by Benjamin Kalytta. Great tool, thx Benjamin. www.kalytta.com/tools.php .

    So here is the solution in one batch:

     

    bcp "select  * from mydb.dbo.myview FOR XML PATH ('Product'), ROOT ('Products'), ELEMENTS" queryout c:\myexportdir\myexport.xml  -T -r -t -C utf-8 -w
    c:\myexportdir\charsc c:\myexportdir\myexport.xml /scs=utf-16 /dcs=utf-8 /replace

     

     

    Sunday, December 07, 2008 5:14 PM

All replies

  • Try this:

     

    bcp "declare @x xml; select @x=(select * from mydb.dbo.myview FOR XML PATH ('Product'), ROOT ('Products'), ELEMENTS,TYPE);SELECT @x;" queryout c:\myfolder\myxml.xml -T -r -t -c

    Tuesday, November 25, 2008 8:58 PM
  • Thax Phe,

    It looks that works. How can be sure that I got utf-8 encoding. I can see that the file size is almost the half, and some charcters apear in incorrect way in browser or notepad.

    Can you explain me what is changed if I use this method with declare?

    THX

     

    Tuesday, November 25, 2008 9:36 PM
  • utf-8 uses one or more bytes (8 bits) integer to reprents a character. utf-16, which is always used by SQL Server xml, uses two bytes for all characters.

     

    The -c option in bcp export characters by single byte. If the data in your table use ASCII only, the method should work. If you have wide characters like Chinese, though SQL Server exports two bytes for one character, I am not sure it is in the same way or not as xml utf-8 encoding. So please peform more test.

     

    Tuesday, November 25, 2008 10:13 PM
  • Thanks,

     

    By the way I have special characters like éáűúőóüöí in database, and these were not converted in the right way. Sad

     

    Wednesday, November 26, 2008 6:13 AM
  • So you probably need utf-16 then?

     

    What a brilliant answer from phe!

     

    Wednesday, November 26, 2008 9:18 PM
  • Hi,

    no, I need utf-8 at the end. The data is in SQL 2005 and is stored in UTF-16, but my recipient doesn't accept utf-16, only utf-8.

    It looks I solved the problem, just I'm not sure that the result is utf-8.

    Here are the solutions what I found:

    1. Temporary, and interactive: Opened the file with notepad and saved as utf-8.

    2. Using the bcp with -C UTF-8 switch, as it is described in: http://msdn.microsoft.com/en-us/library/ms190657.aspx

     

    Now, how can I be sure that the file I recieved at the end is in UTF-8, because it looks exactly as UTF-16 (what is good, becasue my special characters are represented well)?

    When I export the file wit bcp -C UTF-8 the size is 928KB, opening the same file with notepad and using save as UTF-8 I get a file with size 474KB.

    ould it happen that the bcp -C switch is not converting in correct way?

     

    Thursday, November 27, 2008 6:25 AM
  • Isn't it as simple as converting your xml into varchar(max)? You just need the correct code pages installed in your sql server, but since you already have these characters in there, I'm assuming that isn't going to be a problem.

     

    Code Snippet

    declare @xml xml

    declare @nvarchar nvarchar(max)

    declare @varchar varchar(max)

     

    select @xml = '

    <xml>

    <element>x</element>

    </xml>

    '

    select @nvarchar = convert( nvarchar(max), @xml)

    select @varchar = convert( varchar(max), @xml)

    select datalength(@nvarchar), datalength(@varchar)

     

     

     

    The result is:

    -------------------- --------------------

    62 31

    (1 row(s) affected)

     

    Thursday, November 27, 2008 8:02 AM
  • Finally I fond the solution. Not that elegant as I wanted, but works.

    I read a lot of forums, and documents, but nor of them I could use it. I wanted to solve the problem without changing the encoding in database, I wanted to have the export file in utf-8. One solution was to usethe "save as" in notepad, but I neede an automatic solution.

    After a lot of search I found a small application what I can use from command prompt: charsc.exe developed by Benjamin Kalytta. Great tool, thx Benjamin. www.kalytta.com/tools.php .

    So here is the solution in one batch:

     

    bcp "select  * from mydb.dbo.myview FOR XML PATH ('Product'), ROOT ('Products'), ELEMENTS" queryout c:\myexportdir\myexport.xml  -T -r -t -C utf-8 -w
    c:\myexportdir\charsc c:\myexportdir\myexport.xml /scs=utf-16 /dcs=utf-8 /replace

     

     

    Sunday, December 07, 2008 5:14 PM
  • I found a solution to the same problem but without needing the convert utility:

    bcp "your query here" queryout c:\myoutfile.xml  -T -c -C1252


    The key is the use of -c with -C. Do not use the -w parameter, that will invalidate the code page 1252.  Also note that the output may not display correctly in Internet Explorer even if it is valid and well formed xml.  With mine IE reported invalid characters, but when used for my ETL process it worked ok.

    Hope this helps someone, I spent days working this one out.  I had the -w parameter because I was working with umlaut characters and could not work out why bcp always dumped the file in UTF-16. Removing -w made the code page option work properly!

    Sunday, May 09, 2010 10:21 PM
  • Thanks!!!!!!  This is a great tool and exactly what I was looking for.
    Wednesday, September 15, 2010 9:04 PM