Convert ASCII text file to UTF-8

    Question

  • I get a pipe-delimited text file from an outside data source. It appears to be encoded as ASCII. When I open the file in Notepad, the extended ASCII characters are visible and displayed correctly. However, when I import the file into SQL Server, each extended ASCII character becomes a ?. If I open the file in Notepad, choose Save As, and change the encoding to UTF-8, the extended characters are stored correctly on import. The fields in SQL Server are nvarchar. My code uses a StreamReader to read each line/record individually and then imports that record into the database table.

    How can I get this data into the database without manually opening the file and doing a Save As?

    Thanks,

    Andy

    Tuesday, December 21, 2010 3:50 PM

Answers

  • private string ConvertAsciiToUTF8(string inAsciiString)
    {
        // Source and target encodings (requires "using System.Text;").
        Encoding inAsciiEncoding = Encoding.ASCII;
        Encoding outUTF8Encoding = Encoding.UTF8;

        // Convert the input string into a byte[].
        // Note: Encoding.ASCII is 7-bit, so any character above U+007F
        // in the input is encoded as '?' at this step.
        byte[] inAsciiBytes = inAsciiEncoding.GetBytes(inAsciiString);

        // Convert the ASCII-encoded bytes to a UTF-8 byte array.
        byte[] outUTF8Bytes = Encoding.Convert(inAsciiEncoding, outUTF8Encoding, inAsciiBytes);

        // Decode the UTF-8 bytes back into a string.
        return outUTF8Encoding.GetString(outUTF8Bytes);
    }

    Please mark it as the answer if it helps you.


    Santosh.
    Wednesday, December 22, 2010 6:40 AM
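
    A small sketch of what a round trip through Encoding.ASCII does to a string may help when trying this out. The class and method names are made up for illustration, and ISO-8859-1 is only an assumed stand-in for whatever single-byte encoding the real file uses. It suggests that the extended characters have to be decoded correctly when the bytes are first read, since they cannot be recovered from the string afterwards.

    ```csharp
    using System;
    using System.Text;

    // Hypothetical helper class, for illustration only.
    public static class EncodingDemo
    {
        // "café" as a single-byte (ISO-8859-1) file would store it: 0xE9 is 'é'.
        public static readonly byte[] Sample = { 0x63, 0x61, 0x66, 0xE9 };

        // Encodes the string as ASCII, converts those bytes to UTF-8, and decodes again.
        // ASCII has no code for characters above U+007F, so each one is encoded as '?'.
        public static string RoundTripThroughAscii(string text)
        {
            byte[] asciiBytes = Encoding.ASCII.GetBytes(text);
            byte[] utf8Bytes = Encoding.Convert(Encoding.ASCII, Encoding.UTF8, asciiBytes);
            return Encoding.UTF8.GetString(utf8Bytes);
        }

        // Decodes the raw sample bytes with an encoding that defines all 256 byte values
        // (assumed here to be ISO-8859-1; the real file may use a different code page).
        public static string DecodeSampleAsLatin1()
        {
            return Encoding.GetEncoding("iso-8859-1").GetString(Sample);
        }
    }
    ```

    With this sketch, EncodingDemo.RoundTripThroughAscii("caf\u00E9") returns "caf?", while EncodingDemo.DecodeSampleAsLatin1() returns "café".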

All replies

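  • using (FileStream test = new FileStream("YourFile", FileMode.Open, FileAccess.Read))
    {
        byte[] data = new byte[test.Length];
        test.Read(data, 0, (int)test.Length);
        // Encoding.Convert returns a new array, so keep the result.
        // Encoding.ASCII is 7-bit: bytes above 0x7F are not preserved.
        byte[] utf8Data = Encoding.Convert(Encoding.ASCII, Encoding.UTF8, data);
        // Do stuff with utf8Data
    }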
    

    You can take the file in as a stream and then save it with a different encoding.

     


    Devlin Liles http://twitter.com/devlinliles http://www.devlinliles.com/ If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".
    Tuesday, December 21, 2010 4:14 PM
  • Will this also work if I take just a line of text at a time?

    The text files are quite large, in the 400-500 MB range.

     

    Tuesday, December 21, 2010 6:55 PM
  • Yes, you can convert the encoding one line at a time.

    Devlin Liles
    Tuesday, December 21, 2010 7:00 PM
  • Sure it will. But with files that large, it would be good to run the code above on a new thread. Create a new thread and add a progress bar so that the user can see what is going on. The point of running the file stream on a separate thread is that the app (of any kind) stays responsive.

    Hope it helps,

    Mitja

    Tuesday, December 21, 2010 7:00 PM
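
    A rough sketch of that idea follows. It uses Task.Run and IProgress<long> rather than a hand-made Thread, which is a swapped-in technique and not necessarily what was meant above. BackgroundImport, ProcessFileAsync, handleLine, and the reportEvery interval are hypothetical names chosen for illustration; handleLine stands in for whatever inserts a record into the database.

    ```csharp
    using System;
    using System.IO;
    using System.Text;
    using System.Threading.Tasks;

    // Hypothetical helper class, for illustration only.
    public static class BackgroundImport
    {
        // Reads the file line by line on a thread-pool thread so the caller stays responsive.
        // Returns the number of lines processed once the whole file has been read.
        public static Task<long> ProcessFileAsync(
            string path,
            Encoding sourceEncoding,
            Action<string> handleLine,
            IProgress<long> progress = null,
            int reportEvery = 10000)
        {
            return Task.Run(() =>
            {
                long count = 0;
                using (var reader = new StreamReader(path, sourceEncoding))
                {
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        handleLine(line);
                        count++;

                        // Report occasionally so a progress bar can be updated.
                        if (progress != null && count % reportEvery == 0)
                        {
                            progress.Report(count);
                        }
                    }
                }
                return count;
            });
        }
    }
    ```

    In a UI application, a Progress<long> created on the UI thread could be passed as the progress argument so that its callback updates the progress bar.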
  • It's even easier to just specify the Encoding in your StreamReader constructor:

    http://msdn.microsoft.com/en-us/library/aa328960(v=vs.71).aspx

    That way you don't need to do the encoding/decoding yourself. Once the string has been read in .NET and is an actual string variable, the values have been neatly converted to Unicode.

    Wednesday, December 22, 2010 3:39 PM
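
    A minimal sketch of that approach, assuming the file is really in a single-byte encoding such as ISO-8859-1. The thread never establishes the actual encoding; a file that Notepad shows correctly as "ANSI" is often Windows-1252 instead, which on .NET Core and later needs Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) before Encoding.GetEncoding(1252) works. EncodedFileReader and ReadLines are hypothetical names.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    // Hypothetical helper class, for illustration only.
    public static class EncodedFileReader
    {
        // Lazily yields each line of the file, decoding the bytes with the given encoding.
        // Only one line is held in memory at a time, which suits large files.
        public static IEnumerable<string> ReadLines(string path, Encoding sourceEncoding)
        {
            using (var reader = new StreamReader(path, sourceEncoding))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    yield return line;
                }
            }
        }
    }
    ```

    A possible call site, with the encoding as an assumption to adjust:

    ```csharp
    // Assumed encoding; replace with the one the data source actually uses.
    Encoding source = Encoding.GetEncoding("iso-8859-1");
    foreach (string record in EncodedFileReader.ReadLines("YourFile", source))
    {
        string[] fields = record.Split('|');
        // Insert fields into the nvarchar columns here.
    }
    ```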
  • Hi AndySchliewe,

    Has this question been solved? Have you tried the suggestions?

    If you need to read and convert such a large text file, you can read and write side by side: read one record, convert it, and then either write it to a new file or insert it directly into the database. For the C# conversion code, see the posts above.

    When reading a large file, it is better not to hold too much data in memory.

    If you have any concerns, please feel free to let me know.

    Have a nice day!


    Mike [MSFT]
    MSDN Community Support | Feedback to us
    Get or Request Code Sample from Microsoft
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    Friday, December 24, 2010 4:57 AM
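
    The read-and-write-side-by-side idea could look roughly like the sketch below, which automates the Notepad "Save As UTF-8" step. It copies fixed-size character blocks rather than individual records, so line endings pass through unchanged and memory use stays bounded. FileTranscoder, ToUtf8, and the 64 KB buffer size are hypothetical choices, and the source encoding is an assumption the caller has to supply.

    ```csharp
    using System;
    using System.IO;
    using System.Text;

    // Hypothetical helper class, for illustration only.
    public static class FileTranscoder
    {
        // Streams the source file through a decoder and writes it back out as UTF-8
        // (with a byte order mark), without loading the whole file into memory.
        public static void ToUtf8(string sourcePath, string targetPath, Encoding sourceEncoding)
        {
            using (var reader = new StreamReader(sourcePath, sourceEncoding))
            using (var writer = new StreamWriter(targetPath, false, new UTF8Encoding(true)))
            {
                char[] buffer = new char[64 * 1024];
                int read;
                while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
                {
                    writer.Write(buffer, 0, read);
                }
            }
        }
    }
    ```

    If the records go straight into the database, the intermediate file is not needed; decoding with the right encoding while reading is enough.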
  • Hi AndySchliewe,

    I wanted to check whether you have tried the suggestions. Has this question been solved?

    If you have any concerns, please feel free to let me know.

    Have a nice day!


    Mike [MSFT]

    Tuesday, December 28, 2010 8:43 AM
  • Hi AndySchliewe,

    I am now closing this thread and marking the replies as answers. We have not seen a response from you, so we cannot know whether these two replies solved your issue. However, I think they are helpful on this topic and may help other community members who find this thread. I hope you see this, and thanks for your understanding.

    If you have any concerns, please feel free to let me know.

    Best wishes,


    Mike [MSFT]

    Wednesday, December 29, 2010 3:58 PM