UTF-8 to Latin-1

  • Tuesday, March 06, 2012 8:05 AM
     
     

    I need to support a legacy situation: I have UTF-8 OCR text files that need to be converted to Latin-1, because an older program was written that only handles Latin-1 files.

    I know this sounds silly, but it does not matter if garbage is created for non-convertible or misconverted characters; in fact, that was one of the reasons for the original program.  A human gets paid to inspect and correct the files afterwards.  And, no, the OCR files cannot simply be saved as Latin-1, because they are also needed in UTF-8 to support text in languages other than English.  I am not that familiar with encodings and code pages, so there may be an obvious answer I am missing, but all I see is the ability to convert to ANSI, and I don't think I want a 7-bit encoding.  How do I write the code to convert this?


    David Edwards


    • Edited by De2164 Tuesday, March 06, 2012 8:06 AM

All Replies

  • Tuesday, March 06, 2012 11:55 AM
     
     Answered

    Hi, have you looked at the Encoding class documented at http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx ? Using the GetEncoding method (AFAIK "Latin-1" is "iso-8859-1") together with the Convert method should let you translate from one encoding to another.
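
    As a rough illustration of that suggestion, here is a minimal sketch. The class and method names (Latin1Converter, Utf8BytesToLatin1, ConvertFile) are made up for this example, and it assumes that replacing every character with no Latin-1 equivalent by "?" is acceptable, since the files are corrected by hand afterwards. The replacement fallback is requested explicitly so that the result does not depend on the encoding's default behavior.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class; the names are illustrative only.
public static class Latin1Converter
{
    // Builds an ISO-8859-1 (Latin-1) encoding that writes '?' for any
    // character it cannot represent, instead of relying on the default fallback.
    private static Encoding CreateLatin1()
    {
        return Encoding.GetEncoding(
            "iso-8859-1",
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));
    }

    // Re-encodes a buffer of UTF-8 bytes as Latin-1 bytes.
    public static byte[] Utf8BytesToLatin1(byte[] utf8Bytes)
    {
        return Encoding.Convert(Encoding.UTF8, CreateLatin1(), utf8Bytes);
    }

    // Reads a UTF-8 text file and writes a Latin-1 copy of it.
    // The original UTF-8 file is left untouched.
    public static void ConvertFile(string sourcePath, string targetPath)
    {
        string text = File.ReadAllText(sourcePath, Encoding.UTF8);
        File.WriteAllText(targetPath, text, CreateLatin1());
    }
}
```

    Calling Latin1Converter.ConvertFile with a source and a target path would then produce the Latin-1 copy for the older program while keeping the UTF-8 original for everything else.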


    Please always mark whatever response solved your issue so that the thread is properly marked as "Answered".

  • Tuesday, March 06, 2012 1:15 PM
     
     

    Perhaps I am just being stone stupid, but I still do not understand.  Then again, I am never very bright first thing in the morning, particularly when I have been up half the night puzzling over this program.

    There is a glimmer of possible understanding, though, so perhaps I will get it before too long.  Of course, the biggest question for me is why they have not just saved the file twice, once as Latin-1 and again as UTF-8.  LOL, it is not for me to question them "Why?"  Just do the program as asked and move on.


    David Edwards