UTF-8 to Latin-1
-
Tuesday, March 06, 2012 8:05 AM
I need to support a legacy situation: I have UTF-8 OCR text files that need to be converted to Latin-1, because an older program was written that handles only Latin-1 files.
I know this sounds silly, but it does not matter if garbage is created for non-convertible or misconverted characters; in fact, that was one of the reasons for the original program. A human gets paid to inspect and correct the files afterwards. And, no, the OCR files cannot simply be saved as Latin-1, because they are also needed in UTF-8 to support text in languages other than English. I am not that familiar with encodings and code pages, so there may be an obvious answer I am missing, but all I see is the ability to convert to ANSI, and I don't think I want a 7-bit encoding. How do I write the code to convert this?
David Edwards
- Edited by De2164 Tuesday, March 06, 2012 8:06 AM
All Replies
-
Tuesday, March 06, 2012 11:55 AM
Hi, have you looked at the Encoding class found at http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx ? Using the GetEncoding method (AFAIK "Latin-1" is "iso-8859-1") and the Convert method should let you translate from one encoding to another...
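To make the idea concrete, here is a minimal sketch of the same two steps (decode the bytes as UTF-8, then re-encode them as ISO-8859-1). It is written in Python purely for illustration; in .NET the corresponding calls would be `Encoding.GetEncoding("iso-8859-1")` and `Encoding.Convert(srcEncoding, dstEncoding, bytes)`. The function names `utf8_to_latin1` and `convert_file` are made up for this example, and the choice to write "?" for anything Latin-1 cannot represent is an assumption based on the statement that garbage output is acceptable. How .NET treats unmappable characters depends on the encoder fallback in use, so check that separately.

```python
from pathlib import Path


def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as ISO-8859-1 (Latin-1).

    Hypothetical helper for illustration only.
    """
    # Step 1: decode the UTF-8 input. Malformed byte sequences are
    # turned into U+FFFD instead of raising an error.
    text = data.decode("utf-8", errors="replace")

    # Drop a leading byte order mark if the OCR tool wrote one.
    if text.startswith("\ufeff"):
        text = text[1:]

    # Step 2: encode as Latin-1. Characters with no Latin-1 equivalent
    # (including U+FFFD from step 1) are written as "?".
    return text.encode("iso-8859-1", errors="replace")


def convert_file(src: str, dst: str) -> None:
    """Read a UTF-8 file and write a Latin-1 copy (hypothetical helper)."""
    Path(dst).write_bytes(utf8_to_latin1(Path(src).read_bytes()))
```

With this sketch, `convert_file("scan.txt", "scan_latin1.txt")` would leave the UTF-8 original untouched and write a second file for the legacy program; both file names are placeholders. Characters such as "é" survive the conversion, while anything outside Latin-1 (for example "€") comes out as "?" for the human reviewer to fix.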
Please always mark whatever response solved your issue so that the thread is properly marked as "Answered".
- Proposed As Answer by Armin Zingler Tuesday, March 06, 2012 12:50 PM
- Marked As Answer by Shanks Zen (Microsoft Contingent Staff, Moderator) Thursday, March 15, 2012 3:11 AM
-
Tuesday, March 06, 2012 1:15 PM
Perhaps I am just being stone stupid, but I still do not understand. Then again, I am never very bright first thing in the morning, particularly when I have been up half the night puzzling over this program.
There is a glimmer of possible understanding, though, so perhaps I will get it before too long. Of course, the biggest question for me is why they have not just saved the file twice, once as Latin-1 and again as UTF-8. LOL, it is not for me to question them "Why?" Just do the program as asked and move on.
David Edwards

