Tuesday, March 6, 2012 08:05
I need to support a legacy situation: I have UTF-8 OCR text files that need to be converted to Latin-1 because an older program was written that only handles Latin-1 files.
I know this sounds silly, but it does not matter if garbage is created for non-convertible or misconverted characters; in fact, that was one of the reasons for the original program. A human gets paid to inspect and correct the files afterwards. And, no, the OCR files cannot simply be saved as Latin-1, because they are also needed in UTF-8 to support text in languages other than English. I am not that familiar with encodings and code pages, so there may be an obvious answer I am missing, but all I see is the ability to convert to ANSI, and I don't think I want a 7-bit encoding. How do I write the code to convert this?
- Edited by De2164 Tuesday, March 6, 2012 08:06
All Replies
Tuesday, March 6, 2012 11:55
Hi, have you looked at the Encoding class found at http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx ? Using the GetEncoding method (AFAIK "latin-1" is "iso-8859-1") together with the Convert method should allow you to translate from one encoding to another...
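A minimal sketch of that idea, assuming C# and that each file is small enough to read into memory. The class name `Utf8ToLatin1`, the method `ConvertFile`, and the command-line arguments are hypothetical names used only for illustration. The replacement fallback is chosen explicitly so that any character with no Latin-1 equivalent becomes `?`, which seems acceptable here since a human corrects the output afterwards.

```csharp
using System;
using System.IO;
using System.Text;

static class Utf8ToLatin1
{
    // Latin-1 is registered in .NET under the name "iso-8859-1".
    // The explicit fallbacks replace anything unrepresentable with '?'
    // instead of relying on the encoding's default behavior.
    static readonly Encoding Latin1 = Encoding.GetEncoding(
        "iso-8859-1",
        new EncoderReplacementFallback("?"),
        new DecoderReplacementFallback("?"));

    // Hypothetical helper: re-encodes one UTF-8 text file as Latin-1.
    public static void ConvertFile(string sourcePath, string targetPath)
    {
        // Decode the file as UTF-8 (a leading byte order mark is skipped).
        string text = File.ReadAllText(sourcePath, Encoding.UTF8);

        // Encode the same text as Latin-1; this encoding writes no byte order mark.
        File.WriteAllText(targetPath, text, Latin1);
    }

    // The same conversion on raw bytes, using Encoding.Convert directly.
    public static byte[] ConvertBytes(byte[] utf8Bytes)
    {
        return Encoding.Convert(Encoding.UTF8, Latin1, utf8Bytes);
    }

    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.Error.WriteLine("usage: Utf8ToLatin1 <input-utf8-file> <output-latin1-file>");
            return;
        }
        ConvertFile(args[0], args[1]);
    }
}
```

The original UTF-8 file is left untouched, so both versions remain available. Note that "ANSI" in Windows dialogs usually means the system's 8-bit code page rather than 7-bit ASCII, but naming "iso-8859-1" explicitly avoids depending on the machine's settings.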
Please always mark whatever response solved your issue so that the thread is properly marked as "Answered".
Tuesday, March 6, 2012 13:15
Perhaps I am just being stone stupid but I still do not understand, but then I never am very bright first thing in the morning. Particularly when I have been up half the night puzzling over this program.
There is a glimmer of possible understanding, though, so perhaps I will get it before too long. Of course, the biggest question for me is why they have not just saved the file twice, once as Latin-1 and again as UTF-8. LOL, it is not for me to ask them "Why?" Just write the program as asked and move on.