I want to test and evaluate encoding converters. To do that, I need a lot of strings whose correct conversion is already known.
For example, I'd like to have about a hundred UTF-16 text strings, perhaps 10 per common language (Traditional Chinese, Simplified Chinese, Japanese, Russian, Dutch, etc.), and I'd like those strings to include the "odd cases" such as combining characters.
Then I'd like to have "repeats" of all those strings, except now in UTF-8. Furthermore, I'd like even more "repeats" of those strings, with each language represented in its own Windows code page.
Once I have all those strings, I can run each one through an encoding conversion tool and compare the output to the expected value.
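
As a minimal sketch of the comparison I have in mind (not a real corpus): the sample strings, the helper names, and the choice of code page per language below are my own assumptions, and Python's built-in codecs stand in as the "known good" reference. The converter under test would replace `reference_convert`.

```python
# Hypothetical samples: name -> (text, Windows code page assumed for that language).
# A real corpus would be much larger and cover more edge cases.
SAMPLES = {
    "russian": ("Привет, мир", "cp1251"),
    "dutch": ("Één ijsje", "cp1252"),
    "japanese": ("こんにちは世界", "cp932"),
    "chinese_simplified": ("简体中文", "cp936"),
    "chinese_traditional": ("繁體中文", "cp950"),
}


def build_fixtures(samples):
    """Encode each sample as UTF-16LE, UTF-8, and its code page."""
    fixtures = {}
    for name, (text, code_page) in samples.items():
        fixtures[name] = {
            "utf-16-le": text.encode("utf-16-le"),
            "utf-8": text.encode("utf-8"),
            code_page: text.encode(code_page),
        }
    return fixtures


def reference_convert(data, src, dst):
    """Stand-in converter; swap in the tool actually being evaluated."""
    return data.decode(src).encode(dst)


def check_converter(convert, fixtures):
    """Convert every encoding of every sample into every other encoding
    and return the (sample, source, target) triples that did not match."""
    failures = []
    for name, encoded in fixtures.items():
        for src, src_bytes in encoded.items():
            for dst, expected in encoded.items():
                if convert(src_bytes, src, dst) != expected:
                    failures.append((name, src, dst))
    return failures


FIXTURES = build_fixtures(SAMPLES)
FAILURES = check_converter(reference_convert, FIXTURES)
```

An empty `FAILURES` list means every conversion matched. Combining characters would need extra care here: a sequence like "e" followed by U+0301 has no direct cp1252 encoding, so it would have to be normalized to its precomposed form first, which is exactly the kind of case I'd want ready-made test data for.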
Where can I find such i18n "string resources" for testing purposes?
-Brent Arias