User-1677473226 posted
I have an ASP.NET app that accepts an XML document via HTTP using Request.InputStream. The XML doc is UTF-8 encoded (<?xml version="1.0" encoding="UTF-8"?>) and contains some special Unicode characters (e.g. non-English letters with accent marks). I am reading the Request.InputStream into a DataSet using ReadXml().
Stream xmlFile = Request.InputStream;
DataSet ds = new DataSet();
ds.ReadXml(xmlFile);
The stream is accepted fine, but as soon as ReadXml() tries to load it into the DataSet, I get an error: "There is an invalid character in the given encoding..." If the original XML file's declaration is changed to <?xml version="1.0" encoding="ISO-8859-1"?>, it works without error. This is confusing, because I thought the UTF-8 charset covered all the characters in ISO-8859-1. Is that incorrect?
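UTF-8 can represent every ISO-8859-1 character, but not with the same bytes: code points above 127 take one byte in ISO-8859-1 and two bytes in UTF-8. So a document whose bytes are really ISO-8859-1 but whose declaration says UTF-8 will fail to decode at the first accented letter. The sketch below illustrates this; the class name `EncodingDemo` and its method are hypothetical, and it only uses the standard `System.Text` encoders.

```csharp
using System.Text;

// Hypothetical illustration helper: checks whether a byte array is well-formed UTF-8.
public static class EncodingDemo
{
    // Strict UTF-8 decoder: throwOnInvalidBytes = true makes it raise
    // DecoderFallbackException instead of silently substituting U+FFFD.
    private static readonly UTF8Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    public static bool IsValidUtf8(byte[] bytes)
    {
        try
        {
            StrictUtf8.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            // A byte sequence that is not legal UTF-8, e.g. a lone 0xE9 from ISO-8859-1.
            return false;
        }
    }
}
```

For example, "é" is the single byte 0xE9 in ISO-8859-1 but the two bytes 0xC3 0xA9 in UTF-8, so `EncodingDemo.IsValidUtf8(Encoding.GetEncoding("ISO-8859-1").GetBytes("café"))` returns false while the same call on `Encoding.UTF8.GetBytes("café")` returns true.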
Since I don't have any control over how the original XML doc is encoded (it must remain UTF-8), I have been trying to find a way to set the encoding at runtime. I have tried setting Request.ContentEncoding to both UTF-8 and ISO-8859-1, in both the Page_Load() and the Application_BeginRequest() handlers. I also tried the <globalization> settings in Web.Config both ways; nothing has worked. I also tried reading the XML file into an XmlDocument object instead of a DataSet, but got the same error.
I'd like to keep it in a DataSet if at all possible because it's much easier for the data manipulation that I have to do later on in the program. Is there any way to avoid the error without changing the original InputStream?
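One possible approach, sketched under a loud assumption: the fact that relabelling the declaration as ISO-8859-1 makes the error go away suggests the sender's bytes may actually be ISO-8859-1 despite the UTF-8 declaration. Request.InputStream exposes the raw request bytes, so the idea is to decode them yourself with an explicitly chosen encoding and pass the resulting TextReader to `DataSet.ReadXml(TextReader)`. When the XML parser reads from a TextReader the characters are already decoded, so the encoding="UTF-8" declaration is not used to decode bytes. The helper name `XmlStreamLoader.ReadXmlWithEncoding` is hypothetical, and if some senders really do send UTF-8 this would mangle their accented characters.

```csharp
using System.Data;
using System.IO;
using System.Text;

// Hypothetical helper: decode the raw bytes with a caller-chosen encoding,
// then let DataSet.ReadXml parse the already-decoded characters.
public static class XmlStreamLoader
{
    // Assumption: actualEncoding is the encoding the bytes were really written in
    // (e.g. ISO-8859-1), regardless of what the XML declaration claims.
    public static DataSet ReadXmlWithEncoding(Stream input, Encoding actualEncoding)
    {
        var ds = new DataSet();
        // detectEncodingFromByteOrderMarks: false so only actualEncoding is used.
        // Note: disposing the reader also closes the underlying stream.
        using (var reader = new StreamReader(input, actualEncoding, detectEncodingFromByteOrderMarks: false))
        {
            // ReadXml(TextReader) parses characters, not bytes, so the
            // encoding attribute in <?xml ...?> does not drive decoding here.
            ds.ReadXml(reader);
        }
        return ds;
    }
}
```

In the page this would be called as `DataSet ds = XmlStreamLoader.ReadXmlWithEncoding(Request.InputStream, Encoding.GetEncoding("ISO-8859-1"));`, leaving the original stream contents unchanged.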
Any help would be greatly appreciated. Thanks.