XMLSerializer \r\n problem
Hi,
I am using XmlSerializer to serialize an object to and from a database field. The target table has a text type field to hold the serialized object information (as an XML blob). I am using SQL Server 2000 and .NET 2.0. Most fields of the object serialize and deserialize fine, except for strings. I have a string field storing some data that seems to come back slightly different. After debugging my code I have come to the conclusion that the XmlSerializer correctly writes strings to the database with \r\n intact, but when deserializing it returns strings with only \n. I have attached the code below.
XmlSerializer xsz = new XmlSerializer(typeof(ComparisonOptions));
Options copt = null;
using (SqlConnection con = new SqlConnection(overlordDB))
{
    using (SqlCommand cmd = new SqlCommand("usp_GetRecord", con))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@id", id);
        con.Open();
        using (SqlDataReader rdr = cmd.ExecuteReader())
        {
            Debug.Assert(rdr.RecordsAffected != 1);
            //get the first record, we only expect one.
            rdr.Read();
            StringReader str = new StringReader((string)rdr["Options"]); // This line seems to read the string fine
            copt = (ComparisonOptions)xsz.Deserialize(str); // After this, copt string fields are missing \r.
            copt.ID = (int)rdr["ID"];
            copt.Name = (string)rdr["Name"];
            copt.Description = (string)rdr["Description"];
        }
    }
}
The Options object is being populated from the corresponding database text field. This object has a string field which is assigned via deserialization. I have run through the code with the debugger, and at the point I have marked (the StringReader str declaration and assignment) the string fields in the XML blob are all read in correctly from the database and have their \r\n intact. After deserializing into copt, the string fields of copt have only \n.
There doesn't seem to be anything I can find in the forums or on Google that describes a similar sort of problem.
Any help would be greatly appreciated as I have hit a dead end with this.
Imran
All Replies
- Hello,
Are you using XmlWriter in your serialize function, if so I think you can use the XmlWriterSettings to persist your indents.
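One way the XmlWriterSettings idea could apply to the \r\n problem, as a minimal sketch rather than anything confirmed in this thread: NewLineHandling.Entitize makes the writer emit each carriage return in text as the character reference &#xD;, which a normalizing reader does not strip. The EntitizeDemo class and its method names are made up for illustration, and TestObj is a stand-in for the real type.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Stand-in type for illustration; any serializable class with a string field works.
public class TestObj
{
    public string txt;
}

// Hypothetical helper class, not part of the original code.
public static class EntitizeDemo
{
    // Serialize through an XmlWriter that writes '\r' in text as &#xD;.
    public static string Serialize(TestObj value)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.NewLineHandling = NewLineHandling.Entitize;

        XmlSerializer xsz = new XmlSerializer(typeof(TestObj));
        using (StringWriter sw = new StringWriter())
        {
            using (XmlWriter xw = XmlWriter.Create(sw, settings))
            {
                xsz.Serialize(xw, value);
            }
            // The XmlWriter has been flushed and closed by this point.
            return sw.ToString();
        }
    }

    // Deserialize with the plain TextReader overload, no special reader setup.
    public static TestObj Deserialize(string xml)
    {
        XmlSerializer xsz = new XmlSerializer(typeof(TestObj));
        using (StringReader sr = new StringReader(xml))
        {
            return (TestObj)xsz.Deserialize(sr);
        }
    }

    public static void Main()
    {
        TestObj original = new TestObj();
        original.txt = "line one\r\nline two";

        TestObj copy = Deserialize(Serialize(original));
        Console.WriteLine(copy.txt == original.txt
            ? "Strings are equal"
            : "Strings are not equal");
    }
}
```

The trade-off is that the stored XML changes shape (it contains &#xD; references), so this only helps for data written after the change; anything already stored with raw \r\n still needs a fix on the reading side.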
Thanks
Netwhiz
To help isolate the problem, remove the database from the picture in a small test harness. Simply serialize the data to an XML string, then immediately deserialize it to an object again. See if you're missing the '\r' in the final object.
If so, the problem is within the XmlSerializer settings. If not, then the problem is related to database storage and retrieval.
I regularly serialize and deserialize objects with string data that contains CR/LF, and I've never run into an issue like that, and I don't use any special XmlSerializer configuration. Just out of the box. I imagine it has something to do with the insert to the database converting the line endings.
What datatype is the DB field where this is being stored? Are you using a stored proc through a SqlCommand object for the insert as well, or using a dynamic query? If it's a stored proc, does the parameter of the stored proc for passing the XML data match the DB field type, or are you using one of the SQL convert functions to get it to match?
What is the collation setting for the database?
Hope that helps,
Troy
Hi Guys,
Nope, I am not using XmlWriter, I am using StringWriter. From what I can see, the problem doesn't seem to be in persisting the objects to the database, as the \r\n control chars are intact at that point. You can see this in the test below when you step through with the debugger. The test below demonstrates my problem.
using System;
using System.IO;
using System.Xml.Serialization;
using System.Xml;

namespace TestXmlSerializecrlf
{
    internal class Program
    {
        static readonly string orig = @"multiline input
another line here
one more line";

        private static void Main(string[] args)
        {
            serialize();
            deserialize();
            Console.ReadLine();
        }

        private static void serialize()
        {
            //Serialize
            XmlSerializer xsz = new XmlSerializer(typeof(TestObj));
            StringWriter sWriter = new StringWriter();
            TestObj to = new TestObj();
            to.txt = orig;
            xsz.Serialize(sWriter, to);
            //write
            using (StreamWriter sw = new StreamWriter(@"C:\TestFile.txt", false))
            {
                sw.Write(sWriter.ToString());
            }
        }

        private static void deserialize()
        {
            //Deserialize
            XmlSerializer xsz = new XmlSerializer(typeof(TestObj));
            TestObj to = null;
            using (StreamReader sr = new StreamReader(@"C:\TestFile.txt"))
            {
                StringReader str = new StringReader(sr.ReadToEnd()); // Here the string is intact
                to = (TestObj)xsz.Deserialize(str); // Here it isn't
            }
            if (to.txt != orig)
            {
                Console.Write("Strings are not equal");
            }
            else
            {
                Console.Write("Strings are equal");
            }
        }
    }

    [Serializable]
    public class TestObj
    {
        public string txt;
    }
}
This test reproduces the error. If you step through this code with the debugger you will see that the string read back in has its \r\n chars intact. After deserialization, however, it only has \n. This is causing a problem for me at the moment because I am saving some user input from a text box, and when it is loaded again it comes back incorrectly formatted. As I said before, I am using SQL Server 2000 and the XML blob is stored in a text field. Also, the stored procedure expects a text parameter, so there is no need for any conversion functions within the stored procedure or my ADO.NET code. Database collation is set to Latin1_General_CI_AS; however, I think we have eliminated the database from the problem with the above code.
Any Ideas?
The trick is in looking at the XML to which your class gets serialized. You will find that the "txt" field gets serialized as the value of an element called "txt". The same happens if you use the [XmlText] attribute to cause it to be serialized as a text node.
It gets serialized as you specified it, with \r\n. However, in XML, \r\n is just whitespace, and is equivalent to \n. This is what it gets turned into when it is deserialized.
I think you may need to implement IXmlSerializable and to write "txt" in a CDATA section. That way, it should be deserialized character for character as you wrote it.
Alternatively, you could serialize your string as binary:
Code Snippet
[Serializable]
public class TestObj
{
[XmlIgnore]
public string txt;
[XmlElement(ElementName = "txt", DataType = "base64Binary")]
public byte[] txtSerialization
{
get
{
return System.Text.Encoding.Unicode.GetBytes(txt);
}
set
{
txt = System.Text.Encoding.Unicode.GetString(value);
}
}
}
John,
I will look into implementing IXmlSerializable, as binary blobs are probably not an option for me. I can understand whitespace being omitted, stripped, or converted for the actual XML elements, but the actual data they hold shouldn't be modified. Surely this is a bug in the XmlSerializer, as it is altering persisted data?
Imran
Oh! I see the problem.
Don't use StreamReader... use XmlTextReader.
Here's the deserialize method with XmlTextReader...
Code Snippet
private static void deserialize()
{
    XmlSerializer xsz = new XmlSerializer(typeof(TestObj));
    TestObj to = null;
    using (XmlTextReader sr = new XmlTextReader(@"C:\TestFile.txt"))
    {
        to = (TestObj)xsz.Deserialize(sr);
    }
    if (to.txt != orig)
    {
        Console.Write("Strings are not equal");
    }
    else
    {
        Console.Write("Strings are equal");
    }
}
If you already have a StreamReader, and want to use it.. you can construct XmlTextReader with a stream instead of a filepath/url.
Hope that helps,
Troy
Go ahead and try the XmlReader, but if you're using .NET 2.0 or above, you should use:
Code Snippet
using (XmlReader sr = XmlReader.Create(@"C:\TestFile.txt"))
instead.
XmlReader also removes the carriage return characters... XmlTextReader preserves them.
XmlTextReader on MSDN
http://msdn2.microsoft.com/en-us/library/system.xml.xmltextreader.aspx
Thanks,
Troy
Careful reading of the documentation on XmlTextReader does mention that it does not normalize text nodes. I'm not sure that this is a good thing in general. In general, you want text nodes normalized. In this particular case, for this particular element, you don't.
This appears to be a behavior of XmlReaders. Even when you specify a Stream or TextReader to Deserialize, it internally creates and uses an XmlReader. It no doubt uses XmlReader.Create to create this internal XmlReader, and that is documented to normalize newlines.
So, this problem could be simplified to an example of just using an XmlReader to read from a document that has \r\n within a text value. The question would be whether the \r\n is returned when the data is read. The goal would be to find a setting of XmlReader that will preserve the \r\n.
With this example in hand, I'd suggest you experiment to see what happens if the element has xml:space='preserve'. If it correctly returns \r\n, then the trick would be to cause xml:space='preserve' to be emitted on the <txt/> element.
A search of MSDN found http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1323184&SiteID=1. Also see the other hits on http://search.msdn.microsoft.com/Default.aspx?brand=Msdn&refinement=00&locale=en-us&lang=en-us&query=xmlserializer%20whitespace%20preserve.
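The simplified experiment described above could look something like the following sketch. It reads the same small document three ways: through XmlReader.Create, through XmlTextReader with its defaults, and through XmlReader.Create again with xml:space='preserve' on the element. The ReaderNewlineDemo class, its method names, and the sample document are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class, not part of the original code.
public static class ReaderNewlineDemo
{
    // Sample documents: one element whose text contains \r\n.
    public const string Plain = "<txt>line one\r\nline two</txt>";
    public const string Preserve = "<txt xml:space=\"preserve\">line one\r\nline two</txt>";

    // Read the element text with a reader obtained from XmlReader.Create.
    public static string ReadWithCreate(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }

    // Read the element text with XmlTextReader, leaving Normalization
    // at its default value of false.
    public static string ReadWithTextReader(string xml)
    {
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }

    // Make control characters visible when printing.
    public static string Show(string s)
    {
        return s.Replace("\r", "\\r").Replace("\n", "\\n");
    }

    public static void Main()
    {
        Console.WriteLine("XmlReader.Create:            " + Show(ReadWithCreate(Plain)));
        Console.WriteLine("XmlTextReader:               " + Show(ReadWithTextReader(Plain)));
        Console.WriteLine("XmlReader.Create + preserve: " + Show(ReadWithCreate(Preserve)));
    }
}
```

If the behavior reported in this thread holds, only the XmlTextReader line should still show \r\n. Line-ending normalization happens before xml:space is considered, so the attribute is not expected to change the XmlReader.Create result.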
Arggg... I typed a lengthy explanation, only to find that when I hit "post" I had been logged out because my session expired. So it didn't post, and wasn't there when I hit the back button.
I'll try to re-write the important parts quickly... I was probably too long winded in the first post anyway.
In short -- you can control the level of whitespace handling with the WhitespaceHandling property on XmlTextReader. It defaults to WhitespaceHandling.All, which gives you a lot of extra whitespace you might not want... I often use it with WhitespaceHandling.Significant to avoid this. Just set the property after you instantiate...
If you dig around in Reflector you will see this is the only way to get it to work. Look at System.Xml.XmlTextReaderImpl. The GetWhiteSpaceType() method is what controls whether it pays attention to whitespace or not. It will only consider keeping anything classed as XmlNodeType.Whitespace or SignificantWhitespace (i.e. newlines in your data) if the WhitespaceHandling value is either All or Significant. Otherwise it pretends the whitespace doesn't exist and returns XmlNodeType.None... which means it never gets to the EatWhiteSpaces() method, which is where the Normalization value is checked before chomping the carriage return.
So, unfortunately... using the xml:space="preserve" attribute on individual nodes won't work, due to the implementation of the parser. The only way it will keep those carriage returns in the data is with Normalization turned off. I'd be interested to hear if anyone can find a way to make that work!
Hope that helps,
Troy
Troy,
In your Reflector research, did you determine whether there's a way to change the whitespace handling of the other derived XmlReader classes? I know that only XmlTextReader still has the WhitespaceHandling property, but do any of the other XmlReader implementations have a way to change the handling of whitespace? I'm talking .NET 2.0 and above, which is quite different from .NET 1.1.
From what I can tell all the other non-abstract XmlReader implementations use the XmlTextReader under the hood. They just put more layers of business logic on top of it.
None of the others seem to directly parse the text, but rather work with the abstracted objects produced by XmlTextReaderImpl's parsing stage.
Of course, I could be wrong about that... There's a lot of code to look through.
Also interesting to note: if you create the XmlReader via XmlReader.Create and pass an XmlReaderSettings object, the XmlReaderSettings object only allows you to control its handling of "insignificant" whitespace, via the IgnoreWhitespace property. It doesn't have a property for controlling normalization, as the XmlTextReader does.
Also interesting is that all the concrete implementations use the wrapper pattern, instead of an inheritance pattern, so even though they are all XmlReaders, and they all use a XmlTextReader under-the-hood, they use it by keeping XmlTextReader instance in a private field, rather than inheriting from it. I'm not sure what the motivation was behind that, but it means that you can't downcast the other concrete implementations to XmlTextReader, only to XmlReader... which is more limited.
Anyhow, for the OP's original example.. you probably want code that looks like this:
Code Snippet
XmlSerializer xsz = new XmlSerializer(typeof(ComparisonOptions));
Options copt = null;
using (SqlConnection con = new SqlConnection(overlordDB))
using (SqlCommand cmd = new SqlCommand("usp_GetRecord", con))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@id", id);
    con.Open();
    using (SqlDataReader rdr = cmd.ExecuteReader())
    {
        Debug.Assert(rdr.RecordsAffected != 1);
        //get the first record, we only expect one.
        rdr.Read();
        using (StringReader str = new StringReader(Convert.ToString(rdr["Options"])))
        using (XmlTextReader xtr = new XmlTextReader(str))
        {
            xtr.WhitespaceHandling = WhitespaceHandling.Significant;
            copt = (ComparisonOptions)xsz.Deserialize(xtr);
            copt.ID = Convert.ToInt32(rdr["ID"]);
            copt.Name = Convert.ToString(rdr["Name"]);
            copt.Description = Convert.ToString(rdr["Description"]);
        }
    }
}
Hope that helps,
Troy
Hi Guys,
Wrapping the StringReader in an XmlTextReader seems to have done the trick. When you mentioned 'under the hood' I realised that the XmlSerializer.Deserialize method itself must be using XmlTextReader under the hood too. Reflector output confirms this. The overload of Deserialize I was calling accepts a stream and creates an XmlTextReader with default options that wraps the provided stream. The overload you have suggested using above fixes the problem because we now explicitly specify the XmlTextReader options to use and manually wrap the stream before passing it in. For my particular problem it seems that setting XmlTextReader.Normalization to false does the trick.
Reflector output of XmlSerializer.Deserialize:
Code Snippet
public object Deserialize(Stream stream)
{
    XmlTextReader xmlReader = new XmlTextReader(stream);
    xmlReader.WhitespaceHandling = WhitespaceHandling.Significant;
    xmlReader.Normalization = true;
    xmlReader.XmlResolver = null;
    return this.Deserialize(xmlReader, (string) null);
}
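For anyone landing here later, the fix could be packaged as a small helper along these lines. It is only a sketch: it mirrors the Reflector output above but sets Normalization to false explicitly. The CrLfSafeDeserializer class and FromString method are made-up names, and TestObj is the test type from earlier in the thread.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Test type from earlier in the thread.
public class TestObj
{
    public string txt;
}

// Hypothetical helper class, not part of the original code.
public static class CrLfSafeDeserializer
{
    // Deserialize from an XML string without collapsing \r\n to \n.
    public static T FromString<T>(string xml)
    {
        XmlSerializer xsz = new XmlSerializer(typeof(T));
        using (StringReader sr = new StringReader(xml))
        using (XmlTextReader xtr = new XmlTextReader(sr))
        {
            // Same settings XmlSerializer applies internally, except that
            // Normalization stays off so carriage returns survive.
            xtr.WhitespaceHandling = WhitespaceHandling.Significant;
            xtr.Normalization = false;
            xtr.XmlResolver = null;
            return (T)xsz.Deserialize(xtr);
        }
    }

    public static void Main()
    {
        string xml = "<TestObj><txt>line one\r\nline two</txt></TestObj>";
        TestObj to = FromString<TestObj>(xml);
        Console.WriteLine(to.txt == "line one\r\nline two"
            ? "Strings are equal"
            : "Strings are not equal");
    }
}
```

Normalization already defaults to false on a freshly constructed XmlTextReader, so setting it here is only to make the intent obvious to the next person reading the code.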
Thanks for your help guys.
Imran.
I'm glad you got that working.
If this does answer your question, you should mark one of thoward37's responses as an answer.


