XMLSerializer \r\n problem

  • 2 April 2008 14:31, imran.a
     

    Hi,

     

    I am using XmlSerializer to serialize an object to and from a database field. The target table has a text-type field that holds the serialized object (as an XML blob). I am using SQL Server 2000 and .NET 2.0. Most fields of the object serialize and deserialize fine, except for strings. One string field comes back slightly different: after debugging my code, I have concluded that XmlSerializer correctly writes strings to the database with \r\n intact, but when deserializing it returns strings with only \n. The code is below.

     

     

    XmlSerializer xsz = new XmlSerializer(typeof(ComparisonOptions));
    ComparisonOptions copt = null;

    using (SqlConnection con = new SqlConnection(overlordDB))
    {
        using (SqlCommand cmd = new SqlCommand("usp_GetRecord", con))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@id", id);
            con.Open();

            using (SqlDataReader rdr = cmd.ExecuteReader())
            {
                Debug.Assert(rdr.RecordsAffected != 1);

                // Get the first record; we only expect one.
                rdr.Read();

                // This line seems to read the string fine.
                StringReader str = new StringReader((string)rdr["Options"]);

                // After this, copt's string fields are missing \r.
                copt = (ComparisonOptions)xsz.Deserialize(str);

                copt.ID = (int)rdr["ID"];
                copt.Name = (string)rdr["Name"];
                copt.Description = (string)rdr["Description"];
            }
        }
    }

     

    The options object (copt) is populated from the corresponding database text field. It has a string field that is assigned during deserialization. I have stepped through the code with the debugger, and at the point I have marked (the StringReader str declaration and assignment) the string fields in the XML blob are all read correctly from the database with their \r\n intact. After deserializing into copt, the string fields of copt contain only \n.

     

    I can't find anything in the forums or on Google that describes a similar problem.

     

    Any help would be greatly appreciated as I have hit a dead end with this.

     

    Imran

All replies

  • 2 April 2008 18:16, Netwhiz
     
    Hello,

    Are you using XmlWriter in your serialize function? If so, I think you can use XmlWriterSettings to persist your indents.

    Thanks
    Netwhiz
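
XmlWriterSettings can do more than control indentation. The sketch below is an illustration, not something proposed in the thread: it assumes you are free to change the serializing side, and it sets NewLineHandling.Entitize so that each carriage return in text content is written as a character reference (&#xD;), which a normalizing reader should leave alone. The Note type and the EntitizedXml class are hypothetical names used only for this example.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical type used only for this illustration.
public class Note
{
    public string Text;
}

public static class EntitizedXml
{
    // Serializes a Note, writing carriage returns as character references.
    public static string Serialize(Note note)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.NewLineHandling = NewLineHandling.Entitize;

        XmlSerializer serializer = new XmlSerializer(typeof(Note));
        using (StringWriter text = new StringWriter())
        {
            // Dispose the XmlWriter first so everything is flushed into the StringWriter.
            using (XmlWriter writer = XmlWriter.Create(text, settings))
            {
                serializer.Serialize(writer, note);
            }
            return text.ToString();
        }
    }

    // Deserializes through the ordinary TextReader overload, with no special reader.
    public static Note Deserialize(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Note));
        using (StringReader text = new StringReader(xml))
        {
            return (Note)serializer.Deserialize(text);
        }
    }
}
```

The trade-off is that the stored XML contains &#xD; references instead of literal carriage returns, so it only helps for data written after the change.
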
  • 2 April 2008 21:39, thoward37
     

    To help isolate the problem, take the database out of the picture with a small test harness. Simply serialize the data to an XML string, then immediately deserialize it to an object again. See whether the '\r' is missing in the final object.

     

    If so, the problem is within the XmlSerializer settings. If not, then the problem is related to database storage and retrieval.

     

    I regularly serialize and deserialize objects with string data that contains CR/LF, and I've never run into an issue like that, and I don't use any special XmlSerializer configuration, just the out-of-the-box defaults. I imagine it has something to do with the insert into the database converting the line endings.

     

    What datatype is the DB field where this is being stored? Are you using a stored proc through a SqlCommand object for the insert as well, or a dynamic query? If it's a stored proc, does the stored proc parameter that receives the XML data match the DB field type, or are you using one of the SQL convert functions to get it to match?

     

    What is the collation setting for the database?

     

    Hope that helps,

    Troy

     

     

  • 3 April 2008 9:44, imran.a
     

    Hi Guys,

     

    Nope, I am not using XmlWriter; I am using StringWriter. From what I can see, the problem is not in persisting the objects to the database, because the \r\n control characters are intact at that point, as you can see when you step through the test below with the debugger. The test below demonstrates my problem.

     

    using System;
    using System.IO;
    using System.Xml;
    using System.Xml.Serialization;

    namespace TestXmlSerializecrlf
    {
        internal class Program
        {
            static readonly string orig = @"multiline input
    another line here
    one more line";

            private static void Main(string[] args)
            {
                serialize();
                deserialize();
                Console.ReadLine();
            }

            private static void serialize()
            {
                // Serialize
                XmlSerializer xsz = new XmlSerializer(typeof(TestObj));
                StringWriter sWriter = new StringWriter();
                TestObj to = new TestObj();
                to.txt = orig;
                xsz.Serialize(sWriter, to);

                // Write
                using (StreamWriter sw = new StreamWriter(@"C:\TestFile.txt", false))
                {
                    sw.Write(sWriter.ToString());
                }
            }

            private static void deserialize()
            {
                // Deserialize
                XmlSerializer xsz = new XmlSerializer(typeof(TestObj));
                TestObj to = null;
                using (StreamReader sr = new StreamReader(@"C:\TestFile.txt"))
                {
                    // Here the string is intact.
                    StringReader str = new StringReader(sr.ReadToEnd());
                    // Here it isn't.
                    to = (TestObj)xsz.Deserialize(str);
                }

                if (to.txt != orig)
                {
                    Console.Write("Strings are not equal");
                }
                else
                {
                    Console.Write("Strings are equal");
                }
            }
        }

        [Serializable]
        public class TestObj
        {
            public string txt;
        }
    }

     

    This test reproduces the error. If you step through this code with the debugger, you will see that the string read back in has its \r\n characters intact. After deserialization, however, it has only \n. This is a problem for me at the moment because I am saving user input from a text box, and when it is loaded again it comes back incorrectly formatted. As I said before, I am using SQL Server 2000 and the XML blob is stored in a text field. The SQL stored procedure also expects a text parameter, so there is no need for any conversion functions within the stored procedure or my ADO.NET code. The database collation is set to Latin1_General_CI_AS; however, I think the code above has eliminated the database from the problem.

     

    Any ideas?

  • 3 April 2008 14:13, John Saunders (MVP, Moderator)
     

    The trick is in looking at the XML to which your class gets serialized. You will find that the "txt" field gets serialized as the value of an element called "txt". The same happens if you use the [XmlText] attribute to cause it to be serialized as a text node.

     

    It gets serialized as you specified it, with \r\n. However, an XML parser normalizes line endings when it reads a document, so \r\n in text is treated as equivalent to \n. This is what it gets turned into when it is deserialized.

     

    I think you may need to implement IXmlSerializable and to write "txt" in a CDATA section. That way, it should be deserialized character for character as you wrote it.

     

    Alternatively, you could serialize your string as binary:

     

    Code Snippet

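    [Serializable]
    public class TestObj
    {
        // The original string; excluded from the XML.
        [XmlIgnore]
        public string txt;

        // Serialized in place of txt, as base64-encoded UTF-16 bytes.
        [XmlElement(ElementName = "txt", DataType = "base64Binary")]
        public byte[] txtSerialization
        {
            get
            {
                // Guard against null: GetBytes throws on a null string.
                return txt == null ? null : System.Text.Encoding.Unicode.GetBytes(txt);
            }
            set
            {
                txt = value == null ? null : System.Text.Encoding.Unicode.GetString(value);
            }
        }
    }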

     

     

     

  • 3 April 2008 16:50, imran.a
     

    John,

    I will look into implementing IXmlSerializable, as binary blobs are probably not an option for me. I can understand whitespace being omitted, stripped, or converted for the XML markup itself, but the actual data the elements hold shouldn't be modified. Surely this is a bug in XmlSerializer, since it alters persisted data?

     

    Imran
  • 3 April 2008 17:16, thoward37
     

    Oh! I see the problem.

     

    Don't use StreamReader... use XmlTextReader.

     

    Here's the deserialize method with XmlTextReader...

     

    Code Snippet

     

    private static void deserialize()
    {
        XmlSerializer xsz = new XmlSerializer(typeof(TestObj));
        TestObj to = null;

        // Open the file with XmlTextReader directly instead of going through StreamReader.
        using (XmlTextReader sr = new XmlTextReader(@"C:\TestFile.txt"))
        {
            to = (TestObj)xsz.Deserialize(sr);
        }

        if (to.txt != orig)
        {
            Console.Write("Strings are not equal");
        }
        else
        {
            Console.Write("Strings are equal");
        }
    }

     

     

     

    If you already have a StreamReader and want to use it, you can construct the XmlTextReader from a stream or TextReader instead of a file path or URL.

     

    Hope that helps,

    Troy

     

  • 3 April 2008 17:19, John Saunders (MVP, Moderator)
     

    Go ahead and try the XmlReader, but if you're using .NET 2.0 or above, you should use:

     

    Code Snippet

    using (XmlReader sr = XmlReader.Create(@"C:\TestFile.txt"))

     

     

    instead.

     

  • 3 April 2008 17:25, thoward37
     

    XmlReader also removes the carriage return characters... XmlTextReader preserves them.

     

    XmlTextReader on MSDN

    http://msdn2.microsoft.com/en-us/library/system.xml.xmltextreader.aspx

     

    Thanks,
    Troy

     

     

     

  • 3 April 2008 19:15, John Saunders (MVP, Moderator)
     

    Careful reading of the documentation on XmlTextReader does mention that it does not normalize text nodes. I'm not sure that this is a good thing in general. In general, you want text nodes normalized. In this particular case, for this particular element, you don't.

     

    This appears to be a behavior of XmlReaders. Even when you pass a Stream or TextReader to Deserialize, it internally creates and uses an XmlReader. It no doubt uses XmlReader.Create to create this internal XmlReader, and that is documented to normalize newlines.

     

    So, this problem could be simplified to an example of just using an XmlReader to read from a document that has \r\n within a text value. The question would be whether the \r\n is returned when the data is read. The goal would be to find a setting of XmlReader that will preserve the \r\n.

     

    With this example in hand, I'd suggest you experiment to see what happens if the element has xml:space='preserve'. If it correctly returns \r\n, then the trick would be to cause xml:space='preserve' to be emitted on the <txt/> element.
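
The reduced experiment described above might look like the following sketch. It reads the text of a single element once through XmlReader.Create and once through XmlTextReader with Normalization explicitly off, so the two results can be compared. The NewlineProbe class and its method names are hypothetical and exist only for this illustration.

```csharp
using System.IO;
using System.Xml;

public static class NewlineProbe
{
    // Reads the root element's text through a reader built by XmlReader.Create.
    public static string ReadWithCreate(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }

    // Reads the same text through XmlTextReader with normalization switched off.
    public static string ReadWithTextReader(string xml)
    {
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.Normalization = false;
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }
}
```

Calling both methods with "<txt>line one\r\nline two</txt>" and checking each result for "\r\n" is a quick way to confirm the behaviour on your own framework version. Going by the findings in this thread, only the XmlTextReader result should still contain the carriage return.
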

  • 3 April 2008 19:34, John Saunders (MVP, Moderator)
     
  • 3 April 2008 21:03, thoward37
     

    Arggg... I typed a lengthy explanation, only to find that when I hit "post" I had been logged out because my session expired. So it didn't post, and wasn't there when I hit the back button.

     

    I'll try to re-write the important parts quickly... I was probably too long winded in the first post anyway.

     

    In short: you can control the level of whitespace handling with the WhitespaceHandling property on XmlTextReader. It defaults to WhitespaceHandling.All, which gives you a lot of extra whitespace you might not want. I often use WhitespaceHandling.Significant to avoid this. Just set the property after you instantiate the reader.

     

    If you dig around in Reflector you will see this is the only way to get it to work. Look at System.Xml.XmlTextReaderImpl. The GetWhiteSpaceType() method controls whether it pays attention to whitespace or not. It only considers keeping anything classed as XmlNodeType.Whitespace or SignificantWhitespace (i.e. newlines in your data) if the WhitespaceHandling value is either All or Significant. Otherwise it pretends the whitespace doesn't exist and returns XmlNodeType.None, which means it never gets to the EatWhiteSpaces() method, where the Normalization value is checked before chomping the carriage return.

     

    So, unfortunately, using the xml:space="preserve" attribute on individual nodes won't work, due to the implementation of the parser. The only way to get those carriage returns back in the data is with Normalization turned off.

     

    I'd be interested to hear if anyone can find a way to make that work!

     

    Hope that helps,

    Troy

  • 3 April 2008 21:06, John Saunders (MVP, Moderator)
     

    Troy,

     

    In your Reflector research, did you determine whether there's a way to change the whitespace handling of the other derived XmlReader classes? I know that only XmlTextReader still has the WhitespaceHandling property, but do any of the other XmlReader implementations have a way to change the handling of whitespace? I'm talking .NET 2.0 and above, which is quite different from .NET 1.1.

     

  • 3 April 2008 23:47, thoward37 (marked as answer)
     

    From what I can tell all the other non-abstract XmlReader implementations use the XmlTextReader under the hood. They just put more layers of business logic on top of it.

     

    None of the others seem to directly parse the text, but rather work with the abstracted objects produced by XmlTextReaderImpl's parsing stage.

     

    Of course, I could be wrong about that... There's a lot of code to look through.

     

    It's also interesting to note that if you create the XmlReader via XmlReader.Create and pass an XmlReaderSettings object, the settings only let you control its handling of "insignificant" whitespace, via the IgnoreWhitespace property. It doesn't have a property for controlling normalization, as XmlTextReader does.

     

    Also interesting is that all the concrete implementations use the wrapper pattern instead of inheritance. So even though they are all XmlReaders, and they all use an XmlTextReader under the hood, they keep the XmlTextReader instance in a private field rather than inheriting from it. I'm not sure what the motivation was behind that, but it means that you can't downcast the other concrete implementations to XmlTextReader, only to XmlReader, which is more limited.

     

     

    Anyhow, for the OP's original example.. you probably want code that looks like this:

     

    Code Snippet

     

    XmlSerializer xsz = new XmlSerializer(typeof(ComparisonOptions));
    ComparisonOptions copt = null;

    using (SqlConnection con = new SqlConnection(overlordDB))
    using (SqlCommand cmd = new SqlCommand("usp_GetRecord", con))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@id", id);
        con.Open();

        using (SqlDataReader rdr = cmd.ExecuteReader())
        {
            Debug.Assert(rdr.RecordsAffected != 1);

            // Get the first record; we only expect one.
            rdr.Read();

            // Wrap the XML text in our own XmlTextReader instead of letting Deserialize create one.
            using (StringReader str = new StringReader(Convert.ToString(rdr["Options"])))
            using (XmlTextReader xtr = new XmlTextReader(str))
            {
                xtr.WhitespaceHandling = WhitespaceHandling.Significant;

                copt = (ComparisonOptions)xsz.Deserialize(xtr);
                copt.ID = Convert.ToInt32(rdr["ID"]);
                copt.Name = Convert.ToString(rdr["Name"]);
                copt.Description = Convert.ToString(rdr["Description"]);
            }
        }
    }

     

     

     

    Hope that helps,

    Troy

  • 4 April 2008 11:08, imran.a
     

    Hi Guys,

     

    Wrapping the StringReader in an XmlTextReader seems to have done the trick. When you mentioned 'under the hood', I realised that the XmlSerializer.Deserialize method itself must be using XmlTextReader under the hood too. Reflector output confirms this. The overload of Deserialize I was calling accepts a TextReader and creates an XmlTextReader, with its own default options, that wraps it. The overload you suggested above fixes the problem because we now explicitly specify the XmlTextReader options to use and manually wrap the reader before passing it in. For my particular problem, it seems that setting XmlTextReader.Normalization to false does the trick.

     

    Reflector output of XmlSerializer.Deserialize:

     

    Thanks for your help guys.
    Imran.
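
The fix described in this post can be packaged as a small helper. This is a minimal sketch under the thread's assumptions (the XML arrives as a string, and line endings inside text must survive unchanged). RawNewlineXml and its Deserialize method are hypothetical names; TestObj mirrors the class from the earlier test program.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Mirrors the test class used earlier in the thread.
public class TestObj
{
    public string txt;
}

public static class RawNewlineXml
{
    // Deserializes through a caller-owned XmlTextReader so that the reader's
    // options are chosen here rather than inside XmlSerializer.Deserialize.
    public static T Deserialize<T>(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (StringReader text = new StringReader(xml))
        using (XmlTextReader reader = new XmlTextReader(text))
        {
            reader.WhitespaceHandling = WhitespaceHandling.Significant;
            // Keep \r\n in text nodes exactly as stored.
            reader.Normalization = false;
            return (T)serializer.Deserialize(reader);
        }
    }
}
```

With this in place, the database code would call RawNewlineXml.Deserialize<ComparisonOptions>(...) on the string read from the Options column, then fill in ID, Name and Description as before.
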
  • 4 April 2008 11:17, John Saunders (MVP, Moderator)
     

    I'm glad you got that working.

     

    If this does answer your question, you should mark one of thoward37's responses as an answer.