MSDN Forums > ASMX Web Services and XML Serialization

XMLSerializer \r\n problem (Answered)

  • April 2, 2008, 2:31 PM, imran.a
     

    Hi,

     

    I am using XmlSerializer to serialize an object to and from a database field. The target table has a text-type field that holds the serialized object (as an XML blob). I am using SQL Server 2000 and .NET 2.0. Most fields of the object serialize and deserialize fine, except for strings. I have a string field whose data comes back slightly different. After debugging my code I have concluded that XmlSerializer correctly writes strings to the database with \r\n intact, but when deserializing it returns strings with only \n. I have attached the code below.

     

     

    XmlSerializer xsz = new XmlSerializer(typeof(ComparisonOptions));
    ComparisonOptions copt = null;

    using (SqlConnection con = new SqlConnection(overlordDB))
    {
        using (SqlCommand cmd = new SqlCommand("usp_GetRecord", con))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@id", id);
            con.Open();

            using (SqlDataReader rdr = cmd.ExecuteReader())
            {
                Debug.Assert(rdr.RecordsAffected != 1);

                // Get the first record; we only expect one.
                rdr.Read();

                // This line seems to read the string fine.
                StringReader str = new StringReader((string)rdr["Options"]);

                // After this, copt's string fields are missing \r.
                copt = (ComparisonOptions)xsz.Deserialize(str);

                copt.ID = (int)rdr["ID"];
                copt.Name = (string)rdr["Name"];
                copt.Description = (string)rdr["Description"];
            }
        }
    }

     

    The Options object is populated from the corresponding database text field. This object has a string field that is assigned via deserialization. I have run through the code with the debugger, and at the point I have marked (the StringReader str declaration and assignment) the string fields in the XML blob are all read correctly from the database and have their \r\n intact. After deserializing into copt, the string fields of copt have only \n.

    I can't find anything in the forums or on Google that describes a similar problem.

     

    Any help would be greatly appreciated as I have hit a dead end with this.

     

    Imran

All replies

  • April 2, 2008, 6:16 PM, Netwhiz
     
    Hello,

    Are you using XmlWriter in your serialize function? If so, I think you can use XmlWriterSettings to preserve your indents.

    Thanks
    Netwhiz
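
One way XmlWriterSettings can help with the \r\n case in this thread, as a rough sketch rather than a confirmed fix for the poster's code: NewLineHandling.Entitize asks the writer to emit carriage returns as character references, and a normalizing reader leaves character references alone. The Note type and the EntitizeRoundTrip helper are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical type standing in for the poster's serialized class.
public class Note
{
    public string Text;
}

public static class EntitizeRoundTrip
{
    // Serializes through an XmlWriter configured to entitize new-line characters.
    public static string Serialize(Note note)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.NewLineHandling = NewLineHandling.Entitize;

        StringBuilder sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            new XmlSerializer(typeof(Note)).Serialize(writer, note);
        }
        return sb.ToString();
    }

    // Deserializes with the plain TextReader overload, i.e. no special reader settings.
    public static Note Deserialize(string xml)
    {
        using (StringReader reader = new StringReader(xml))
        {
            return (Note)new XmlSerializer(typeof(Note)).Deserialize(reader);
        }
    }
}
```

The trade-off is that the stored XML is slightly less readable, but the reading side needs no changes.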
  • April 2, 2008, 9:39 PM, thoward37
     

    To help isolate the problem, remove the database from the picture in a small test harness. Simply serialize the data to an xml string, then immediately deserialize it to an object again. See if you're missing the '\r' in the final object.

     

    If so, the problem is within the XmlSerializer settings. If not, then problem is related to database storage and retrieval.

     

    I regularly serialize and deserialize objects with string data that contains CR/LF, and I've never run into an issue like that, and I don't use any special XmlSerializer configuration. Just out of the box. I imagine it has something to do with the insert to the database converting the line endings.

    What datatype is the DB field where this is being stored? Are you using a stored proc through a SqlCommand object for the insert as well, or a dynamic query? If it's a stored proc, does the parameter of the stored proc for passing the XML data match the DB field type, or are you using one of the SQL convert functions to get it to match?

     

    What is the collation setting for the database?

     

    Hope that helps,

    Troy

     

     

  • April 3, 2008, 9:44 AM, imran.a
     

    Hi Guys,

     

    Nope, I am not using XmlWriter; I am using StringWriter. From what I can see, the problem doesn't seem to be in persisting the objects to the database, as the \r\n control chars are intact at that point. The test below demonstrates my problem when you step through it with the debugger.

     

    using System;
    using System.IO;
    using System.Xml;
    using System.Xml.Serialization;

    namespace TestXmlSerializecrlf
    {
        internal class Program
        {
            // Verbatim string: the line breaks are CR LF when the source file uses Windows line endings.
            static readonly string orig = @"multiline input
    another line here
    one more line";

            private static void Main(string[] args)
            {
                serialize();
                deserialize();
                Console.ReadLine();
            }

            private static void serialize()
            {
                // Serialize
                XmlSerializer xsz = new XmlSerializer(typeof(TestObj));
                StringWriter sWriter = new StringWriter();
                TestObj to = new TestObj();
                to.txt = orig;
                xsz.Serialize(sWriter, to);

                // Write
                using (StreamWriter sw = new StreamWriter(@"C:\TestFile.txt", false))
                {
                    sw.Write(sWriter.ToString());
                }
            }

            private static void deserialize()
            {
                // Deserialize
                XmlSerializer xsz = new XmlSerializer(typeof(TestObj));
                TestObj to = null;

                using (StreamReader sr = new StreamReader(@"C:\TestFile.txt"))
                {
                    // Here the string is intact.
                    StringReader str = new StringReader(sr.ReadToEnd());

                    // Here it isn't.
                    to = (TestObj)xsz.Deserialize(str);
                }

                if (to.txt != orig)
                {
                    Console.Write("Strings are not equal");
                }
                else
                {
                    Console.Write("Strings are equal");
                }
            }
        }

        [Serializable]
        public class TestObj
        {
            public string txt;
        }
    }

     

    This test reproduces the error. If you step through this code with the debugger you will see that the string read back in has its \r\n chars intact. After deserialization, however, it only has \n. This is causing a problem for me at the moment because I am saving some user input from a text box, and when it is loaded again it comes back incorrectly formatted. As I said before, I am using SQL Server 2000 and the XML blob is stored in a text field. The SQL stored procedure also expects a text parameter, so there is no need for any conversion functions within the stored procedure or my ADO.NET code. Database collation is set to Latin1_General_CI_AS; however, I think we have eliminated the DB from the problem with the above code.

     

    Any Ideas?

  • April 3, 2008, 2:13 PM, John Saunders (MVP, Moderator)
     

    The trick is in looking at the XML to which your class gets serialized. You will find that the "txt" field gets serialized as the value of an element called "txt". The same happens if you use the [XmlText] attribute to cause it to be serialized as a text node.

     

    It gets serialized as you specified it, with \r\n. However, in XML, \r\n is just whitespace, and is equivalent to \n. This is what it gets turned into when it is deserialized.

     

    I think you may need to implement IXmlSerializable and to write "txt" in a CDATA section. That way, it should be deserialized character for character as you wrote it.

     

    Alternatively, you could serialize your string as binary:

     

    Code Snippet

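    using System;
    using System.Text;
    using System.Xml.Serialization;

    [Serializable]
    public class TestObj
    {
        [XmlIgnore]
        public string txt;

        // Serialized in place of txt: the UTF-16 bytes travel as base64,
        // so the XML parser never sees the raw \r\n.
        [XmlElement(ElementName = "txt", DataType = "base64Binary")]
        public byte[] txtSerialization
        {
            // Null checks avoid an ArgumentNullException when txt was never assigned.
            get { return txt == null ? null : Encoding.Unicode.GetBytes(txt); }
            set { txt = value == null ? null : Encoding.Unicode.GetString(value); }
        }
    }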

     

     

     

  • April 3, 2008, 4:50 PM, imran.a
     

    John,

    I will look into implementing IXmlSerializable, as binary blobs are probably not an option for me. I can understand whitespace being omitted, stripped or converted for the actual XML elements, but the actual data they hold shouldn't be modified. Surely this is a bug in the XmlSerializer, as it is altering persisted data?

     

    Imran
  • April 3, 2008, 5:16 PM, thoward37
     

    Oh! I see the problem.

     

    Don't use StreamReader... use XmlTextReader.

     

    Here's the deserialize method with XmlTextReader...

     

    Code Snippet

     

    private static void deserialize()
    {
        XmlSerializer xsz = new XmlSerializer(typeof(TestObj));
        TestObj to = null;

        using (XmlTextReader sr = new XmlTextReader(@"C:\TestFile.txt"))
        {
            to = (TestObj)xsz.Deserialize(sr);
        }

        if (to.txt != orig)
        {
            Console.Write("Strings are not equal");
        }
        else
        {
            Console.Write("Strings are equal");
        }
    }

     

     

     

    If you already have a StreamReader and want to use it, you can construct the XmlTextReader from it (the constructor also accepts a TextReader or a Stream) instead of a file path/URL.

     

    Hope that helps,

    Troy

     

  • April 3, 2008, 5:19 PM, John Saunders (MVP, Moderator)
     

    Go ahead and try the XmlReader, but if you're using .NET 2.0 or above, you should use:

     

    Code Snippet

    using (XmlReader sr = XmlReader.Create(@"C:\TestFile.txt"))

     

     

    instead.

     

  • April 3, 2008, 5:25 PM, thoward37
     

    XmlReader also removes the carriage return characters... XmlTextReader preserves them.

     

    XmlTextReader on MSDN

    http://msdn2.microsoft.com/en-us/library/system.xml.xmltextreader.aspx

     

    Thanks,
    Troy

     

     

     

  • April 3, 2008, 7:15 PM, John Saunders (MVP, Moderator)
     

    Careful reading of the documentation on XmlTextReader does mention that it does not normalize text nodes. I'm not sure that this is a good thing in general. In general, you want text nodes normalized. In this particular case, for this particular element, you don't.

    This appears to be a behavior of XmlReaders. Even when you specify a Stream or TextReader to Deserialize, it internally creates and uses an XmlReader. It no doubt uses XmlReader.Create to create this internal XmlReader, and that is documented to normalize newlines.

    So, this problem could be simplified to an example of just using an XmlReader to read from a document that has \r\n within a text value. The question would be whether the \r\n is returned when the data is read. The goal would be to find a setting of XmlReader that will preserve the \r\n.

    With this example in hand, I'd suggest you experiment to see what happens if the element has xml:space='preserve'. If it correctly returns \r\n, then the trick would be to cause xml:space='preserve' to be emitted on the <txt/> element.
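
The reduced experiment John describes could look like the sketch below. It reads the same one-element document twice, once through XmlReader.Create and once through XmlTextReader, so the two results can be compared in a debugger. The NewlineProbe class and the sample document are hypothetical and not taken from the thread.

```csharp
using System;
using System.IO;
using System.Xml;

public static class NewlineProbe
{
    // Hypothetical sample document: one text value containing CR LF.
    public const string Xml = "<txt>line one\r\nline two</txt>";

    // Reads the element value through a reader built by XmlReader.Create.
    public static string ReadWithCreate()
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(Xml)))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }

    // Reads the same value through XmlTextReader with its default settings.
    public static string ReadWithTextReader()
    {
        using (XmlTextReader reader = new XmlTextReader(new StringReader(Xml)))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }
}
```

Going by the behavior reported in this thread, ReadWithCreate() should return the value with only \n, while ReadWithTextReader() should keep the \r\n. It is worth running on your own framework version rather than taking that on trust.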

  • April 3, 2008, 7:34 PM, John Saunders (MVP, Moderator)
     
  • April 3, 2008, 9:03 PM, thoward37
     

    Arggg... I typed a lengthy explanation, only to find that when I hit "post" I had been logged out because my session expired. So it didn't post, and wasn't there when I hit the back button.

     

    I'll try to re-write the important parts quickly... I was probably too long winded in the first post anyway.

     

    In short -- you can control the level of whitespace handling with the WhitespaceHandling property on XmlTextReader. It defaults to WhitespaceHandling.All, which gives you a lot of extra whitespace you might not want... I often use it with WhitespaceHandling.Significant to avoid this. Just set the property after you instantiate...

    If you dig around in Reflector you will see this is the only way to get it to work. Look at System.Xml.XmlTextReaderImpl. The GetWhitespaceType() method is what controls whether it will pay attention to whitespace or not. It will only consider keeping anything classed as XmlNodeType.Whitespace or SignificantWhitespace (i.e. newlines in your data) if the WhitespaceHandling value is either All or Significant. Otherwise it pretends the whitespace doesn't exist and returns XmlNodeType.None... which means it never gets to the EatWhitespaces() method, which is where the Normalization value is checked before chomping the carriage return.

    So, unfortunately... using the xml:space="preserve" attribute on individual nodes won't work, due to the implementation of the parser. The only way it will keep those carriage returns in the data is with Normalization turned off.

     

    I'd be interested to hear if anyone can find a way to make that work!

     

    Hope that helps,

    Troy

  • April 3, 2008, 9:06 PM, John Saunders (MVP, Moderator)
     

    Troy,

     

    In your Reflector research, did you determine whether there's a way to change the whitespace handling of the other derived XmlReader classes? I know that only XmlTextReader still has the WhitespaceHandling property, but do any of the other XmlReader implementations have a way to change the handling of whitespace? I'm talking .NET 2.0 and above, which is quite different from .NET 1.1.

     

  • April 3, 2008, 11:47 PM, thoward37
     Answered

    From what I can tell all the other non-abstract XmlReader implementations use the XmlTextReader under the hood. They just put more layers of business logic on top of it.

     

    None of the others seem to directly parse the text, but rather work with the abstracted objects produced by XmlTextReaderImpl's parsing stage.

     

    Of course, I could be wrong about that... There's a lot of code to look through.

     

    Also interesting to note: if you create the XmlReader via XmlReader.Create and pass an XmlReaderSettings object, the XmlReaderSettings object only allows you to control its handling of "insignificant" whitespace via the IgnoreWhitespace property. It doesn't have a property for controlling normalization, as the XmlTextReader does.

    Also interesting is that all the concrete implementations use the wrapper pattern instead of an inheritance pattern. So even though they are all XmlReaders, and they all use an XmlTextReader under the hood, they use it by keeping an XmlTextReader instance in a private field rather than inheriting from it. I'm not sure what the motivation was behind that, but it means that you can't downcast the other concrete implementations to XmlTextReader, only to XmlReader... which is more limited.

     

     

    Anyhow, for the OP's original example.. you probably want code that looks like this:

     

    Code Snippet

     

    XmlSerializer xsz = new XmlSerializer(typeof(ComparisonOptions));
    ComparisonOptions copt = null;

    using (SqlConnection con = new SqlConnection(overlordDB))
    using (SqlCommand cmd = new SqlCommand("usp_GetRecord", con))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@id", id);
        con.Open();

        using (SqlDataReader rdr = cmd.ExecuteReader())
        {
            Debug.Assert(rdr.RecordsAffected != 1);

            // Get the first record; we only expect one.
            rdr.Read();

            // Wrap the string in an XmlTextReader so the reader options are set explicitly.
            using (StringReader str = new StringReader(Convert.ToString(rdr["Options"])))
            using (XmlTextReader xtr = new XmlTextReader(str))
            {
                xtr.WhitespaceHandling = WhitespaceHandling.Significant;

                copt = (ComparisonOptions)xsz.Deserialize(xtr);
                copt.ID = Convert.ToInt32(rdr["ID"]);
                copt.Name = Convert.ToString(rdr["Name"]);
                copt.Description = Convert.ToString(rdr["Description"]);
            }
        }
    }

     

     

     

    Hope that helps,

    Troy

  • April 4, 2008, 11:08 AM, imran.a
     

    Hi Guys,

     

    Wrapping the StringReader in an XmlTextReader seems to have done the trick. When you mentioned 'under the hood' I realised that the XmlSerializer.Deserialize method itself must be using XmlTextReader under the hood too. Reflector output confirms this. The overload of Deserialize I was calling accepts a stream and creates an XmlTextReader with default options that wraps the provided stream. The overload you suggested above fixes the problem because we now explicitly specify the XmlTextReader options and manually wrap the stream before passing it in. For my particular problem it seems that setting XmlTextReader.Normalization to false does the trick.

     

    Reflector output of XmlSerializer.Deserialize:

     

    Thanks for your help guys.
    Imran.
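
The fix Imran describes can be condensed into a small helper, sketched here under the assumption that the XML arrives as a string. NewlinePreservingXml is a hypothetical name, and TestObj mirrors the test class from earlier in the thread. Normalization is already false by default on XmlTextReader; setting it explicitly just documents the intent.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Mirrors the test class posted earlier in the thread.
public class TestObj
{
    public string txt;
}

public static class NewlinePreservingXml
{
    // Deserializes xml through an explicitly configured XmlTextReader
    // so that \r\n inside text values is not collapsed to \n.
    public static T Deserialize<T>(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (StringReader text = new StringReader(xml))
        using (XmlTextReader reader = new XmlTextReader(text))
        {
            reader.Normalization = false; // the default, stated explicitly
            reader.WhitespaceHandling = WhitespaceHandling.Significant;
            return (T)serializer.Deserialize(reader);
        }
    }
}
```

A call such as NewlinePreservingXml.Deserialize<TestObj>(xmlFromDatabase) would then replace the direct xsz.Deserialize(str) call, where xmlFromDatabase is whatever string was read from the text column.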
  • April 4, 2008, 11:17 AM, John Saunders (MVP, Moderator)
     

    I'm glad you got that working.

     

    If this does answer your question, you should mark one of thoward37's responses as an answer.