DataContractSerializer deserialization puzzle - by-pass member initialization

  • Question

  • Hi,

    I was using WCF, and during the transmission of an object of type Book (see code below) the deserialization process threw a NullReferenceException. To isolate the problem, I wrote a unit test like the one below rather than involving the full-blown WCF marshalling. This appears to be purely a serialization/deserialization problem and has nothing to do with the communication side of WCF.

    [Test]
    public void TestSerializeAndDeserializeBook()
    {
        Book book = new Book( "Testing Serialization and Deserialization",
                            new Person( 100, "Tom", "Smith" ),
                            new Person( 101, "Peter", "Pan" ) );
        DataContractSerializer ser = new DataContractSerializer( typeof(Book) );
        using( Stream stm = new MemoryStream() )
        {
            ser.WriteObject( stm, book );
            stm.Seek( 0, SeekOrigin.Begin );
            Book regen = ser.ReadObject( stm ) as Book; // NullReferenceException here
            AssertBook( book, regen );
        }
    }

    These are the definitions of the Book and Person classes:
    namespace Bookshop
    {
        [ DataContract ]
        public class Book {
            string title;
            List<Person> authors = new List<Person>();
          
            public Book( string title, params Person[] author ) {
                this.title = title;
                this.authors.AddRange( author );
            }
          
            [ DataMember (Name="ti") ]
            public String Title {
                get{ return this.title; }
                set{ this.title = value; }
            }
          
            [ DataMember ( Name="au" )]
            public Person[] Authors    {
                get{ return this.authors.ToArray(); }
                set
                {
                    //if ( this.authors == null )
                    //    this.authors = new List<Person>();
                    //else
                        this.authors.Clear();  // Why this.authors == null?
                    this.authors.AddRange( value );
                }
            }
        }
      
        //=========================================================
        [ DataContract ]
        public class Person     {
            string firstName;
            string lastName;
            Int32 id;
            public Person( Int32 id, string firstName, string lastName ) {
                this.id = id;
                this.firstName = firstName;
                this.lastName = lastName;
            }
          
            [ DataMember ]
            public Int32 Id {
                get{ return this.id; }
                set{ this.id = value; }
            }
          
            [ DataMember ( Name="fn" ) ]
            public String FirstName    {
                get{ return this.firstName; }
                set{ this.firstName = value; }
            }
          
            [ DataMember (Name="ln") ]
            public String LastName {
                get { return this.lastName; }
                set { this.lastName = value; }
            }
        }

    }

    Questions:
    1) What kind of construction technique does DataContractSerializer use during deserialization that can bypass the initialization of the member Book.authors? To get my unit test to pass, I have to uncomment the code in the setter of Book.Authors. Weird.
    2) This problem only surfaces when DataContractSerializer sets the array of Person. If I construct a Book object and then assign an array of Person to Book.Authors, it works fine; Book.authors has already been constructed, as one expects.

    What have I done wrong?

    Thanks for any suggestion in advance.

    John

    Tuesday, March 9, 2010 5:46 AM

Answers

  • The serialization formatter gets uninitialized instances of classes during deserialization, that is, instances where all fields are set to their default values. For reference types this is null, which is why "authors" causes a NullReferenceException here. You have to create the list in the property setter, as in the code you have commented out. With this "lazy" initialization code for authors in place you can remove the field initializer. You must also change the constructor to use the property rather than the field directly.

    /Calle
    - Still confused, but on a higher level -
    • Marked as answer by JohnyM Wednesday, March 10, 2010 1:42 PM
    Wednesday, March 10, 2010 12:30 AM
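
A minimal sketch of what the lazy-initialization suggestion in the answer above might look like. It is an illustration under assumptions, not the poster's actual code: Person is trimmed to a single Id property, and the Program class with its round trip is added only to exercise the setter.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class Person
{
    // Trimmed-down stand-in for the Person class in the question.
    [DataMember]
    public int Id { get; set; }
}

[DataContract]
public class Book
{
    // No field initializer: it would not run during deserialization anyway.
    private List<Person> authors;

    public Book(params Person[] authors)
    {
        // Go through the property so the list is created on first use.
        this.Authors = authors;
    }

    [DataMember(Name = "au")]
    public Person[] Authors
    {
        get { return this.authors == null ? new Person[0] : this.authors.ToArray(); }
        set
        {
            // Lazy initialization: the field is null on a deserialized instance.
            if (this.authors == null)
                this.authors = new List<Person>();
            else
                this.authors.Clear();
            if (value != null)
                this.authors.AddRange(value);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var ser = new DataContractSerializer(typeof(Book));
        using (var stm = new MemoryStream())
        {
            ser.WriteObject(stm, new Book(new Person { Id = 100 }, new Person { Id = 101 }));
            stm.Seek(0, SeekOrigin.Begin);
            var regen = (Book)ser.ReadObject(stm);
            Console.WriteLine(regen.Authors.Length);
        }
    }
}
```

The trade-off is that every access to the backing field has to tolerate null, which is why the getter also guards against it.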

All replies

  • "The serialization formatter gets uninitialized instances of classes during deserialization."

    Thanks for responding and offering the explanation. That makes sense, because there are two phases in object creation: as you said, all variables are first initialized to their default values, and then variable initializers, if any, are executed in textual order in the second phase. DataContractSerializer, with some magic, skips the second phase of construction.

    Your answer stirs my curiosity further. Do you know the .Net construct that allows one to create essentially a partially constructed object in which variable initializers are not executed? It may come in handy in the future.

    While this may sound like a good optimization technique, it violates the normal object creation behavior, where one can safely rely on a constructor being called every time.

    Perhaps you can enlighten me on where else in the .Net library this partial object construction technique is used, so that I can watch out for it.

    I did fix the problem finally myself but not in the manner as you suggested.

    No disrespect intended, but fixing the class the way you recommend is really odd. Variable initializers are there for a purpose and are documented in the ECMA-CLI standard. Everybody takes it for granted that they are executed.

    Sprinkling object creation everywhere is simply asking for trouble, unless you are forced to use the Authors property all the time. But the marshalling type chosen for Authors is designed for interoperability, hence it returns Person[] rather than List<Person>. So if I want to do List operations on the persons internally, I have to reference Book.authors and not Book.Authors. I could return List<Person> and let the marshaller choose Person[]. There are other practical reasons too.

    There is no problem with using variable initializers except that they become problematic when dealing with DataContractSerializer. Hence the proper way to fix this is with a DataContractSerializer-specific construct or style. So in my case I modified Book like this:

    class Book {
        // existing code

        [OnDeserializing]
        private void OnDeserializing( StreamingContext context ) {
            if( this.authors == null )
                this.authors = new List<Person>();
            // Other variable initialization here to replace those tossed out
            // courtesy of DataContractSerializer.
        }
    }

    This style says unambiguously that the initialization is only needed to deal with a DataContractSerializer idiosyncrasy and nothing else. In fact, if you examine the IL code for variable initializers, they are just a bunch of object construction and assignment instructions placed at the front of the .ctor.

    I just wish DataContractSerializer carried a big warning about this highly unusual departure from the standard object creation technique.

    Thanks for your contribution.


    Wednesday, March 10, 2010 1:52 PM
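
A self-contained sketch of the [OnDeserializing] approach from the reply above, for anyone who wants to try the round trip. It makes simplifying assumptions that are not in the original post: authors are plain strings instead of Person objects, and the Program class is added only to drive the serializer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class Book
{
    // Runs for "new Book(...)", but not when DataContractSerializer
    // creates the instance during deserialization.
    private List<string> authors = new List<string>();

    public Book(params string[] authors)
    {
        this.authors.AddRange(authors);
    }

    [OnDeserializing]
    private void OnDeserializing(StreamingContext context)
    {
        // Invoked on the uninitialized instance before any data member
        // is set, so the setter below can rely on the list existing.
        this.authors = new List<string>();
    }

    [DataMember(Name = "au")]
    public string[] Authors
    {
        get { return this.authors.ToArray(); }
        set
        {
            this.authors.Clear();
            if (value != null)
                this.authors.AddRange(value);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var ser = new DataContractSerializer(typeof(Book));
        using (var stm = new MemoryStream())
        {
            ser.WriteObject(stm, new Book("Tom Smith", "Peter Pan"));
            stm.Seek(0, SeekOrigin.Begin);
            var regen = (Book)ser.ReadObject(stm);
            Console.WriteLine(string.Join(", ", regen.Authors));
        }
    }
}
```

Keeping the field initializer alongside the callback means ordinary construction and deserialization both end up with a non-null list, and the serializer-specific workaround stays in one clearly labeled place.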
  • Thanks for your explanation! It is exactly what I need.
    Wednesday, August 18, 2010 1:44 AM
  • "Your answer stirs my curiosity further. Do you know the .Net construct that allows one to create essentially a partially constructed object in which variable initializers are not executed? It may come in handy in the future."

    In this case, it's FormatterServices.GetUninitializedObject, which creates an object without calling any constructor, so field initializers are not run either.

    Here's a sample to prove the behaviour:

    using System;
    using System.Collections.Generic;
    using System.Runtime.Serialization;

    class Program
    {
        static void Main(string[] args)
        {
            // Allocate a SerializeTest without running its constructor or field initializers.
            var serializeTest = (SerializeTest)FormatterServices.GetUninitializedObject(typeof(SerializeTest));
            Console.WriteLine(serializeTest.IsCollectionInitialized()); // prints False
            Console.ReadLine();
        }
    }

    public class SerializeTest
    {
        public List<int> collection = new List<int>();

        public bool IsCollectionInitialized()
        {
            return collection != null;
        }
    }

    Monday, December 10, 2012 10:04 AM
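
As a small extension of the sample above, the following sketch contrasts an instance created with "new" against one obtained from FormatterServices.GetUninitializedObject, to show that the constructor body is skipped along with the field initializers. The Sample class and its members are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

public class Sample
{
    // Field initializer: compiled into the start of every instance constructor.
    public List<int> Items = new List<int>();

    public bool ConstructorRan;

    public Sample()
    {
        ConstructorRan = true;
    }
}

public static class Program
{
    public static void Main()
    {
        var viaNew = new Sample();
        var raw = (Sample)FormatterServices.GetUninitializedObject(typeof(Sample));

        Console.WriteLine(viaNew.Items != null);   // True
        Console.WriteLine(viaNew.ConstructorRan);  // True
        Console.WriteLine(raw.Items != null);      // False: initializer never ran
        Console.WriteLine(raw.ConstructorRan);     // False: constructor never ran
    }
}
```

This matches the observation earlier in the thread that field initializers are emitted at the front of the constructor: skipping the constructor skips them too.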