Inconsistent use of XmlNodeList with SelectNodes() and GetElementsByTagName()

  • Question

  • Hi,

    The documentation for GetElementsByTagName(string name) at http://msdn.microsoft.com/en-us/library/dc0c9ekk(v=vs.80).aspx says that the return value of this method is:

    Return Value

    An XmlNodeList containing a list of all matching nodes.

    When I look up the description of XmlNodeList, the documentation (http://msdn.microsoft.com/en-us/library/system.xml.xmlnodelist(v=vs.80).aspx) says this:

    The XmlNodeList collection is "live"; that is, changes to the children of the node object that it was created from are immediately reflected in the nodes returned by the XmlNodeList properties and methods. XmlNodeList supports iteration and indexed access.

    For the method SelectNodes (http://msdn.microsoft.com/en-us/library/hcebdtae(v=vs.80).aspx), the documentation says this:

    Return Value

    An XmlNodeList containing a collection of nodes matching the XPath query. The XmlNodeList should not be expected to be connected "live" to the XML document. That is, changes that appear in the XML document may not appear in the XmlNodeList, and vice versa.

    How can the XmlNodeList returned from SelectNodes not reflect changes in the XML document, when the XmlNodeList returned by GetElementsByTagName(string name) does reflect them? In both cases an XmlNodeList is returned.

    Furthermore, using SelectNodes like this:

    XmlDocument doc = new XmlDocument();
    doc.LoadXml("some valid xml string");
    XmlNodeList nodeList = doc.SelectNodes("/root/person");
    foreach (XmlNode n in nodeList)
      n.ParentNode.RemoveChild(n);

    Can this result in unexpected behaviour in nodeList, or in doc? Does it mean that we cannot safely delete or update XML elements when using SelectNodes()? I tested the code above and it works OK. When can we expect unexpected results?




    • Edited by m.space Tuesday, October 8, 2013 2:17 PM
    Tuesday, October 8, 2013 2:13 PM

Answers

  • Hello m.space,

    I put together two tests that I believe illustrate what the documentation is trying to convey. The first test method removes the nodes using the SelectNodes() method.

            [TestMethod]
            public void TestSelectNodesBehavior()
            {
                XmlDocument doc = new XmlDocument();
                doc.LoadXml("<root><person><id>1</id><name>j</name></person><person><id>2</id><name>j</name></person><person><id>1</id><name>j</name></person><person><id>3</id><name>j</name></person><business></business></root>");
    
                XmlNodeList nodeList = doc.SelectNodes("/root/person");
    
                Assert.AreEqual(5, doc.FirstChild.ChildNodes.Count, "There should have been a total of 5 nodes: 4 person nodes and 1 business node");
                Assert.AreEqual(4, nodeList.Count, "There should have been a total of 4 nodes");
    
                foreach (XmlNode n in nodeList)
                    n.ParentNode.RemoveChild(n);
    
                Assert.AreEqual(1, doc.FirstChild.ChildNodes.Count, "There should have been only 1 business node left in the document");
                Assert.AreEqual(4, nodeList.Count, "There should have been a total of 4 nodes");
            }

    The next test illustrates the difference by performing the same operation (removing the person nodes), but using the GetElementsByTagName() method to select the nodes. Though the same object type is returned, its construction is different. The list from SelectNodes() is a collection of references back to the XML document. That means we can remove nodes from the document in a foreach without affecting the list of references. This is shown by the count of the node list not being affected. The list from GetElementsByTagName() is a collection that directly reflects the nodes in the document. That means as we remove the items from the parent, we actually affect the collection of nodes. This is why the node list cannot be manipulated in a foreach and the loop had to be changed to a while loop.

            [TestMethod]
            public void TestGetElementsByTagNameBehavior()
            {
                XmlDocument doc = new XmlDocument();
                doc.LoadXml("<root><person><id>1</id><name>j</name></person><person><id>2</id><name>j</name></person><person><id>1</id><name>j</name></person><person><id>3</id><name>j</name></person><business></business></root>");
    
                XmlNodeList nodeList = doc.GetElementsByTagName("person");
    
                Assert.AreEqual(5, doc.FirstChild.ChildNodes.Count, "There should have been a total of 5 nodes: 4 person nodes and 1 business node");
                Assert.AreEqual(4, nodeList.Count, "There should have been a total of 4 nodes");
    
                while (nodeList.Count > 0)
                    nodeList[0].ParentNode.RemoveChild(nodeList[0]);
    
                Assert.AreEqual(1, doc.FirstChild.ChildNodes.Count, "There should have been only 1 business node left in the document");
                Assert.AreEqual(0, nodeList.Count, "All the nodes have been removed");
            }
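
    As a side note, one way to keep the foreach form with either method is to copy the nodes into an ordinary list before removing anything. The following is only a minimal sketch; the class SnapshotRemoval and its methods are hypothetical helpers made up for illustration, not part of System.Xml.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper (not part of System.Xml), for illustration only.
public static class SnapshotRemoval
{
    // Copies the nodes into a plain List<XmlNode>, so later document
    // changes cannot alter the sequence being iterated.
    public static List<XmlNode> Snapshot(XmlNodeList nodes)
    {
        List<XmlNode> copy = new List<XmlNode>();
        foreach (XmlNode n in nodes)
            copy.Add(n);
        return copy;
    }

    // Removes every <person> element and returns how many were removed.
    public static int RemovePersons(XmlDocument doc)
    {
        List<XmlNode> persons = Snapshot(doc.GetElementsByTagName("person"));
        foreach (XmlNode n in persons)
            n.ParentNode.RemoveChild(n);
        return persons.Count;
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root><person/><person/><business/></root>");

        Console.WriteLine(RemovePersons(doc));                   // 2
        Console.WriteLine(doc.DocumentElement.ChildNodes.Count); // 1
    }
}
```

    The copy costs one extra pass over the list, but the removal loop then no longer depends on whether the XmlNodeList is live.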
    
    Cheers


    Jeff

    • Marked as answer by m.space Thursday, October 10, 2013 6:45 AM
    Tuesday, October 8, 2013 7:51 PM

All replies

  • Jeff, I really appreciate your answer.

    I tried both your unit tests.

    XmlNodeList nodeList = doc.SelectNodes("/root/person"); is actually {System.Xml.XPathNodeList}
    XmlNodeList nodeList = doc.GetElementsByTagName("person"); is actually {System.Xml.XmlElementList}

    Visual Studio 2008 shows that when I put breakpoints on those statements and hover over them with the mouse. I looked at the documentation for XmlNodeList (http://msdn.microsoft.com/en-us/library/system.xml.xmlnodelist(v=vs.80).aspx), and it says: public abstract class XmlNodeList : IEnumerable. So the code implemented for SelectNodes() is of course different from that for GetElementsByTagName(). Maybe that is why one keeps a list of references and the other directly reflects changes. I think GetElementsByTagName() also holds a list of references to the XML document's nodes, but when calling, for example, nodeList[i].ParentNode.RemoveChild(nodeList[i]), it frees/disposes that reference.

    So with SelectNodes() we get a collection of references to XML document nodes, and we can work with those references. If we delete a node, the change is visible in the XML document, but the collection of references stays the same. The deleted node is still in the list, only detached from the document: its ParentNode is now null, so dereferencing that throws System.NullReferenceException.

    With GetElementsByTagName(), as you mentioned, the result is a collection that directly reflects the nodes in the document. That means as we remove the items from the parent, we actually affect the collection of nodes.

    I modified your code to confirm the null reference:

            [TestMethod]
            public void TestGetElementsByTagNameBehavior()
            {
                XmlDocument doc = new XmlDocument();
                doc.LoadXml("<root><person><id>1</id><name>j</name></person><person><id>2</id><name>j</name></person><person><id>1</id><name>j</name></person><person><id>3</id><name>j</name></person><business></business></root>");

                XmlNodeList nodeList = doc.GetElementsByTagName("person");

                Assert.AreEqual(5, doc.FirstChild.ChildNodes.Count, "There should have been a total of 5 nodes: 4 person nodes and 1 business node");
                Assert.AreEqual(4, nodeList.Count, "There should have been a total of 4 nodes");

                while (nodeList.Count > 0)
                    nodeList[0].ParentNode.RemoveChild(nodeList[0]);

                // Test method selectednodes.UnitTest1.TestGetElementsByTagNameBehavior
                // threw exception: System.NullReferenceException: Object reference not set to an instance of an object.
                string a = nodeList[1].InnerText;

                Assert.AreEqual(1, doc.FirstChild.ChildNodes.Count, "There should have been only 1 business node left in the document");
                Assert.AreEqual(0, nodeList.Count, "All the nodes have been removed");
            }
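
    To put the two lists side by side, here is a small sketch that removes one node and then inspects both lists. The class name ListAfterRemoval and the sample XML are made up for illustration, and Count is read on the SelectNodes() result before the removal, as in Jeff's test. The comments show what I would expect to be printed, based on the behaviour seen in the tests in this thread.

```csharp
using System;
using System.Xml;

// Hypothetical demo class, for illustration only.
public static class ListAfterRemoval
{
    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root><person>a</person><person>b</person></root>");

        XmlNodeList selected = doc.SelectNodes("/root/person");
        XmlNodeList live = doc.GetElementsByTagName("person");

        // Read Count before changing the document, as in Jeff's test.
        Console.WriteLine(selected.Count);                  // 2

        XmlNode first = selected[0];
        first.ParentNode.RemoveChild(first);

        // The SelectNodes() list still holds the removed node,
        // but the node is detached from the document.
        Console.WriteLine(selected.Count);                  // 2
        Console.WriteLine(selected[0].ParentNode == null);  // True

        // The GetElementsByTagName() list follows the document, and its
        // indexer returns null for an index past the end.
        Console.WriteLine(live.Count);                      // 1
        Console.WriteLine(live[1] == null);                 // True
    }
}
```

    So in the test above the NullReferenceException comes from nodeList[1] itself being null, because the live list is already empty at that point.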

    Wednesday, October 9, 2013 7:16 AM
  • I would also like to add this post about how the XmlNodeList returned by GetElementsByTagName() directly reflects changes to the XML document: http://blogs.msdn.com/b/eriksalt/archive/2005/07/20/getelementsbytagname.aspx

    XmlDocument exposes ‘OnChange’ events.  When an item is added into the DOM, the XmlDocument.NodeInserted event fires, each NodeList can determine if the new element should be added to its list or not, and then take the appropriate action.

    The trick is to register each returned XmlNodeList for events.
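
    The event mechanism described in that post can be sketched roughly as follows. This is not the actual XmlElementList implementation, only an illustration under simplifying assumptions: TrackedPersons is a hypothetical class, it watches only elements named "person", it does not keep document order, and it ignores person elements nested inside a larger inserted subtree. XmlDocument.NodeInserted and XmlDocument.NodeRemoved are the real events.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper (not part of System.Xml): keeps its own list of
// <person> elements in sync by listening to the document's change events.
public class TrackedPersons
{
    private readonly List<XmlNode> items = new List<XmlNode>();

    public TrackedPersons(XmlDocument doc)
    {
        foreach (XmlNode n in doc.GetElementsByTagName("person"))
            items.Add(n);

        doc.NodeInserted += OnInserted;
        doc.NodeRemoved += OnRemoved;
    }

    public int Count { get { return items.Count; } }

    private void OnInserted(object sender, XmlNodeChangedEventArgs e)
    {
        // Decide whether the inserted node belongs in this list.
        if (e.Node.NodeType == XmlNodeType.Element && e.Node.Name == "person")
            items.Add(e.Node);
    }

    private void OnRemoved(object sender, XmlNodeChangedEventArgs e)
    {
        // Remove() does nothing if the node was never tracked.
        items.Remove(e.Node);
    }
}

public static class TrackedPersonsDemo
{
    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root><person/><business/></root>");
        TrackedPersons tracked = new TrackedPersons(doc);

        doc.DocumentElement.AppendChild(doc.CreateElement("person"));
        Console.WriteLine(tracked.Count); // 2

        doc.DocumentElement.RemoveChild(doc.DocumentElement.FirstChild);
        Console.WriteLine(tracked.Count); // 1
    }
}
```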

     
    Thursday, October 10, 2013 6:50 AM