XmlSchema removing duplicate types

    Question

  • I am working on code that reads a set of XSD files and compiles them into an XmlSchemaSet.

    The problem is that these XSD files come from various sources and may declare the same elements/types more than once. I need to remove the duplicates, or else the Compile method of XmlSchemaSet throws an error.

    Is there a recommended way of doing this?

    Friday, June 10, 2011 9:46 PM

Answers

  • Are you only interested in top level/global elements, attributes and types? You could use two XmlSchemaSets and then compare GlobalElements, GlobalTypes, GlobalAttributes.

    Here is a simple example (meant to show you an approach, not meant as complete and tested code) that compares global elements in two schemas and removes any duplicate element (based on QName comparison) in the second schema:

      using System;
      using System.Xml.Schema;

      class Program
      {
        static void Main(string[] args)
        {
          XmlSchemaSet schemaSet = MergeSchemas(@"..\..\XMLSchema1.xsd", @"..\..\XMLSchema2.xsd");
          // Write each schema of the merged set to the console.
          foreach (XmlSchema schema in schemaSet.Schemas())
          {
            schema.Write(Console.Out);
            Console.WriteLine();
          }
        }
    
        public static XmlSchemaSet MergeSchemas(string schema1, string schema2)
        {
          // Compile each file in its own set so that GlobalElements is populated.
          XmlSchemaSet schemaSet1 = new XmlSchemaSet();
          schemaSet1.Add(null, schema1);
          schemaSet1.Compile();
    
          XmlSchemaSet schemaSet2 = new XmlSchemaSet();
          schemaSet2.Add(null, schema2);
          schemaSet2.Compile();
    
          // Remove from the second schema every global element whose QName
          // also occurs in the first schema.
          foreach (XmlSchemaElement el1 in schemaSet1.GlobalElements.Values)
          {
            foreach (XmlSchemaElement el2 in schemaSet2.GlobalElements.Values)
            {
              if (el2.QualifiedName.Equals(el1.QualifiedName))
              {
                ((XmlSchema)el2.Parent).Items.Remove(el2);
                break;
              }
            }
          }
          // The schemas in the second set were modified, so reprocess and recompile them.
          foreach (XmlSchema schema in schemaSet2.Schemas())
          {
            schemaSet2.Reprocess(schema);
          }
          schemaSet2.Compile();
          // Add the de-duplicated second set to the first one.
          schemaSet1.Add(schemaSet2);
    
          return schemaSet1;
        }
      }
    


    When I try that with VS 2010/.NET 4.0 SP 1 and the following two sample schemas

    <?xml version="1.0" encoding="utf-8"?>
    <xs:schema
      targetNamespace="http://example.com/ns1"
      elementFormDefault="qualified"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
    >
     <xs:element name="root1">
      <xs:complexType>
       <xs:sequence>
        <xs:element name="foo"/>
       </xs:sequence>
      </xs:complexType>
     </xs:element>
    
     <xs:element name="root2">
      <xs:complexType>
       <xs:sequence>
        <xs:element name="bar"/>
       </xs:sequence>
      </xs:complexType>
     </xs:element>
    </xs:schema>
    

    <?xml version="1.0" encoding="utf-8"?>
    <xs:schema
      targetNamespace="http://example.com/ns1"
      elementFormDefault="qualified"
      xmlns:xs="http://www.w3.org/2001/XMLSchema">
    
     <xs:element name="root2">
      <xs:complexType>
       <xs:sequence>
        <xs:element name="bar"/>
       </xs:sequence>
      </xs:complexType>
     </xs:element>
    
     <xs:element name="root3">
      <xs:complexType>
       <xs:sequence>
        <xs:element name="baz"/>
       </xs:sequence>
      </xs:complexType>
     </xs:element>
    </xs:schema>
    


    the output to the console is as follows:

    <?xml version="1.0" encoding="ibm850"?>
    <xs:schema elementFormDefault="qualified" targetNamespace="http://example.com/ns1" xmlns:xs="http://www.w3.org/2001/XMLSchema">
     <xs:element name="root1">
      <xs:complexType>
       <xs:sequence>
        <xs:element name="foo" />
       </xs:sequence>
      </xs:complexType>
     </xs:element>
     <xs:element name="root2">
      <xs:complexType>
       <xs:sequence>
        <xs:element name="bar" />
       </xs:sequence>
      </xs:complexType>
     </xs:element>
    </xs:schema>
    <?xml version="1.0" encoding="ibm850"?>
    <xs:schema elementFormDefault="qualified" targetNamespace="http://example.com/ns1" xmlns:xs="http://www.w3.org/2001/XMLSchema">
     <xs:element name="root3">
      <xs:complexType>
       <xs:sequence>
        <xs:element name="baz" />
       </xs:sequence>
      </xs:complexType>
     </xs:element>
    </xs:schema>
    


    So the duplicated element has been removed from the second schema. The approach could be extended to remove duplicate global type and attribute definitions. I am not sure, though, what you want to do when a type is duplicated in two schemas and also referenced in both; the comparison is by QName only, so deleting a type could leave you with an incomplete schema.
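
    As a rough illustration of that extension, here is a minimal sketch that also covers global types and attributes. It rests on several assumptions: the names SchemaDeduplicator, Merge and GetKey are made up for this example; the first definition encountered wins; duplicates are detected by kind, target namespace and name only, without checking that the two definitions are actually identical; groups and attribute groups are not handled. It works on the XmlSchema objects as read, before they are added to a set, so it accepts any number of inputs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Schema;

public static class SchemaDeduplicator
{
    // Hypothetical helper: builds a lookup key for a global element,
    // attribute or named type; returns null for any other top-level item.
    private static string GetKey(XmlSchemaObject item, string targetNamespace)
    {
        string kind = null;
        string name = null;
        if (item is XmlSchemaElement)
        {
            kind = "element";
            name = ((XmlSchemaElement)item).Name;
        }
        else if (item is XmlSchemaAttribute)
        {
            kind = "attribute";
            name = ((XmlSchemaAttribute)item).Name;
        }
        else if (item is XmlSchemaType)
        {
            // Simple and complex types share one symbol space in XSD.
            kind = "type";
            name = ((XmlSchemaType)item).Name;
        }
        return name == null ? null : kind + "|" + targetNamespace + "|" + name;
    }

    // Reads each schema, drops top-level items whose key was already seen
    // in an earlier schema, then adds the schema to one set and compiles it.
    public static XmlSchemaSet Merge(IEnumerable<TextReader> sources)
    {
        HashSet<string> seen = new HashSet<string>();
        XmlSchemaSet set = new XmlSchemaSet();
        foreach (TextReader source in sources)
        {
            XmlSchema schema = XmlSchema.Read(source, null);
            // Walk backwards so RemoveAt does not shift unvisited indices.
            for (int i = schema.Items.Count - 1; i >= 0; i--)
            {
                string key = GetKey(schema.Items[i], schema.TargetNamespace);
                if (key != null && !seen.Add(key))
                {
                    schema.Items.RemoveAt(i);
                }
            }
            set.Add(schema);
        }
        set.Compile();
        return set;
    }
}
```

    Merge could be called with one TextReader per XSD file, for example readers opened with File.OpenText. Because only names are compared, two different definitions that share a name are silently collapsed into the first one, which is exactly the case flagged above.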

    Does that help?

    Monday, June 13, 2011 4:39 PM

All replies

  • Hi seeker_123,

    Welcome!

    According to your description, I think you can read all the XmlSchema objects, find the duplicated elements, and then remove the duplicates.

    http://www.codeproject.com/KB/linq/LINQ_to_XSD.aspx

    Have a nice day.


    Alan Chen[MSFT]

    Monday, June 13, 2011 9:36 AM
    Moderator
  • Maybe my description of the problem was misleading. What I am looking for is a way to modify the XSD files in memory, removing the types that are declared multiple times in separate files, so that I can add them to an XmlSchemaSet.

    Monday, June 13, 2011 2:04 PM