XML special characters inside attribute value cause error

  • Question

  • I am currently importing a large number of XML files that were generated by a VB6 application using MSXML. I read these files into a stream for processing.
    In some of the XML files an attribute's value contains XML predefined special characters such as:
    & < > " '
    This causes the application to throw an error. For example, an attribute like this produces the error:
    lastname = " Jon "Jarvis" 

     using (XmlReader reader = XmlReader.Create(new StreamReader(ms, Encoding.GetEncoding("windows-1252"))))
     {
         XmlDoc.Load(reader); // <-- error is thrown here
     }

    Any direction is appreciated.

    Wednesday, May 2, 2012 11:11 PM

All replies

  • Please post a minimal but complete sample of the markup causing the error and tell us the exact error message.

    And why are you not letting the XML parser detect the encoding by simply passing your stream to the Load method? That way the parser looks at the byte order mark and/or XML declaration and detects any declared encoding.
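As a rough illustration of that suggestion, the sketch below hands a raw byte stream to `XmlDocument.Load` and lets the parser pick up the encoding from the XML declaration. The sample document, the class and method names, and the use of ISO-8859-1 (chosen here only so the sketch should not need extra code-page registration on newer .NET runtimes) are assumptions, not taken from the poster's files.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class EncodingDetectionSketch
{
    // Loads XML from raw bytes; the parser inspects the byte order mark
    // and/or the XML declaration to decide how to decode them.
    public static XmlDocument LoadFromBytes(byte[] bytes)
    {
        var doc = new XmlDocument();
        using (var stream = new MemoryStream(bytes))
        {
            doc.Load(stream); // no StreamReader and no explicit Encoding
        }
        return doc;
    }

    // Hypothetical sample: declared as ISO-8859-1 and encoded accordingly.
    public static byte[] SampleBytes()
    {
        string xml = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>"
                   + "<row LNAME=\"Andr\u00E9\"/>";
        return Encoding.GetEncoding("iso-8859-1").GetBytes(xml);
    }

    static void Main()
    {
        XmlDocument doc = LoadFromBytes(SampleBytes());
        Console.WriteLine(doc.DocumentElement.GetAttribute("LNAME"));
    }
}
```

If the declared encoding and the actual bytes disagree, this approach will surface that as a parse error or as wrong characters, which is itself useful diagnostic information.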


    MVP Data Platform Development My blog

    Thursday, May 3, 2012 9:26 AM
  • Martin;

    This is a sample of an XML node that causes the error:

    <rs:data>
    <z:row GUID="{442FC2EF-A73F-40E4-87D7-077777777777}" ENROLL_DATE="2010-11-12T07:08:50" LNAME="Van "Cliff" Anderson " LNAME_SOUNDEX="." FNAME="John Van Anderson" FNAME_SOUNDEX="A1345" MNAME="" MNAME_SOUNDEX="" GENDER="MALE" DOB="1980-01-01T00:00:00" NATIONALITY="Dutch"/>
    </rs:data>

    The LNAME attribute causes the error

    'Cliff' is an unexpected token. Expecting white space.

    Or, if LNAME contains a '<' character, I get

    '<', hexadecimal value 0x3C, is an invalid attribute character.

    The reason I do not pass the stream directly into Load is that, due to the structure of the MSXML output, I get the error:

    "Data at the root level is invalid"

    By using XmlReader with an explicit Encoding I have resolved that error.

    Thursday, May 3, 2012 4:46 PM
  • With well-formed XML, if the attribute value is delimited by double quotes, any double quote in the value needs to be escaped, e.g.

    <Foo LNAME="Van &quot;Cliff&quot; Anderson"/>
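For comparison, here is a minimal sketch of how a conforming writer handles such a value on the .NET side: `XmlWriter` escapes the double quotes when the attribute is written, and a parser restores them on load. The element name `Foo` follows the example above; the class and method names are hypothetical.

```csharp
using System;
using System.Text;
using System.Xml;

static class AttributeEscapingSketch
{
    // Serializes one element with an LNAME attribute; XmlWriter escapes
    // characters that are not allowed verbatim in an attribute value.
    public static string BuildRow(string lastName)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("Foo");
            writer.WriteAttributeString("LNAME", lastName);
            writer.WriteEndElement();
        }
        return sb.ToString();
    }

    // Parses the markup back and returns the unescaped attribute value.
    public static string ReadLastName(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.GetAttribute("LNAME");
    }

    static void Main()
    {
        string xml = BuildRow("Van \"Cliff\" Anderson");
        Console.WriteLine(xml);               // markup with &quot; entities
        Console.WriteLine(ReadLastName(xml)); // original value, quotes restored
    }
}
```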
     

    On the other hand, I don't see why MSXML would create malformed markup like the input you posted.

    Can you elaborate how you use MSXML and .NET together? Then perhaps we can find a way to avoid the problem.


    Thursday, May 3, 2012 5:35 PM
  • Martin;

    The XML files were generated by a VB6 application by populating an MSXML document with recordset data.

    The .NET application is a separate application that will consume these XML files, parse them, and place the data into a SQL Server database.

    The .NET application reads the XML file into a MemoryStream and then attempts to parse it. I could have used MSXML.DOMDocument to load the MSXML file inside the .NET application and I would not have any problem; however, the MSXML.DOMDocument Load method is not compatible with System.IO.MemoryStream.

    Monday, May 7, 2012 7:57 PM
  • What happens if you load one of the VB6-generated XML documents into a web browser such as Firefox or IE? Does it give a parse error too?

    It looks to me as if your VB6 application generates something that is not well-formed XML, since you say that loading directly into an XDocument (or are you using an XmlDocument?) with e.g. XDocument.Load("file.xml") gives you a parse error. And your attempts to work around that error by first feeding the markup into a MemoryStream somehow introduce other errors.

    On the other hand I can't imagine that MSXML is not escaping characters as needed when saving XML documents to a file so I still wonder where the malformed markup comes from.
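One way to narrow down where the malformed markup starts is to let the parser report the position of the first error. The sketch below is only an illustration under assumptions: the class and method names are hypothetical, and the two strings are simplified stand-ins for the row posted earlier in the thread.

```csharp
using System;
using System.Xml;

static class WellFormednessCheck
{
    // Returns null when the markup parses; otherwise a message that
    // includes the line and column where the parser gave up.
    public static string Describe(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return null;
        }
        catch (XmlException ex)
        {
            return string.Format("Line {0}, position {1}: {2}",
                                 ex.LineNumber, ex.LinePosition, ex.Message);
        }
    }

    static void Main()
    {
        // Unescaped quotes inside the attribute value (not well-formed).
        string bad = "<row LNAME=\"Van \"Cliff\" Anderson\"/>";
        // Same value with the quotes escaped (well-formed).
        string good = "<row LNAME=\"Van &quot;Cliff&quot; Anderson\"/>";

        Console.WriteLine(Describe(bad) ?? "well-formed");
        Console.WriteLine(Describe(good) ?? "well-formed");
    }
}
```

Running the same kind of check against the original files on disk, before any stream or encoding handling, would show whether the problem is already present in what the VB6 application wrote.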



    Tuesday, May 8, 2012 9:45 AM