General Discussion: Parsing poorly formed XML

  • Wednesday, November 4, 2009 4:59 PM MichaelJHuman
     
    Hello,

    I am trying to parse a file with poorly formed XML.  I have no control over this file, and I have to read a number of files with the same issue.  There's a terminating </font> tag with no start tag.

    This file was generated by an online game and has an XLS extension.  It opens in both Excel and HTML.

    If Excel can read it, it seems reasonable that there's a way to use the .NET API to read it, but Excel could have its own parser.

    I would say it looks more like HTML than XML, but I could not find an HTML parser in .NET.



All Replies

  • Wednesday, November 4, 2009 5:28 PM ScottyDoesKnow
     
    My suggestion would be to read it line by line and insert a starting <font> tag or remove the </font> tag, then use XML parsing.
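
    A minimal C# sketch of the second option (removing the tags, then parsing). The class and method names are hypothetical, and it assumes the file contains no <font> elements worth keeping, so every font tag, start or end, is simply dropped. Any other well-formedness problem in the file would still make XDocument.Parse throw.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class FontTagCleaner
{
    // Matches <font ...> and </font>, in any letter case.
    private static readonly Regex FontTag =
        new Regex(@"</?font\b[^>]*>", RegexOptions.IgnoreCase);

    // Returns the markup with all font start and end tags removed.
    // Assumption: the font tags carry no data that needs to be kept.
    public static string StripFontTags(string markup)
    {
        return FontTag.Replace(markup, string.Empty);
    }

    // Cleans the markup, then parses it as XML.
    // Throws XmlException if the cleaned text is still not well-formed.
    public static XDocument ParseCleaned(string markup)
    {
        return XDocument.Parse(StripFontTags(markup));
    }
}
```

    Because the cleaning happens on a string in memory, this would avoid writing a temp file: read the file with File.ReadAllText and pass the result to ParseCleaned.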
  • Wednesday, November 4, 2009 11:45 PM MichaelJHuman
     
    Unfortunately, that was the only idea I could come up with.  I ended up re-writing to a temp file and removing some of the bad tags.  It was a pain.

    I am still curious as to how Excel and IE could both read the doc.  I wonder what parser they use.

  • Thursday, November 5, 2009 12:02 AM Yort
     

    Hi,

    There are at least two Microsoft XML parsers, a COM one and the .NET one. It wouldn't surprise me if there were others, or modified versions of those, embedded in their own apps. However, I would expect the COM one to also fail to load badly formed XML, so it won't solve your problem.

  • Thursday, November 5, 2009 6:50 PM Yort
     
    Hi,

    It occurred to me this morning that this might be possible using an XmlReader of some kind (XmlTextReader?) instead of an XmlDocument or XPathDocument.

    Since the readers are designed to process XML from streams, a bit at a time, they won't find the error until you've parsed at least some of the document. It might also be possible to ignore the error returned by the reader and continue, or it may be that, because of the way the reader works, it doesn't even notice the error unless you specifically ask it to validate the XML. I haven't used readers much, but since IE and Excel are both designed to deal with large XML files, it seems like they might well use one, and that might be why they cope with the badly formed stuff.
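
    A small C# sketch for trying this out; the class and method names are hypothetical. It reads element names with XmlReader until the reader reports a problem. As far as I know, XmlReader always checks well-formedness, whether or not validation is turned on, so a stray end tag still raises an XmlException, and the reader cannot be resumed after that. What the streaming approach does give you is everything that came before the error.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of any library.
public static class PartialXmlReader
{
    // Collects element names until the end of the input or the first
    // well-formedness error. 'error' is null if no error was hit.
    public static List<string> ReadElementNames(string markup, out XmlException error)
    {
        var names = new List<string>();
        error = null;

        using (XmlReader reader = XmlReader.Create(new StringReader(markup)))
        {
            try
            {
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element)
                    {
                        names.Add(reader.Name);
                    }
                }
            }
            catch (XmlException ex)
            {
                // The reader is now in an error state; keep what was read so far.
                error = ex;
            }
        }

        return names;
    }
}
```

    The XmlException exposes LineNumber and LinePosition, which could be used to locate the offending tag before cleaning the file.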
  • Wednesday, November 11, 2009 2:13 AM Harry Zhu (MSFT, Moderator)
     
    Hi,

    Could you please post the content of the XML file and the code you are working with?

    Harry
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    Welcome to the All-In-One Code Framework! If you have any feedback, please tell us.
  • Friday, November 13, 2009 4:30 AM Harry Zhu (MSFT, Moderator)
     

    We are changing the issue type to “General Discussion” because you have not followed up with the necessary information. If you have more time to look at the issue and provide more information, please feel free to change the issue type back to “Question” by opening the Options list at the top of the post window and changing the type. If the issue is resolved, we would appreciate it if you could share the solution so that other community members with similar questions can find and use the answer.

