XmlReader.ReadInnerXml() throws invalid character (hexadecimal value 0x00) errors

  • Question

  • Hello everyone,

    I have a file that contains text like the following (sometimes the file grows to as much as 200 MB):

    <Test>.... </Test> 0x00 (hexadecimal character)

    <Test>.... </Test>

    <Test>.... </Test>

    The code below works fine when there are no invalid characters, but when the file contains an invalid character (hexadecimal value 0x00) it throws an exception at reader.ReadInnerXml() with the message "'.', hexadecimal value 0x00, is an invalid character. Line 1, position 658."

    Can someone please help me remove the special character, ignore it, or otherwise avoid the exception?

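    using System;
    using System.IO;
    using System.Xml;

    try
    {
        string path = @"C:\Users\xxxxx\Desktop\Test.log";
        var settings = new XmlReaderSettings
        {
            CheckCharacters = false,
            IgnoreWhitespace = true,
            // Several top-level <Test> elements make this a fragment, not a document.
            ConformanceLevel = ConformanceLevel.Fragment
        };
        // Dispose the FileStream too: XmlReader does not close a TextReader it was given.
        using (var tmpFileStream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var reader = XmlReader.Create(new StreamReader(tmpFileStream), settings))
        {
            reader.Read();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Test")
                {
                    // Throws here when the file contains 0x00.
                    // ReadInnerXml already moves past </Test>, so no extra Read() is needed.
                    var innerXml = reader.ReadInnerXml();
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
    catch (Exception ex)
    {
        // Report the error instead of swallowing it silently.
        Console.WriteLine(ex.Message);
    }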

    Monday, June 29, 2020 12:03 PM

All replies

  • The 0x00 simply means that your file is not a valid XML document (U+0000 is not allowed anywhere in XML 1.0, not even as a character reference such as &#0;).

    So you need to rewrite it without the 0x00.

    E.g. by loading it into a string/byte buffer and removing those characters before feeding it into XmlDocument.
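    A minimal sketch of that suggestion, for the case where the file does fit in memory. It assumes 0x00 is the only bad character and that the file is a series of top-level <Test> elements, which is why it sets ConformanceLevel.Fragment (an assumption, not something the original code does). NullCharCleanup and ReadTestElements are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NullCharCleanup
{
    // Hypothetical helper: strips U+0000 from the text and returns the
    // inner XML of every top-level <Test> element.
    public static List<string> ReadTestElements(string text)
    {
        // U+0000 is never legal in XML 1.0, so remove it before the parser sees it.
        // string.Replace(string, string) is an ordinal search.
        string cleaned = text.Replace("\0", string.Empty);

        var settings = new XmlReaderSettings
        {
            IgnoreWhitespace = true,
            // Several top-level <Test> elements: a fragment rather than a document.
            ConformanceLevel = ConformanceLevel.Fragment
        };

        var results = new List<string>();
        using (var reader = XmlReader.Create(new StringReader(cleaned), settings))
        {
            reader.Read();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Test")
                {
                    // ReadInnerXml already moves past </Test>, so no extra Read() here.
                    results.Add(reader.ReadInnerXml());
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return results;
    }
}
```

    Usage would be something like NullCharCleanup.ReadTestElements(File.ReadAllText(path)). For a 200 MB file this holds the whole text, plus a cleaned copy, in memory at once, which is the concern raised in the next reply.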

    Monday, June 29, 2020 12:27 PM
  • Do you mean reading the entire file at once and replacing the characters? If so, that won't work for big files, because we can't read the entire content into a single string; string size is limited. Can you please help me with a sample?
    • Edited by Avatar 123 Monday, June 29, 2020 12:32 PM
    Monday, June 29, 2020 12:30 PM
  • Please elaborate on your context.

    First of all, you should contact the producer of that file, because it is not a valid XML file.

    If the files are too big to load at once, you need to rewrite the file.
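    Rewriting the file does not require loading it all at once. A sketch, assuming the log is UTF-8 (or starts with a byte order mark): copy it chunk by chunk and drop the bad characters on the way. Besides 0x00 it also drops anything XmlConvert.IsXmlChar rejects, which is a broader filter than the thread asks for. XmlFileCleaner and CopyWithoutInvalidXmlChars are hypothetical names.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class XmlFileCleaner
{
    // Hypothetical helper: copies inputPath to outputPath in fixed-size chunks,
    // dropping every character XML 1.0 does not allow (0x00 included).
    // Memory use stays constant regardless of the input file's size.
    public static void CopyWithoutInvalidXmlChars(string inputPath, string outputPath)
    {
        var buffer = new char[64 * 1024];
        // FileShare.ReadWrite mirrors the question's code, in case the log is still being written.
        using (var input = new StreamReader(
            new FileStream(inputPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)))
        using (var output = new StreamWriter(outputPath, false, new UTF8Encoding(false)))
        {
            int count;
            while ((count = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                int kept = 0;
                for (int i = 0; i < count; i++)
                {
                    char c = buffer[i];
                    // Keep surrogate halves: IsXmlChar rejects them one at a time
                    // even when they form a valid pair.
                    if (XmlConvert.IsXmlChar(c) || char.IsSurrogate(c))
                        buffer[kept++] = c;
                }
                output.Write(buffer, 0, kept);
            }
        }
    }
}
```

    The cleaned copy can then be read with the original XmlReader loop; memory use is bounded by the 64K-character buffer rather than by the file size.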

    The only other (transparent) approach would be to implement your own StreamReader that skips these characters.
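    A sketch of that transparent approach. Subclassing StreamReader is awkward, so this swaps in a wrapper around any TextReader instead; XmlCharFilterReader is a hypothetical name. It drops characters that XmlConvert.IsXmlChar rejects (0x00 among them) while keeping surrogate halves, which that method rejects individually even when they form a valid pair.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical wrapper: a TextReader that silently drops characters XML 1.0
// does not allow (such as 0x00) before XmlReader ever sees them.
public sealed class XmlCharFilterReader : TextReader
{
    private readonly TextReader _inner;

    public XmlCharFilterReader(TextReader inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    private static bool IsAllowed(int c) =>
        XmlConvert.IsXmlChar((char)c) || char.IsSurrogate((char)c);

    public override int Peek()
    {
        // Discard invalid characters until a valid one (or end of input) is next.
        int c;
        while ((c = _inner.Peek()) != -1 && !IsAllowed(c))
            _inner.Read();
        return c;
    }

    public override int Read()
    {
        // Skip invalid characters; -1 still signals end of input.
        int c;
        while ((c = _inner.Read()) != -1 && !IsAllowed(c)) { }
        return c;
    }

    public override int Read(char[] buffer, int index, int count)
    {
        while (true)
        {
            int read = _inner.Read(buffer, index, count);
            if (read == 0) return 0; // true end of input
            int kept = 0;
            for (int i = 0; i < read; i++)
            {
                char c = buffer[index + i];
                if (IsAllowed(c)) buffer[index + kept++] = c;
            }
            // Returning 0 would mean "end of input", so if a whole chunk
            // was invalid, read the next chunk instead.
            if (kept > 0) return kept;
        }
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

    In the question's code, the only change would be the reader construction: XmlReader.Create(new XmlCharFilterReader(new StreamReader(tmpFileStream)), settings). The wrapper filters each chunk as XmlReader requests it, so memory use should not grow with the file size.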

    • Marked as answer by Avatar 123 Monday, June 29, 2020 12:41 PM
    • Unmarked as answer by Avatar 123 Monday, June 29, 2020 12:41 PM
    Monday, June 29, 2020 12:36 PM
  • Is there any way to replace the invalid characters at runtime in reader.ReadInnerXml()?
    Monday, June 29, 2020 12:43 PM
  • Yup, as I wrote, by using your own StreamReader.
    Monday, June 29, 2020 1:59 PM
  • Hi,

    Check for null before reading. Instead of ReadInnerXml, use ReadString:

    while (reader.Read())
    {
        if (reader.IsStartElement())
        {
            if (reader.IsEmptyElement)
            {
                Console.WriteLine("<{0}/>", reader.Name);
            }
            else
            {
                Console.Write("<{0}> ", reader.Name);
                reader.Read(); // Move past the start tag.
                if (reader.IsStartElement())  // Handle nested elements.
                    Console.Write("\r\n<{0}>", reader.Name);
                Console.WriteLine(reader.ReadString());  // Read the text content of the element.
            }
        }
    }
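    As far as I can tell, ReadString still goes through the same parser as ReadInnerXml, so on its own it does not get past the 0x00; the bad character has to be removed from the input first. For the reading side, here is a sketch using the newer ReadElementContentAsString, assuming each <Test> contains only text (that method throws if the element has child elements) and that the input holds several top-level <Test> elements. TestTextReader and ReadTestTexts are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class TestTextReader
{
    // Hypothetical helper: returns the text of each top-level <Test> element.
    // Assumes <Test> holds only text; ReadElementContentAsString throws otherwise.
    public static List<string> ReadTestTexts(TextReader input)
    {
        var settings = new XmlReaderSettings
        {
            IgnoreWhitespace = true,
            ConformanceLevel = ConformanceLevel.Fragment
        };
        var texts = new List<string>();
        using (var reader = XmlReader.Create(input, settings))
        {
            reader.Read();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Test")
                    texts.Add(reader.ReadElementContentAsString()); // also moves past </Test>
                else
                    reader.Read();
            }
        }
        return texts;
    }
}
```

    Because the method takes any TextReader, the input can be a plain StreamReader over an already-cleaned file or a reader that filters characters on the fly.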

    Wednesday, July 1, 2020 3:31 PM