Commenting out lines in XML

    Question

  • How can I allow "-" or "--" inside a comment block?

     

    I have attributes in my elements that contain a string identifier.  That string identifier may be "G*X*-----**T", as an example.  If I want to comment out a "node" by putting `<!--` before the line that contains the above-mentioned attribute and `-->` after the line, I get an exception from System.Xml.XmlReader:

     

    An XML comment cannot contain '--', and '-' cannot be the last character

     

    This is ridiculous.  Whatever is between a <!-- and a --> should just be outright ignored.  This also suggests the parser is slow and parses unnecessary information.

     

     

     

    Please understand, I'm a developer and am forced into using certain technologies, like XML.  I don't like it, but HAVE to use it, so please bear with my frustration.  I could do things much more simply without XML, but certain people < *cough* management *cough* > get excited because they read up on something technical (which they shouldn't) and think they have the next brilliant idea (I'm sorry for using "brilliant idea" and "manager" in the same sentence, it'll never happen again).

     

    Thanks in advance for any help.

    Wednesday, August 22, 2007 5:55 PM

Answers

  • You can't.  XML doesn't allow comments to include a hyphen followed by another hyphen.  The restriction is there for compatibility with SGML.  From section 2.5 of the XML 1.0 recommendation (http://www.w3.org/TR/REC-xml/#sec-comments):

     

    [Definition: Comments may appear anywhere in a document outside other markup; in addition, they may appear within the document type declaration at places allowed by the grammar. They are not part of the document's character data; an XML processor MAY, but need not, make it possible for an application to retrieve the text of comments. For compatibility, the string "--" (double-hyphen) MUST NOT occur within comments.] Parameter entity references MUST NOT be recognized within comments.

    Comments

    [15]    Comment    ::=    '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
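    Production [15] boils down to two checks on the text between the delimiters: it must not contain "--", and it must not end in "-" (otherwise the final hyphen would run into the closing "-->"). A minimal C# sketch of that check follows; the class and method names are made up for illustration, and the sketch ignores the separate restriction on which characters count as Char.

```csharp
using System;

static class XmlCommentCheck
{
    // True when the text may appear between "<!--" and "-->" under
    // production [15]: no "--" anywhere, and no "-" as the last character.
    // (Hypothetical helper; it does not validate the Char range itself.)
    public static bool IsLegalCommentBody(string text)
    {
        return !text.Contains("--")
            && !text.EndsWith("-", StringComparison.Ordinal);
    }
}
```

    With this helper, the identifier from the question, "G*X*-----**T", is rejected because it contains "--", while a body with isolated hyphens such as "a-b" is accepted.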

    Wednesday, August 22, 2007 6:06 PM
  • Right, and this is what XML Notepad 2007 does for you when you save your document because it also uses XmlWriter which works around the limitations of the XML specification by inserting a space between the two dashes.  But the problem with that algorithm is that it is hard to "undo" this operation if you later want to "uncomment" that text and get back your original value.

     

    Note that the parser can't really "ignore" the text up to "-->": if your text contained the real comment terminator "-->", you would have a real problem when you saved it, because you would get a real syntax error when you loaded it again.  So some amount of "rescanning" is required to check for these kinds of errors.

     

    So a better algorithm for safely commenting/uncommenting blocks of XML would be to "encode" the comment delimiters in a way that's "decodable" in case you later want to "uncomment" that block.  XML Notepad 2007 does this by encoding the start comment as /* and the end comment as */, which can then be decoded one level at a time when you uncomment that block.  So commenting "<child/>" in the following:

    <foo><bar><child/></bar></foo>

    becomes:

    <foo><bar><!--<child/>--></bar></foo>

    Then commenting the <bar> element becomes:

    <foo><!--<bar>/*<child/>*/</bar>--></foo>

    Then commenting foo you get:

    <!--<foo>/*<bar>/*<child/>*/</bar>*/</foo>-->

     

    Then uncommenting this you get one level of /* */ expanded to

    <foo><!--<bar>/*<child/>*/</bar>--></foo>

    Then uncommenting again you get:

    <foo><bar><!--<child/>--></bar></foo>

    And again you get back to your original:

    <foo><bar><child/></bar></foo>
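    The round trip above can be sketched in C# as follows. This is not XML Notepad's actual code, only an illustration of the scheme as described; the class and method names are invented, and it assumes the original markup never contains a literal /* or */ of its own.

```csharp
using System;
using System.Text;

static class NestedCommenter
{
    // Wrap markup in a comment, rewriting any comment delimiters it already
    // contains as /* and */ so the result has no nested "<!--" or "-->".
    public static string CommentOut(string markup)
    {
        string encoded = markup.Replace("<!--", "/*").Replace("-->", "*/");
        return "<!--" + encoded + "-->";
    }

    // Undo one CommentOut call: strip the outer delimiters and turn only the
    // outermost /* ... */ pairs back into real comment delimiters.
    public static string Uncomment(string comment)
    {
        if (comment.Length < 7 ||
            !comment.StartsWith("<!--", StringComparison.Ordinal) ||
            !comment.EndsWith("-->", StringComparison.Ordinal))
        {
            throw new ArgumentException("Not an XML comment.", nameof(comment));
        }

        string body = comment.Substring(4, comment.Length - 7);
        var result = new StringBuilder(body.Length + 8);
        int depth = 0; // how many encoded comments we are currently inside
        int i = 0;
        while (i < body.Length)
        {
            if (string.CompareOrdinal(body, i, "/*", 0, 2) == 0)
            {
                result.Append(depth == 0 ? "<!--" : "/*");
                depth++;
                i += 2;
            }
            else if (string.CompareOrdinal(body, i, "*/", 0, 2) == 0)
            {
                depth--;
                result.Append(depth == 0 ? "-->" : "*/");
                i += 2;
            }
            else
            {
                result.Append(body[i]);
                i++;
            }
        }
        return result.ToString();
    }
}
```

    Calling CommentOut on "<child/>", then on the enclosing <bar> element, then on <foo> reproduces the three commented forms shown above, and each Uncomment call peels off exactly one level. The sketch does nothing about "--" inside attribute values; that still needs a separate escape.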

     

    So you could use a similar algorithm to encode/decode the illegal "--" characters, provided you can invent a unique escaping mechanism that you know will otherwise be unused in your comments.  A typical trick in C-style programs is to use a backslash: escape "--" as "\-\-" and a single backslash as "\\".  Then when you uncomment, you unescape: "\\" becomes "\" and "\-" becomes "-", so you have not lost or corrupted any data.  XML Notepad doesn't do this (yet :-).
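    A minimal C# sketch of such a backslash escape is below. The names are hypothetical, and for simplicity it escapes every hyphen rather than only doubled ones, which still guarantees that no "--" survives.

```csharp
using System;
using System.Text;

static class CommentEscaper
{
    // Escape backslashes first, then hyphens, so the output never
    // contains two adjacent hyphens.
    public static string Escape(string text)
    {
        return text.Replace("\\", "\\\\").Replace("-", "\\-");
    }

    // Reverse Escape: a backslash means "take the next character literally".
    public static string Unescape(string text)
    {
        var result = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '\\' && i + 1 < text.Length)
            {
                i++; // skip the escape character itself
            }
            result.Append(text[i]);
        }
        return result.ToString();
    }
}
```

    One limitation to keep in mind: text that ends in a hyphen still ends in a hyphen after escaping, which a comment does not allow either, so a caller would have to add a separator before the closing "-->". Commented-out markup normally ends in ">", so this rarely comes up.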

     

    Heaven knows why the XML specification banned "--" not followed by ">".  It doesn't make any sense to me, but note that it also bans "]]>" inside your text nodes.  So you would need to encode that too if it could ever show up programmatically in something you'd like to save in text nodes.

     

    -Chris.

     

     

    Thursday, August 23, 2007 8:08 AM
  • The reason is (almost) the same as the reason why it is not possible to include "*/" within C-style /* */ comments, or a newline within a C++-style // comment. Probably one of the requirements of the XML language (to be simple to implement) weighed in here, and the creators decided not to force implementations to look ahead more than 2 characters in order to decide on the next token.

     

    The solution pointed out by Chris is universal in all such cases.

     

    Another solution, in case you fully control the structure of the XML document, is to put any of your comments within a text node whose parent is named (let's say) "comment".

     

    As the "comment" element will have no other purpose, it can be conveniently used to contain comments.

     

    Of course, in this case you still have to escape characters such as "&" and "<".
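    As a rough C# illustration of the "comment" element idea, the sketch below builds such an element with LINQ to XML, which escapes "&" and "<" in text automatically when serializing. The element name "item", the attribute name "id" and the note text are assumptions chosen for the example.

```csharp
using System;
using System.Xml.Linq;

static class CommentElementDemo
{
    // Build an element that carries its note in a child <comment> element
    // instead of an XML comment, so "--" in the note is perfectly legal.
    public static string Build()
    {
        var item = new XElement("item",
            new XAttribute("id", "G*X*-----**T"),
            new XElement("comment", "on hold -- cost < 5 & rising"));

        // Serialization escapes "<" and "&" inside the text node.
        return item.ToString(SaveOptions.DisableFormatting);
    }
}
```

    The application reading the document then has to skip "comment" elements itself, since the parser treats them as ordinary content.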

     

    Hope this helped.

     

    Cheers,

    Dimitre Novatchev

     

    Wednesday, August 29, 2007 4:21 AM

All replies

  • FYI, I've tried creating the XmlReader like so :

     

    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreComments = true;
    using( XmlReader reader = XmlReader.Create( "IHateXML.xml", settings ) )
    {
        while( reader.Read() ) { }  // the exception surfaces while reading
    }

     

    And I still get the exception on stuff inside of the commented region.  What exactly does IgnoreComments mean then?
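    As far as the behavior described here goes, IgnoreComments only controls whether comment nodes are handed back to the caller; the reader still has to scan each comment to find its end and still enforces well-formedness along the way. A small C# sketch for checking this on a given runtime follows. The helper name is hypothetical, and the behavior noted afterwards is an expectation based on this thread rather than a guarantee.

```csharp
using System;
using System.IO;
using System.Xml;

static class IgnoreCommentsDemo
{
    // Returns true if the XML text parses to the end with IgnoreComments on,
    // false if the reader raises an XmlException.
    public static bool Parses(string xml)
    {
        var settings = new XmlReaderSettings { IgnoreComments = true };
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

    Calling Parses on "<root><!-- a--b --></root>" is expected to return false, while the same document with "a-b" in the comment should return true.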

    Wednesday, August 22, 2007 6:00 PM
  •  

    Hi Spintz,

     

    If I understand correctly what you're looking to do, this should take care of it for you.

     

    XmlTextWriter.WriteComment writes out a comment <!-- ... --> containing the specified text.

     

    Code Snippet

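    'Requires: Imports System.Xml
    'Create a new file ("filename" is a placeholder path; Nothing selects the default UTF-8 encoding)
    Dim txtwrite As XmlTextWriter = New XmlTextWriter("filename", Nothing)

    'Write the XML declaration
    txtwrite.WriteStartDocument()

    'Write a comment
    txtwrite.WriteComment("This is a comment")

    'Flush and close the file
    txtwrite.Close()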

     

     

     

    Hope this helps :)

     

    [EDIT] Oh, OK, I see what you're doing now. The previous poster answered it. Sorry.

    Wednesday, August 22, 2007 6:21 PM