Answered: regular expression to read xml

  • Wednesday, 4 November 2009 19:59, serkan sendur
     
    Hello all,
    I want to read a text file and extract the XML from it. The file is a log that contains structured XML messages mixed with plain-text messages.
    I want to create a new XML document that includes only the XML messages, not the plain unformatted text. A sample of the text file is below:
    some text some text
    <msg>
    <sometags>
    </sometags>
    </msg>
    some text some text
    <msg>
    <sometags>
    </sometags>
    </msg>
    After running my code, I want to have only:
    <msg>
    <sometags>
    </sometags>
    </msg>
    <msg>
    <sometags>
    </sometags>
    </msg>

    How can I do that using regex?

    Thanks

Answers

  • Friday, 27 November 2009 16:14, serkan sendur

    Here is my solution:

    using System;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Data;
    using System.Drawing;
    using System.Text;
    using System.Windows.Forms;
    using System.Text.RegularExpressions;
    
    namespace TraceParser
    {
    	public partial class Form1 : Form
    	{
    		public Form1()
    		{
    			InitializeComponent();
    		}
    
    		private void button1_Click(object sender, EventArgs e)
    		{
    			string text = richTextBox1.Text;
    			string searchText = "<?xml version=\"1.0\" encoding=\"UTF-16\" standalone=\"no\" ?>";
    			int index = 0;
    			StringBuilder sb = new StringBuilder();
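    			// Find each XML declaration and copy everything up to its closing </MSG>.
    			int beginIndex;
    			while ((beginIndex = text.IndexOf(searchText, index)) != -1)
    			{
    				int endIndex = text.IndexOf("</MSG>", beginIndex);
    				if (endIndex == -1) break; // unterminated message: stop scanning
    				int length = endIndex - beginIndex + 6; // 6 == "</MSG>".Length
    				sb.Append(text.Substring(beginIndex, length));
    				// Resume right after this message. Using index + length here would lag
    				// behind whenever plain text precedes a message and could copy it twice.
    				index = beginIndex + length;
    			}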
    			richTextBox1.Text = sb.ToString();
    		}
    	}
    }
    
    Thanks for your answers.
    • Marked as answer by serkan sendur, Friday, 27 November 2009 16:15

All replies

  • Wednesday, 4 November 2009 20:36, OmegaMan (MVP, Moderator)

    > Hello all,
    > i want to read a text file and read some xml info in it. this file is a log file it has some structured xml messages and also some plain text

    Do you mean read a text file and then write out a new text file, or append information to the original text file? The request is unclear...

     


    William Wegerson (www.OmegaCoder.Com )
  • Tuesday, 17 November 2009 6:55, adatapost

    I think you want to parse a log file using regular expressions and the XML classes. Take a look at this link: http://ondotnet.com/pub/a/dotnet/2003/06/09/parsinglogs.html
  • Tuesday, 17 November 2009 9:27, Derek Smyth

    Hi,

    You should find out who mixed plain text with XML and then you should give them a good seeing to. That's a dreadful decision.

  • Thursday, 19 November 2009 19:52, JediJohn82

    Something similar to this would work for you...


                String text = @"<msg>
                            <sometags>
                            </sometags>
                            </msg>
                            some text some text
                            <msg>
                            <sometags>
                            </sometags>
                            </msg>";
    
    Regex.Replace(text, @"</msg>.*<msg>", "</msg><msg>", RegexOptions.Singleline);
    
    • Proposed as answer by JediJohn82, Friday, 20 November 2009 15:02
    • Unproposed as answer by serkan sendur, Friday, 27 November 2009 15:33
    • Unproposed as answer by Rudedog2 (Moderator), Friday, 20 November 2009 13:28
    • Proposed as answer by JediJohn82, Thursday, 19 November 2009 19:52
  • Thursday, 19 November 2009 20:49, Rudedog2 (Moderator)

    > > Hello all,
    > > i want to read a text file and read some xml info in it. this file is a log file it has some structured xml messages and also some plain text
    >
    > Do you mean read a textfile then write out a new textfile or append information to the original text file? The thought is unclear...
    >
    > William Wegerson (www.OmegaCoder.Com )

    Yup, all of the above.  That's it. 
    And the original sample needs comments.  I think I figured it out.
    It would seem that the original file has more than one root, and some misplaced text.

    <msg>
        <sometags>
        </sometags>
    </msg>
    <!--some text some text that should not be here-->
    <msg>
        <sometags>
        </sometags>
    </msg>


    The above xml has more than one <msg> root.

    <!-- running my code i want to have only -->
    <msg>
        <sometags>
        </sometags>
    </msg>
    <msg>
        <sometags>
        </sometags>
    </msg>


    The above result is not well-formed XML either, because an XML document must have exactly one root element.

    You would need to add a single root node:

    <?xml version="1.0" encoding="utf-8" ?>
    <root>
        <!-- running my code i want to have only -->
        <msg>
            <sometags>
            </sometags>
        </msg>
        <msg>
            <sometags>
            </sometags>
        </msg>
    </root>


    Mark the best replies as answers. "Fooling computers since 1971."
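
    The single-root idea above can also be applied in code rather than by hand. The following is only a sketch under stated assumptions: it uses LINQ to XML (System.Xml.Linq), which nobody in this thread used, the names RootWrapper and WrapMessages are made up for illustration, and it only works if the plain text between messages contains no '<' or '&' characters and no embedded <?xml ...?> declarations.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper; the names are illustrative and not from the thread.
static class RootWrapper
{
    // Wraps the raw log in a single <root> element so it parses as one document,
    // then removes the stray text nodes so only element children remain.
    public static XElement WrapMessages(string log)
    {
        // Throws XmlException if the plain text itself contains markup characters.
        XElement root = XElement.Parse("<root>" + log + "</root>");

        // Extensions.Remove works on a snapshot, so removing while enumerating is safe.
        root.Nodes().OfType<XText>().Remove();
        return root;
    }
}
```

    Calling RootWrapper.WrapMessages(text).ToString() on the sample from the question should give a <root> element containing just the two <msg> elements. If the log carries an XML declaration in front of every message, as real trace files may, this approach will not parse and a string- or regex-based extraction is the safer route.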
  • Friday, 20 November 2009 9:28, Derek Smyth

    Hi,

    The easiest way to do this is to look for the first "<msg" in the file and remove everything before it. Regex isn't needed, and thinking in regex makes the problem look more complex than it is. Sure, you can still do it with regex, I never said otherwise, but sometimes ye olde string processing does the job.



    using System;
    
    namespace FirstMsg
    {
        class Program
        {
            static void Main(string[] args)
            {
                string text = @"some text some text some more text
    <msg>
    <sometags>
    </sometags>
    </msg>
    some text some text 
    <msg>
    <sometags>
    </sometags>
    </msg>";
                // Drop everything before the first <msg element; text between later messages is kept.
                int index = text.IndexOf("<msg");
                string xml = index >= 0 ? text.Substring(index) : string.Empty;
                Console.Out.WriteLine(xml);
            }
        }
    }
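
    The program above only strips the text in front of the first message. The same plain string processing can be extended to skip the text between messages as well. This is a minimal sketch, assuming messages are never nested and every one closes with a literal "</msg>"; the names MsgScanner and KeepMessages are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper; the names are illustrative and not from the thread.
static class MsgScanner
{
    // Copies every <msg ... </msg> block to the output, one per line,
    // and skips whatever plain text sits before, between, or after them.
    public static string KeepMessages(string text)
    {
        const string open = "<msg";
        const string close = "</msg>";
        var sb = new StringBuilder();
        int index = 0;

        while (true)
        {
            int begin = text.IndexOf(open, index, StringComparison.Ordinal);
            if (begin == -1) break;               // no more messages

            int end = text.IndexOf(close, begin, StringComparison.Ordinal);
            if (end == -1) break;                 // unterminated message: stop

            end += close.Length;                  // include the closing tag
            sb.AppendLine(text.Substring(begin, end - begin));
            index = end;                          // resume right after this message
        }
        return sb.ToString();
    }
}
```

    Note that searching for "<msg" would also match a tag such as <msgid>, so a real log format may need a stricter opening marker.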
    
  • Friday, 20 November 2009 13:28, Rudedog2 (Moderator)

    > Something similar to this would work for you...
    >
    >             String text = @"<msg>
    >                         <sometags>
    >                         </sometags>
    >                         </msg>
    >                         some text some text
    >                         <msg>
    >                         <sometags>
    >                         </sometags>
    >                         </msg>";
    >
    > Regex.Replace(text, @"</msg>.*<msg>", "</msg><msg>", RegexOptions.Singleline);

    Your code is apparently untested and cannot work in this scenario.
    It will not work because of the mispelilng of 'msq'.

  • Friday, 20 November 2009 15:02, JediJohn82
     
    Actually it does work...the only thing that was missing was that it removed the \r\n from the text, but that is why I said "something similar to this".



    Sorry if you feel like you need to bash people...and by the way you misspelled "misspelling".
  • Friday, 20 November 2009 15:10, Rudedog2 (Moderator)

    > Actually it does work...the only thing that was missing was that it removed the \r\n from the text, but that is why I said "something similar to this".
    >
    > Sorry if you feel like you need to bash people...and by the way you misspelled "misspelling".

    No one is bashing you or picking on you.  Your code didn't work as claimed.
    What do you expect when you mark your own reply as "Answer"?
    Marking your own replies that way is looked down upon as being a bit arrogant.
    You really should let others judge the worthiness of your reply.


  • Friday, 20 November 2009 15:11, JediJohn82
    It does work as claimed...I just ran it five times to be sure.  Just because you can't get it to run doesn't mean it doesn't work.

    Here is the entire program:

    using System;
    using System.Collections.Generic;
    using System.Text;
    using System.Text.RegularExpressions;
    
    namespace Junk
    {
        class Program
        {
    
            static void Main(string[] args)
            {
                String text = @"<msg>
                            <sometags>
                            </sometags>
                            </msg>
                            some text some text
                            <msg>
                            <sometags>
                            </sometags>
                            </msg>";
    
                Console.Write(Regex.Replace(text, @"</msg>.*<msg>", "</msg><msg>", RegexOptions.Singleline));
                Console.Read();
            }
    
    
        }
    }
    
    

  • Friday, 20 November 2009 15:25, Rudedog2 (Moderator)

    > It does work as claimed...I just ran it five times to be sure.  Just because you can't get it to run doesn't mean it doesn't work.



    JediJohn82,

    Your snippet ignores the extra text and leaves it behind.

    <msg>
    <sometags>
    </sometags>
    </msg>
    some text some text
    <msg>
    <sometags>
    </sometags>
    </msg>

    That is the resulting string from your snippet.
    Same problem still exists.  Extra text has not been removed or processed.
    Your code works as claimed.  I guess you meant to do that.
  • Friday, 20 November 2009 15:26, Wyck

    Thought I'd give the DOM solution:

    // Needs: using System; using System.Text; using System.Xml;
    string foo =
    @"<msg>
    <sometags>
    </sometags>
    </msg>
    some text some text
    <msg>
    <sometags>
    </sometags>
    </msg>
    ";
    var doc = new XmlDocument();
    doc.PreserveWhitespace = true; // you decide
    doc.InnerXml = "<x>" + foo + "</x>";
    
    var sb = new StringBuilder();
    foreach( XmlNode n in doc.DocumentElement.ChildNodes ) {
    	if( n.NodeType == XmlNodeType.Element) {
    		sb.Append( n.OuterXml );
    	}
    }
    Console.WriteLine( sb.ToString() );
    
    

  • Friday, 20 November 2009 15:27, JediJohn82

    I have no idea what you are doing to my code to make it not work, but the text "some text some text" is removed whenever I run it.
  • Friday, 20 November 2009 15:27, Rudedog2 (Moderator)

    > I have no idea what you are doing to my code to make it not work, but the text "some text some text" is removed whenever I run it.




                String xmlText = File.ReadAllText("TextFile1.txt");
                Regex.Replace(xmlText, @"</msg>.*<msg>", "</msg><msg>", RegexOptions.Singleline);
                this.richTextBox1.Text = xmlText;


    Here are the contents of TextFile1.txt.

    <msg>
    <sometags>
    </sometags>
    </msg>
    some text some text
    <msg>
    <sometags>
    </sometags>
    </msg>

    Maybe I did make a mistake somewhere.

  • Friday, 20 November 2009 15:31, JediJohn82
    Thank you for sharing your code.  I see the mistake.  The new value was never placed into the xmlText variable.

    Code should be:
    String xmlText = File.ReadAllText("TextFile1.txt");
    xmlText = Regex.Replace(xmlText, @"</msg>.*<msg>", "</msg><msg>", RegexOptions.Singleline);
    this.richTextBox1.Text = xmlText;

  • Friday, 20 November 2009 15:34, Rudedog2 (Moderator)
     
    Yeah, I found it too.


                String xmlText = File.ReadAllText("TextFile1.txt");
                string result = Regex.Replace(xmlText, @"</msg>.*<msg>", "</msg><msg>", RegexOptions.Singleline);
                this.richTextBox1.Text = result;


    It would be nice to have the CR-LF characters in there, as well as a surrounding <root> node to make it valid XML.
    It's my job to scrutinize replies when the poster marks their own reply as "Answer", by the way.


  • Friday, 20 November 2009 19:49, Wyck

    > Regex.Replace(xmlText, @"</msg>.*<msg>", "</msg><msg>", RegexOptions.Singleline);

    You realize that this doesn't remove text that comes before the first msg tag or after the last one?

    ONE <msg></msg> TWO <msg></msg> THREE

    Becomes this:

    ONE <msg></msg><msg></msg> THREE


  • Saturday, 21 November 2009 21:51, Derek Smyth

    Hi JediJohn,

    Can you please not mark your own post as an answer? It's really not your place to decide. It's a bit arrogant.
  • Friday, 27 November 2009 15:35, serkan sendur

    It works for a single message, but there are many messages in that document.
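
    A likely reason the Regex.Replace suggestion stops working with many messages is that ".*" is greedy: with RegexOptions.Singleline it runs from the first </msg> all the way to the last <msg>, so every message in between is deleted together with the plain text. A lazy quantifier (".*?") that matches each message on its own avoids this. The sketch below assumes messages are never nested and use the exact tags <msg> and </msg>; the names MsgRegex, GreedyReplace and ExtractMessages are hypothetical.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative and not from the thread.
static class MsgRegex
{
    // The pattern proposed in the thread. The greedy ".*" extends to the LAST <msg>,
    // so with three or more messages the middle ones are removed as well.
    public static string GreedyReplace(string log)
    {
        return Regex.Replace(log, @"</msg>.*<msg>", "</msg><msg>", RegexOptions.Singleline);
    }

    // Lazy alternative: ".*?" stops at the first </msg>, so each message is matched
    // separately and anything outside the matches is never copied to the output.
    public static string ExtractMessages(string log)
    {
        var sb = new StringBuilder();
        foreach (Match m in Regex.Matches(log, @"<msg>.*?</msg>", RegexOptions.Singleline))
        {
            sb.AppendLine(m.Value);
        }
        return sb.ToString();
    }
}
```

    For the input "A <msg>1</msg> B <msg>2</msg> C <msg>3</msg> D", GreedyReplace loses message 2 and keeps the leading "A " and trailing " D", while ExtractMessages returns the three messages, one per line, with none of the surrounding text.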