regular expression to read xml
- Hello all,
I want to read a text file and extract the XML in it. The file is a log that contains structured XML messages mixed with plain-text messages.
I want to create a new XML document that includes only the XML messages, not the plain unformatted text. A sample of the text file is below:
some text some text
<msg>
<sometags>
</sometags>
</msg>
some text some text
<msg>
<sometags>
</sometags>
</msg>
After running my code, I want to have only:
<msg>
<sometags>
</sometags>
</msg>
<msg>
<sometags>
</sometags>
</msg>
How can I do that using regex?
Thanks
Answers
- Here is my solution:
Thanks for your answers.
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.Text.RegularExpressions;

namespace TraceParser
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            string text = richTextBox1.Text;
            string searchText = "<?xml version=\"1.0\" encoding=\"UTF-16\" standalone=\"no\" ?>";
            string endTag = "</MSG>";
            int index = 0;
            StringBuilder sb = new StringBuilder();

            // Copy each span from an XML declaration through the next </MSG>.
            while (true)
            {
                int beginIndex = text.IndexOf(searchText, index);
                if (beginIndex == -1) break;
                int endIndex = text.IndexOf(endTag, beginIndex);
                if (endIndex == -1) break; // unterminated message: stop
                int length = endIndex - beginIndex + endTag.Length;
                sb.Append(text.Substring(beginIndex, length));
                // Resume right after this message. The original "index + length"
                // ignored the plain text in front of the message and could rescan it.
                index = beginIndex + length;
            }
            richTextBox1.Text = sb.ToString();
        }
    }
}
- Marked as answer by serkan sendur, Friday, November 27, 2009 16:15
All replies
- Quote (original post): Hello all, i want to read a text file and read some xml info in it. this file is a log file it has some structured xml messages and also some plain text
Do you mean read a text file and then write out a new text file, or append information to the original text file? The thought is unclear...
William Wegerson (www.OmegaCoder.Com)
- I think you want to parse the log file using regular expressions and the XML classes. Take a look at this link: http://ondotnet.com/pub/a/dotnet/2003/06/09/parsinglogs.html
- Hi,
You should find out who mixed plain text with XML and then you should give them a good seeing to. That's a dreadful decision.
Something similar to this would work for you...
String text = @"<msg> <sometags> </sometags> </msg> some text some text <msg> <sometags> </sometags> </msg>";
Regex.Replace(text, @"</msg>.*<msg>", "</msg><msg>", RegexOptions.Singleline);
- Proposed as answer by JediJohn82, Friday, November 20, 2009 15:02
- Unproposed as answer by serkan sendur, Friday, November 27, 2009 15:33
- Unproposed as answer by Rudedog2 (Moderator), Friday, November 20, 2009 13:28
- Proposed as answer by JediJohn82, Thursday, November 19, 2009 19:52
- Quote (William Wegerson): Do you mean read a textfile then write out a new textfile or append information to the original text file? The thought is unclear...
Yup, all of the above. That's it.
And the original sample needs comments. I think I figured it out.
It would seem that the original file has more than one root, and some misplaced text.
<msg>
<sometags>
</sometags>
</msg>
<!--some text some text that should not be here-->
<msg>
<sometags>
</sometags>
</msg>
The above xml has more than one <msg> root.
<!-- running my code i want to have only -->
<msg>
<sometags>
</sometags>
</msg>
<msg>
<sometags>
</sometags>
</msg>
The above result is not well-formed XML either, because an XML document must have exactly one root element.
You would need to add a single root node:
<?xml version="1.0" encoding="utf-8" ?>
<root>
<!-- running my code i want to have only -->
<msg>
<sometags>
</sometags>
</msg>
<msg>
<sometags>
</sometags>
</msg>
</root>
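A minimal sketch of the single-root idea, assuming the <msg> fragments have already been separated from the plain text. The class name RootWrapper, the method Wrap and the default root name "root" are made up for illustration. XDocument.Parse throws an XmlException if the wrapped text is still not well-formed. Fragments that carry their own <?xml ... ?> declaration would need it stripped first, since a declaration is only allowed at the very start of a document.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class RootWrapper
{
    // Wraps already-extracted <msg> fragments in one root element and parses
    // the result. rootName is a hypothetical default, not taken from the thread.
    public static XDocument Wrap(string fragments, string rootName = "root")
    {
        return XDocument.Parse("<" + rootName + ">" + fragments + "</" + rootName + ">");
    }

    static void Main()
    {
        string fragments =
            "<msg><sometags></sometags></msg>" +
            "<msg><sometags></sometags></msg>";

        XDocument doc = Wrap(fragments);

        // Count the <msg> children that now sit under the single root.
        Console.WriteLine(doc.Root.Elements("msg").Count()); // prints 2
    }
}
```

If stray plain text is still present between the fragments, it parses as text content of the root rather than causing an error, so the extraction step still has to happen first.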
Mark the best replies as answers. "Fooling computers since 1971."
- Hi,
The easiest way to do this is to look for the first "<msg" in the file and remove everything before it. Regex isn't needed here, and thinking in regex makes the problem seem more complex than it is. Sure, you can still do it with regex, but sometimes ye olde string processing does the job.
using System;

namespace FirstMsg
{
    class Program
    {
        static void Main(string[] args)
        {
            string text = @"some text some text some more text <msg> <sometags> </sometags> </msg> some text some text <msg> <sometags> </sometags> </msg>";
            // Drop everything before the first "<msg"; text after that point is kept as-is.
            int index = text.IndexOf("<msg");
            // IndexOf returns -1 when there is no match, which would make Substring throw.
            string xml = index >= 0 ? text.Substring(index) : string.Empty;
            Console.Out.WriteLine(xml);
        }
    }
}
- Quote (JediJohn82): Something similar to this would work for you...
String text = @"<msg> <sometags> </sometags> </msg> some text some text <msg> <sometags> </sometags> </msg>";
Regex.Replace(text, @"</msg>.*<msg>", "</msg><msg>", RegexOptions.Singleline);
Your code is apparently untested and cannot work in this scenario.
It will not work because of the mispelilng [sic] of "msq".
- Edited by Rudedog2 (Moderator), Friday, November 20, 2009 15:13
- Actually, it does work... the only thing missing was that it removed the \r\n from the text, but that is why I said "something similar to this".
Sorry if you feel like you need to bash people... and by the way, you misspelled "misspelling".
- Quote (JediJohn82): Actually it does work...the only thing that was missing was that it removed the \r\n from the text, but that is why I said "something similar to this". Sorry if you feel like you need to bash people...and by the way you misspelled "misspelling".
No one is bashing you or picking on you. Your code didn't work as claimed.
What do you expect when you mark your own reply as "Answer"?
Marking your own replies that way is looked down upon as being a bit arrogant.
You really should let others judge the worthiness of your reply.
- It does work as claimed... I just ran it five times to be sure. Just because you can't get it to run doesn't mean it doesn't work.
Here is the entire program:
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

namespace Junk
{
    class Program
    {
        static void Main(string[] args)
        {
            String text = @"<msg> <sometags> </sometags> </msg> some text some text <msg> <sometags> </sometags> </msg>";
            Console.Write(Regex.Replace(text, @"</msg>.*<msg>", "</msg><msg>", RegexOptions.Singleline));
            Console.Read();
        }
    }
}
- Quote (JediJohn82): It does work as claimed...I just ran it five times to be sure. Just because you can't get it to run doesn't mean it doesn't work.
JediJohn82,
Your snippet ignores the extra text and leaves it behind.
<msg>
<sometags>
</sometags>
</msg>
some text some text
<msg>
<sometags>
</sometags>
</msg>
That is the resulting string from your snippet.
Same problem still exists. Extra text has not been removed or processed.
Your code works as claimed. I guess you meant to do that.
- Thought I'd give the DOM solution:
// Requires: using System; using System.Text; using System.Xml;
string foo = @"<msg> <sometags> </sometags> </msg> some text some text <msg> <sometags> </sometags> </msg> ";
var doc = new XmlDocument();
doc.PreserveWhitespace = true; // you decide
// Wrap the log text in a temporary root so it parses as one document.
doc.InnerXml = "<x>" + foo + "</x>";
var sb = new StringBuilder();
foreach (XmlNode n in doc.DocumentElement.ChildNodes)
{
    // Keep the element children (<msg>); skip the loose text nodes.
    if (n.NodeType == XmlNodeType.Element)
    {
        sb.Append(n.OuterXml);
    }
}
Console.WriteLine(sb.ToString());
- I have no idea what you are doing to my code to make it not work, but the text "some text some text" is removed whenever I run it.
- Quote (JediJohn82): I have no idea what you are doing to my code to make it not work, but the text "some text some text" is removed whenever I run it.
String xmlText = File.ReadAllText("TextFile1.txt");
Regex.Replace(xmlText, @"</msg>.*<msg>", "</msg><msg>", RegexOptions.Singleline);
this.richTextBox1.Text = xmlText;
Here's the contents of TextFile1.txt.
<msg>
<sometags>
</sometags>
</msg>
some text some text
<msg>
<sometags>
</sometags>
</msg>
Maybe I did make a mistake somewhere.
- Edited by Rudedog2 (Moderator), Friday, November 20, 2009 15:31
- Thank you for sharing your code. I see the mistake. The new value was never placed into the xmlText variable.
Code should be:
String xmlText = File.ReadAllText("TextFile1.txt");
xmlText = Regex.Replace(xmlText, @"</msg>.*<msg>", "</msg><msg>", RegexOptions.Singleline);
this.richTextBox1.Text = xmlText;
- Yeah, I found it too.
String xmlText = File.ReadAllText("TextFile1.txt");
string result = Regex.Replace(xmlText, @"</msg>.*<msg>", "</msg><msg>", RegexOptions.Singleline);
this.richTextBox1.Text = result;
It would be nice to have the CR-LF characters in there, as well as a surrounding <root> node to make it valid XML.
It's my job to scrutinize replies when the poster marks their own reply as "Answer" by the way.
- Quote: Regex.Replace(xmlText, @"</msg>.*<msg>", "</msg><msg>", RegexOptions.Singleline);
You realize that this doesn't remove text that comes before or after the msg tags?
ONE <msg></msg> TWO <msg></msg> THREE
Becomes this:
ONE <msg></msg><msg></msg> THREE
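One way around this, sketched here rather than taken from any reply in the thread, is to match the messages themselves instead of replacing the text between them. The lazy pattern <msg>.*?</msg> stops at the nearest closing tag, whereas the greedy .* in the Replace call above runs from the first </msg> to the last <msg> and so swallows any messages in between. The names MsgExtractor and ExtractMessages are made up for illustration, and the sketch assumes lowercase <msg> tags without attributes that are never nested.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class MsgExtractor
{
    // Lazy match from each <msg> to the nearest </msg>.
    // RegexOptions.Singleline lets '.' match line breaks inside a message.
    static readonly Regex MsgBlock =
        new Regex(@"<msg>.*?</msg>", RegexOptions.Singleline);

    // Returns every <msg>...</msg> block found in the log, one per line,
    // and drops everything outside those blocks.
    public static string ExtractMessages(string log)
    {
        var sb = new StringBuilder();
        foreach (Match m in MsgBlock.Matches(log))
        {
            sb.AppendLine(m.Value);
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Same input as the example above; ONE, TWO and THREE are all dropped.
        Console.Write(ExtractMessages("ONE <msg></msg> TWO <msg></msg> THREE"));
    }
}
```

For the sample input this writes the two <msg></msg> blocks on separate lines. The result still has no single root element, so it would need wrapping before it can be loaded as an XML document.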
- Hi JediJohn,
Can you please not mark your own post as an answer? It's really not your place to decide. It's a bit arrogant.
- It works for a single message, but there are many messages in that document.

