Counting the number of occurrences of different strings in a file and using the counts?

  • Question

  • I'm trying to write a program that counts the number of occurrences of different strings and puts the count values inside certain nodes in the file.
    Basically, it searches the file for the strings </fig>, </disp-formula> and </table-wrap> and pastes the count of each of them into the nodes <fig-count count="..."/>, <table-count count="..."/> and <equation-count count="..."/>.

    I've written this:

    var basePath=textBox1.Text;
    List<string[]> xmlFolders = Directory.GetDirectories(basePath, "xml", SearchOption.AllDirectories)
        .Select(item => Directory.GetFiles(item, "*.xml")).ToList();

    string[,] definitions=new string[,]
    {{"</disp-formula>", "equation-count"},
        {"</fig>", "fig-count"},
        {"</table-wrap>", "table-count"}};
    var definitionsLength = definitions.Length;
    for (var i = 0; i < definitionsLength; i++) {
        var searchString = definitions[i][0];
        var resultTag = definitions[i][1];
        var count = 0;

        foreach (var xml in xmlFolders) {
            if (Regex.IsMatch(xml, searchString, RegexOptions.IgnoreCase))
                count++;
            File.WriteAllText(xml, File.ReadAllText(xml).Replace("<"+resultTag + " count=\"[0-9]+\"/>", "<" + resultTag + " count=\"" + count +"/>\""));
        }

    }

    But I'm getting lots of errors on the lines containing the following code:

    var searchString = definitions[i][0]; var resultTag = definitions[i][1]; 

    Wrong number of indices inside []; expected 2

    and these errors in the regex find-and-replace portion:

    The best overloaded method match for 'System.Text.RegularExpressions.Regex.IsMatch(string, string, System.Text.RegularExpressions.RegexOptions)' has some invalid arguments

    Argument 1: cannot convert from 'string[]' to 'string'

    The best overloaded method match for 'System.IO.File.ReadAllText(string)' has some invalid arguments
    The best overloaded method match for 'System.IO.File.WriteAllText(string, string)' has some invalid arguments

    How do I correct this?

    Sunday, October 15, 2017 1:50 PM

Answers

  • Try this simple approach

    using System.IO;
    using System.Text.RegularExpressions;

    var path = @"D:\C#\SampleXML\test.XML";
    string text = File.ReadAllText(path);
    // Count the elements by the prefix of their id attribute.
    int fig_count = Regex.Matches(text, @"fig id=""F").Count;
    int tab_count = Regex.Matches(text, @"table-wrap id=""T").Count;
    int eq_count = Regex.Matches(text, @"disp-formula id=""deqn").Count;
    // Update the count nodes in memory, then write the file once.
    text = Regex.Replace(text, @"<fig-count count=""\d+""/>", @"<fig-count count=""" + fig_count + @"""/>");
    text = Regex.Replace(text, @"<table-count count=""\d+""/>", @"<table-count count=""" + tab_count + @"""/>");
    text = Regex.Replace(text, @"<eq-count count=""\d+""/>", @"<eq-count count=""" + eq_count + @"""/>");
    File.WriteAllText(path, text);

    • Marked as answer by Don Bradman Sunday, October 22, 2017 3:54 PM
    Sunday, October 22, 2017 3:51 PM

All replies

  • In order to fix the first error, try ‘definitions[i,0]’ instead of ‘definitions[i][0]’

    Check the following example too:

    var searchString = "</disp-formula>";
    var resultTag = "equation-count";
     
    string text = File.ReadAllText( xml );
    var count = Regex.Matches( text, Regex.Escape( searchString ), RegexOptions.IgnoreCase ).Count;
    text = Regex.Replace( text, "<" + resultTag + " count=\"[0-9]+\"/>", "<" + resultTag + " count=\"" + count + "\"/>" );
    File.WriteAllText( xml, text );

     

    Adjust it for your loops.

    If the file is XML, then consider the dedicated classes: XDocument or XmlDocument. Also, avoid reading and writing the same file multiple times.
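    As a minimal sketch of how the question's loop might look with these fixes applied (assuming the folder layout described in the question; the names CountUpdater, UpdateCounts and UpdateAll are hypothetical), the version below indexes the string[,] with [i, 0], loops over GetLength(0) rather than Length (which counts all six elements of the 3-by-2 array, not its three rows), flattens the per-folder file arrays with SelectMany so that each item is a single path, and reads and writes each file only once.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class CountUpdater
{
    // Each row: { closing tag to count, name of the node that stores the count }.
    // A rectangular string[,] is indexed as [row, column], not [row][column].
    static readonly string[,] Definitions =
    {
        { "</disp-formula>", "equation-count" },
        { "</fig>", "fig-count" },
        { "</table-wrap>", "table-count" },
    };

    // Returns the text with every <tag count="N"/> node set to the number of matching closing tags.
    public static string UpdateCounts(string text)
    {
        // GetLength(0) is the number of rows (3); Definitions.Length would be 6 (all elements).
        for (int i = 0; i < Definitions.GetLength(0); i++)
        {
            string searchString = Definitions[i, 0];
            string resultTag = Definitions[i, 1];

            // Escape the literal so it is never interpreted as regex syntax.
            int count = Regex.Matches(text, Regex.Escape(searchString), RegexOptions.IgnoreCase).Count;

            text = Regex.Replace(text,
                "<" + resultTag + " count=\"[0-9]+\"/>",
                "<" + resultTag + " count=\"" + count + "\"/>");
        }
        return text;
    }

    // Finds every *.xml file inside any folder named "xml" below basePath
    // and updates it in place, reading and writing each file exactly once.
    public static void UpdateAll(string basePath)
    {
        IEnumerable<string> xmlFiles = Directory
            .GetDirectories(basePath, "xml", SearchOption.AllDirectories)
            .SelectMany(dir => Directory.GetFiles(dir, "*.xml"));

        foreach (string file in xmlFiles)
        {
            File.WriteAllText(file, UpdateCounts(File.ReadAllText(file)));
        }
    }
}
```

    For example, CountUpdater.UpdateAll(textBox1.Text) would process every folder named xml under the path typed into the text box. It still works on the raw text with regex, so it counts closing tags regardless of the element's id.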


    • Edited by Viorel_MVP Sunday, October 15, 2017 6:55 PM
    Sunday, October 15, 2017 6:53 PM


  • string[,] definitions=new string[,]
    {{"</disp-formula>", "equation-count"},
        {"</fig>", "fig-count"},
        {"</table-wrap>", "table-count"}};
    var definitionsLength = definitions.Length;
    for (var i = 0; i < definitionsLength; i++) {
        var searchString = definitions[i][0];
        var resultTag = definitions[i][1];

    But I'm getting lots of errors on the lines containing the following code:

    var searchString = definitions[i][0]; var resultTag = definitions[i][1]; 

    Wrong number of indices inside []; expected 2


    You are trying to use the syntax for jagged arrays with multidimensional arrays. See:

    Jagged Arrays (C# Programming Guide)
    https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/arrays/jagged-arrays

    Multidimensional Arrays (C# Programming Guide)
    https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/arrays/multidimensional-arrays
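    To illustrate the difference, here is a small sketch (the names ArrayShapes, RectangularTag and JaggedTag are made up for illustration): a rectangular string[,] is a single object indexed with [row, column], while a jagged string[][] is an array of arrays indexed with [row][column]. Note also that Length on a rectangular array counts every element, so GetLength(0) is the row count to loop over.

```csharp
public static class ArrayShapes
{
    // Rectangular (multidimensional) array: one object, indexed with a comma.
    public static readonly string[,] Rectangular =
    {
        { "</disp-formula>", "equation-count" },
        { "</fig>", "fig-count" },
    };

    // Jagged array: an array whose elements are themselves arrays, indexed with [][].
    public static readonly string[][] Jagged =
    {
        new[] { "</disp-formula>", "equation-count" },
        new[] { "</fig>", "fig-count" },
    };

    // Rectangular[row][1] would not compile ("Wrong number of indices inside []").
    public static string RectangularTag(int row) => Rectangular[row, 1];

    // Jagged[row, 1] would not compile either; each row is a separate string[].
    public static string JaggedTag(int row) => Jagged[row][1];
}
```

    Here ArrayShapes.Rectangular.Length is 4 (2 rows times 2 columns), while ArrayShapes.Rectangular.GetLength(0) and ArrayShapes.Jagged.Length are both 2.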

    - Wayne

    Sunday, October 15, 2017 8:19 PM
  • If the file is XML, then consider the dedicated classes: XDocument or XmlDocument. Also, avoid reading and writing the same file multiple times.


    Hi Viorel,

    How do I use XDocument for this task?

    BTW, the right way to count the nodes in my files is to count the number of <table-wrap id="table...">, <fig id="fig..."> and <disp-formula id="deqn..."> elements, where ... can be a number or numbers, letters, a combination of letters and numbers, and can even contain hyphens and parentheses.

    So if I try something like

    int count=xml.Descendants("table-wrap").Attributes("id").Count();

    It will give me the count of all table-wrap nodes that have an id attribute, so nodes like <table-wrap id="array-table..."> will also be counted, which I don't want. How do I solve that?

    Monday, October 16, 2017 10:23 AM
  • Also, when I save the file, its formatting changes:

    [Screenshot: before running the program]

    [Screenshot: after running the program]

    I want to keep the previous formatting, i.e. without the added indentation, and without hex character references like &#x0394; and &#x002A; being changed to Δ and *. If I add

    SaveOptions.DisableFormatting


    Then the entire file becomes a single line...

    How do I solve this? Is there any option to keep the previous formatting after saving?

    Monday, October 16, 2017 10:37 AM
  • Hello Don,

    >>How do I use XDocument for this task?

    LINQ to XML is a good choice for your situation, since it lets you write custom queries over the XML structure. This is just an example:

     var result = from t in xd.Descendants("table-wrap")
                  let id = (string)t.Attribute("id") // null if the element has no id attribute
                  where id != null && id.Contains("mobile") // add your own where conditions; this is just an example
                  select t;

     int count = result.Count();
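    Applied to the id-prefix question above, a filter along these lines might work: it reads the id with an explicit (string) cast, which yields null instead of throwing when the attribute is missing, and keeps only ids that start with the wanted prefix, so ids such as array-table... are excluded. This is a sketch that assumes the elements are not in an XML namespace; IdPrefixCounter and CountByIdPrefix are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class IdPrefixCounter
{
    // Counts <elementName> elements whose id attribute starts with idPrefix.
    // Elements without an id, or with ids such as "array-table1", are skipped.
    public static int CountByIdPrefix(XDocument doc, string elementName, string idPrefix)
    {
        return doc.Descendants(elementName)
                  .Select(e => (string)e.Attribute("id")) // null when the attribute is missing
                  .Count(id => id != null && id.StartsWith(idPrefix, StringComparison.Ordinal));
    }
}
```

    For example, IdPrefixCounter.CountByIdPrefix(doc, "table-wrap", "table") counts <table-wrap id="table1"> but not <table-wrap id="array-table1">.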
    

    >>How do I solve this? Is there any option to keep the previous formatting after saving?

    I didn't find more details about how you save the file. The second screenshot shows the default formatting of an XML file. XmlDocument.PreserveWhitespace is a good choice for this situation: if PreserveWhitespace is false before Save is called, XmlDocument auto-indents the output.
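    A minimal sketch of a whitespace-preserving round trip, shown for both XmlDocument and XDocument (the class and method names are hypothetical, and the code works on strings rather than files for brevity). For files, the XDocument equivalents would be XDocument.Load(path, LoadOptions.PreserveWhitespace) and doc.Save(path, SaveOptions.DisableFormatting); SaveOptions.DisableFormatting on its own produces a single line because the original whitespace was already discarded at load time.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class WhitespacePreservingSave
{
    // XmlDocument: with PreserveWhitespace = true the whitespace nodes are kept,
    // and Save does not auto-indent the output.
    public static string RoundTripXmlDocument(string xml)
    {
        var doc = new XmlDocument { PreserveWhitespace = true };
        doc.LoadXml(xml);
        // (modify doc here)
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }

    // XDocument: keep whitespace when loading, and disable formatting when
    // saving, so the original whitespace is written back instead of new indentation.
    public static string RoundTripXDocument(string xml)
    {
        XDocument doc = XDocument.Parse(xml, LoadOptions.PreserveWhitespace);
        // (modify doc here)
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

    Neither class keeps numeric character references such as &#x0394;: the parser expands them into the characters they stand for, so this addresses only the indentation change.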

    If you have any issues with my reply please feel free to contact me.

    Sincerely,

    Neil Hu


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Friday, October 20, 2017 8:55 AM
    Moderator
  • Hi Neil Hu,

    First of all, thanks for your reply. I really like LINQ to XML because it makes parsing XML a lot easier, but I have a major issue with it that forces me to use regex instead, even though that is not a great idea.

    The big problem I have is that when I modify my file using XDocument, at some point (on saving, or I don't know exactly when) character references in the file such as &#x002A;, &#x0394;, &#x00E9; etc. are converted to the literal characters *, Δ, é etc. Below is a sample of a simple operation using LINQ to XML:

    string path = @"C:\User\Desktop\test.xml";
    XDocument doc = XDocument.Load(path);
    string name = doc.Root.Element("Emp").Element("lbl").Value;
    XDocument doc2 = XDocument.Load(@"C:\User\Desktop\test2.xml");
    doc2.Root.Element("Employee").SetElementValue("label", name);
    doc2.Save(@"C:\User\Desktop\test2.xml");

    I could not find a proper solution to this issue. Can you tell me how to modify the code above to prevent XDocument from doing these conversions automatically, and keep references like &#x002A; and &amp;lt; as they are?


    • Edited by Don Bradman Friday, October 20, 2017 3:29 PM
    Friday, October 20, 2017 3:28 PM
  • This is my final program

    using System.Collections.Generic;
    using System.IO;
    using System.Text.RegularExpressions;

    var workingPath = @"D:\sssssssssssssssss\jjj";
    var files = new List<string>();
    if (Directory.Exists(workingPath))
    {
        // Collect every *.xml file inside any folder named "xml" below workingPath.
        foreach (var f in Directory.GetDirectories(workingPath, "xml", SearchOption.AllDirectories))
        {
            files.AddRange(Directory.GetFiles(f, "*.xml"));
        }
    }

    foreach (var file in files)
    {
        // Read each file once, count the elements by id prefix, update the count nodes, write once.
        string text = File.ReadAllText(file);
        int fig_count = Regex.Matches(text, @"fig id=""fig").Count;
        int tab_count = Regex.Matches(text, @"table-wrap id=""table").Count;
        int eq_count = Regex.Matches(text, @"disp-formula id=""deqn").Count;
        text = Regex.Replace(text, @"<fig-count count=""\d+""/>", @"<fig-count count=""" + fig_count + @"""/>");
        text = Regex.Replace(text, @"<table-count count=""\d+""/>", @"<table-count count=""" + tab_count + @"""/>");
        text = Regex.Replace(text, @"<eq-count count=""\d+""/>", @"<eq-count count=""" + eq_count + @"""/>");
        File.WriteAllText(file, text);
    }

    Wednesday, November 1, 2017 2:34 AM