Regex matches from items in a dictionary in C#?

  • Question

  • I have a dictionary whose keys are regexes and whose values are strings. I want to match each regex, one by one, against multiple files in a folder and, if a match occurs, write the value associated with that regex key to a log file.
    Also, the program should move on to the next regex in the dictionary as soon as a single match is found in any one file, i.e. if the first file produces a match, there is no need to check the rest of the files for that same regex.

    I have 100+ regexes to search for in about 30-50 files (each file is approximately 1 MB).

    I tried:

    {
    	Dictionary<string,string> dict=new Dictionary<string, string>();
    	dict.Add(@"\d+-\d+","numRange");
    	dict.Add(@"[a-z] -[0-9]+","checkHyphen");
    	dict.Add(@".\d{4}","Doi present");
    	//... so on
    	string path=@"D:\TestFile";
    	string[] files=Directory.GetFiles(path,"*.txt",SearchOption.AllDirectories);
    	foreach (var file in files)
    	{
    		string content=	File.ReadAllText(file);
    		foreach (KeyValuePair<string,string> element in dict)
    		{
    			Regex r = new Regex(element.Key);
    			if(r.IsMatch(content))
    			{
    				if (!File.Exists(path+"\\log.txt"))
    				{
    					File.Create(path+"\\log.txt").Dispose();
    
    				}
    				StreamWriter file2 = new StreamWriter(path+"\\log.txt", true);
    				file2.WriteLine(element.Value);
    				file2.Close();
    
    			}
    		}
    		break;
    	}
    }

    It works fine so far. Is there a faster way of doing this?

    If yes, can anyone show me how?

    Friday, February 2, 2018 2:36 AM

Answers

  • Hi,

    If you want a faster way to loop, consider using Parallel.ForEach, as below.

     public void ParallelTask()
            {
                Dictionary<string, string> dict = new Dictionary<string, string>();
                dict.Add(@"\d+-\d+", "numRange");
                dict.Add(@"[a-z] -[0-9]+", "checkHyphen");
                dict.Add(@".\d{4}", "Doi present");
                //... so on
                string path = @"D:\TestFile";
                string logPath = path + "\\log.txt";
                string[] files = Directory.GetFiles(path, "*.txt", SearchOption.AllDirectories);
    
                if (!File.Exists(logPath))
                {
                    File.Create(logPath).Dispose();
                }
    
                // Guards the log file: the loop body runs on several threads at once.
                object logLock = new object();
    
                ParallelLoopResult result = Parallel.ForEach<string>(files, filePath =>
                {
                    string content = File.ReadAllText(filePath);
                    foreach (KeyValuePair<string, string> element in dict)
                    {
                        Regex r = new Regex(element.Key);
    
                        if (r.IsMatch(content))
                        {
                            // Only one thread may append at a time; without the lock,
                            // opening the file from two threads can throw an IOException.
                            lock (logLock)
                            {
                                using (StreamWriter file2 = new StreamWriter(logPath, true))
                                {
                                    file2.WriteLine(element.Value);
                                }
                            }
                        }
                    }
                });  
            }

    The code above is untested, and you may need to implement some of the logic yourself.
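
    The question also asks to stop checking further files once a regex has matched anywhere. A minimal sketch of one way to get that behaviour, assuming all file contents fit in memory (roughly 30-50 files of about 1 MB each, per the question): it parallelises over the regexes rather than the files, and writes the log once at the end so threads never contend for it. The names FirstHitScanner, LabelsWithAnyHit and Scan are hypothetical, not from this thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class FirstHitScanner
{
    // Returns the labels of all patterns that match at least one of the texts.
    // 'patterns' maps regex source -> label, like the dictionary in the question.
    public static List<string> LabelsWithAnyHit(
        IDictionary<string, string> patterns, IReadOnlyList<string> contents)
    {
        return patterns
            .AsParallel()
            .AsOrdered()                       // keep the dictionary's enumeration order
            .Where(p =>
            {
                var re = new Regex(p.Key);     // built once per regex, not once per file
                // Any() stops at the first text that matches this regex.
                return contents.Any(text => re.IsMatch(text));
            })
            .Select(p => p.Value)
            .ToList();
    }

    // Hypothetical wrapper: reads every *.txt under 'folder' and appends the labels to 'logPath'.
    public static void Scan(IDictionary<string, string> patterns, string folder, string logPath)
    {
        List<string> contents = Directory
            .GetFiles(folder, "*.txt", SearchOption.AllDirectories)
            .Select(f => File.ReadAllText(f))
            .ToList();

        // A single write after the parallel part, so no two threads touch the log.
        File.AppendAllLines(logPath, LabelsWithAnyHit(patterns, contents));
    }
}
```

    With the dictionary from the question, the call would look like FirstHitScanner.Scan(dict, @"D:\TestFile", @"D:\TestFile\log.txt"). Keep in mind that a log named log.txt inside the scanned folder matches "*.txt" itself, so later runs would scan the log too; placing it elsewhere avoids that.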

    Best Regards,

    Bob


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Monday, February 5, 2018 10:04 AM

All replies

  • Just one point that caught my eye: each time your regex gets a match, you invoke File.Exists() to check for your log file. That seems a bit too costly to me. You could check for the log file once before entering your loops or, if creating the log file should depend on whether there are any matches, use a local bool variable as a flag:

    if (logUnchecked)
    {
    	if (!File.Exists(path+"\\log.txt"))
    	{
    		File.Create(path+"\\log.txt").Dispose();
    	}
    	// Clear the flag even when the file already existed, so File.Exists runs only once.
    	logUnchecked = false;
    }
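
    A related option, sketched under the assumption that the log should only exist when something matched: collect the labels first and append them in one call. File.AppendAllLines creates the file if it is missing, so no explicit existence check is needed. LogOnce and AppendMatches are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class LogOnce
{
    // Appends the label of every pattern that matches 'content' to 'logPath'.
    // Returns the number of labels written.
    public static int AppendMatches(
        IDictionary<string, string> patterns, string content, string logPath)
    {
        var hits = new List<string>();
        foreach (KeyValuePair<string, string> p in patterns)
        {
            if (new Regex(p.Key).IsMatch(content))
            {
                hits.Add(p.Value);
            }
        }

        // Skipping the call when nothing matched keeps an empty log from being created.
        if (hits.Count > 0)
        {
            File.AppendAllLines(logPath, hits);
        }
        return hits.Count;
    }
}
```

    This touches the file system at most once per call instead of once per match.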

    wizend

    Friday, February 2, 2018 6:10 PM
  • Check this idea:

    . . .
    string pattern = "";
    var values = new List<string>( dict.Count );
    foreach( var p in dict )
    {
        pattern += "(" + p.Key + ")|";
        values.Add( p.Value );
    }
    pattern = pattern.TrimEnd( '|' );

    var re = new Regex( pattern, RegexOptions.Compiled );

    foreach( var file in files )
    {
        string content = File.ReadAllText( file );
        Match m = re.Match( content );
        while( m.Success )
        {
            for( int i = 1; ; ++i )
            {
                if( m.Groups[i].Success )
                {
                    string result = values[i - 1];
                    // TODO: append the result to 'log.txt'
                    // . . .
                    break;
                }
            }
            m = m.NextMatch();
        }
    }


    It assumes that the regular expressions do not contain capturing groups “( )”, because any extra group would shift the group indices used to look up the value.
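
    If the patterns may contain parentheses, one possible variation is to wrap each pattern in a named group and look the groups up by name instead of by index. The sketch below assumes the patterns do not use the group names r0, r1, ... themselves and do not rely on numbered backreferences such as \1, whose numbering changes once patterns are combined. CombinedRegex, Build and HitIndices are hypothetical names. Note also that an alternation reports only one alternative per matched span, so a pattern that matches only inside text already consumed by another pattern's match will not be reported; this is not equivalent to running each regex separately.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CombinedRegex
{
    // Wraps pattern i in a named group (?<r0>...), (?<r1>...), ... and joins them with '|'.
    public static Regex Build(IList<string> patterns)
    {
        IEnumerable<string> parts = patterns.Select((p, i) => "(?<r" + i + ">" + p + ")");
        return new Regex(string.Join("|", parts), RegexOptions.Compiled);
    }

    // Returns the indices of the patterns that produced at least one match in 'content'.
    public static SortedSet<int> HitIndices(Regex combined, int patternCount, string content)
    {
        var hits = new SortedSet<int>();
        for (Match m = combined.Match(content); m.Success; m = m.NextMatch())
        {
            for (int i = 0; i < patternCount; i++)
            {
                // Lookup by name is unaffected by capture groups inside the patterns.
                if (m.Groups["r" + i].Success)
                {
                    hits.Add(i);
                    break;
                }
            }
        }
        return hits;
    }
}
```

    The returned indices can then be mapped back to the dictionary values in the same way as the values list above.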

    Reading the next file could perhaps be done in parallel with matching the current one. See StreamReader.ReadToEndAsync and Task.Result.
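
    A rough sketch of that idea, assuming a plain sequential loop over the files: the read of file i+1 is started before file i is processed. Prefetch, ReadAsync and ForEachFile are hypothetical names, and whether the overlap helps in practice depends on the disk and the file sizes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class Prefetch
{
    // Reads a whole file asynchronously.
    static async Task<string> ReadAsync(string path)
    {
        using (var reader = new StreamReader(path))
        {
            // ConfigureAwait(false) avoids a deadlock when .Result is used on a UI thread.
            return await reader.ReadToEndAsync().ConfigureAwait(false);
        }
    }

    // Calls 'process' on each file's text in order, while the next file is already being read.
    public static void ForEachFile(IReadOnlyList<string> files, Action<string> process)
    {
        if (files.Count == 0)
        {
            return;
        }

        Task<string> pending = ReadAsync(files[0]);
        for (int i = 0; i < files.Count; i++)
        {
            string content = pending.Result;          // blocks until file i is loaded
            if (i + 1 < files.Count)
            {
                pending = ReadAsync(files[i + 1]);    // start reading file i+1
            }
            process(content);                         // match while that read is in flight
        }
    }
}
```

    The matching loop from the code above would go into the process callback.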



    • Edited by Viorel_MVP Friday, February 2, 2018 8:18 PM
    Friday, February 2, 2018 8:08 PM
  • Hi Viorel,

    What if the regexes contain "()"?

    Also, can you explain your approach in a bit more detail?

    Sunday, February 4, 2018 2:13 AM