How to do a regex search and replace all using elements from a 2d array in c#?

  • Question

  • How do I use the contents of a 2d array to do a search and replace all in multiple files?

    Say, I have a 2d array

    string[,] array = new string[,]
    {
        {@"(\d+)-(\w)", @"$1\n$2"},
        {@"(\w+),\s(\w+)", @"$1=$2"},
        // .... so on
    };


    How do I replace (\d+)-(\w) by $1\n$2, (\w+),\s(\w+) by $1=$2, and so on?

    Code done so far

    string path = @"D:\Sample";
    string[] myFiles = Directory.GetFiles(path, "*.txt", SearchOption.AllDirectories);
    string[,] array = new string[,]
    {
        {@"(\d+)-(\w)", @"$1\n$2"},
        {@"(\w+),\s(\w+)", @"$1=$2"},
    };
    foreach (var file in myFiles) {
        for (int j = 0; j < array.GetLength(0); j++)
        {
            for (int i = 0; i < array.GetLength(1); i++)
            {
                File.WriteAllText(file, Regex.Replace(File.ReadAllText(file), array[j][j], array[j][i]));
            }
        }
    }


    But I'm getting a ton of errors like

    Wrong number of indices inside []; expected 2
    The best overloaded method match for 'System.Text.RegularExpressions.Regex.Replace(string, string, string)' has some invalid arguments
    Argument 2: cannot convert from 'char' to 'string'
    All in the line

    File.WriteAllText(file, Regex.Replace(File.ReadAllText(file), array[j][j], array[j][i]));

    Is my approach wrong? How to correct this?


    • Edited by Don Bradman Wednesday, October 11, 2017 1:49 PM
    Wednesday, October 11, 2017 1:48 PM

All replies

  • The errors you're getting are because you're indexing a rectangular array (string[,]) using jagged-array syntax: array[j][i] instead of array[j, i]. I wouldn't recommend using a 2D array here. You always have 2 elements for each row and it just complicates something that should be simple. Depending upon the version of the language you're using, I'd use an anonymous type if the patterns are defined within the same method, a tuple for newer versions of C#, and a custom struct for older versions.

    // Requires: using System.IO; using System.Text.RegularExpressions;
    var path = @"D:\Sample";
    var myFiles = Directory.GetFiles(path, "*.txt", SearchOption.AllDirectories);
    
    //If this is being done inline then use an anonymous type
    var expressions = new[]
    {
        new { FindText = @"(\d+)-(\w)", ReplaceText = @"$1\n$2"},
        new { FindText = @"(\w+),\s(\w+)", ReplaceText = @"$1=$2"},
    };
    
    foreach (var file in myFiles)
    {
        // Read once, apply every expression in order, then write once
        var content = File.ReadAllText(file);
        foreach (var expression in expressions)
        {
            content = Regex.Replace(content, expression.FindText, expression.ReplaceText);
        }
        File.WriteAllText(file, content);
    }
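
    If you would rather keep the rectangular array from the question, a minimal sketch of the fix could look like the following. A string[,] takes both indices inside one pair of brackets, and a single loop over the rows is enough because column 0 always holds the pattern and column 1 the replacement. The names ArrayReplace and ApplyAll are made up for illustration and are not part of the original code.

    ```csharp
    using System.Text.RegularExpressions;

    public static class ArrayReplace
    {
        // Hypothetical helper: applies every {pattern, replacement} row in order.
        public static string ApplyAll(string content, string[,] rules)
        {
            for (int row = 0; row < rules.GetLength(0); row++)
            {
                // Rectangular arrays are indexed as [row, column], not [row][column].
                content = Regex.Replace(content, rules[row, 0], rules[row, 1]);
            }
            return content;
        }
    }
    ```

    Inside the file loop this would be called as File.WriteAllText(file, ArrayReplace.ApplyAll(File.ReadAllText(file), array));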

    For tuples you just change the expression declaration. For a custom struct, which can be more easily shared outside the code, the declaration would change to something like this.

    struct FindReplaceExpression
    {
        public string FindText { get; set; }
        public string ReplaceText { get; set; }
    }
    
    //Change expressions declaration
    var expressions = new[]
    {
        new FindReplaceExpression { FindText = @"(\d+)-(\w)", ReplaceText = @"$1\n$2"},
        new FindReplaceExpression { FindText = @"(\w+),\s(\w+)", ReplaceText = @"$1=$2"},
    };
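
    For the tuple option mentioned above, a rough sketch assuming C# 7.0 or later (older .NET Framework targets may also need the System.ValueTuple package). The names TupleReplace, Expressions and ApplyAll are illustrative only.

    ```csharp
    using System.Text.RegularExpressions;

    public static class TupleReplace
    {
        // Named tuple elements take the place of the anonymous type's properties.
        public static readonly (string FindText, string ReplaceText)[] Expressions =
        {
            (@"(\d+)-(\w)", @"$1\n$2"),
            (@"(\w+),\s(\w+)", @"$1=$2"),
        };

        // Hypothetical helper: runs every find/replace pair over the content in order.
        public static string ApplyAll(string content)
        {
            foreach (var (findText, replaceText) in Expressions)
            {
                content = Regex.Replace(content, findText, replaceText);
            }
            return content;
        }
    }
    ```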


    Michael Taylor http://www.michaeltaylorp3.net

    Wednesday, October 11, 2017 2:14 PM
    Moderator
  • Thanks a lot CoolDadTx.

    BTW, I have a few questions

    1) My regex expressions may contain quotation marks like " and ', so how do I escape them if I use @ at the beginning of the string, which generally lets me write the expression freely without escaping?

    2) If I want to enable ignore case or multiline search and replace options, will

    content = Regex.Replace(content, expression.FindText, expression.ReplaceText,RegexOptions.IgnoreCase);

    do the job? What if some expressions have that option enabled and some don't? Is there a way to add that option in the anonymous type declaration?


    • Edited by Don Bradman Wednesday, October 11, 2017 3:43 PM
    Wednesday, October 11, 2017 3:02 PM
  • To escape a double quote in a string literal you use \". If it is verbatim then use 2 double quotes. Single quotes don't need to be escaped.

    var normalString = "Hello \"Bob\", how are you?";
    var verbatimString = @"Hello ""Bob"", how are you?";
    var singleQuotes = "Hello 'Bob', how are you?";
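
    One related thing to watch with verbatim strings in the replacement text: as far as I know, .NET replacement patterns only process $ substitutions and not character escapes, so @"$1\n$2" inserts a backslash followed by the letter n rather than a line break. A small sketch comparing the two forms (EscapeDemo and its members are names made up for this example):

    ```csharp
    using System.Text.RegularExpressions;

    public static class EscapeDemo
    {
        const string Pattern = @"(\d+)-(\w)";

        // Verbatim replacement: \n stays as two literal characters in the output.
        public static string Verbatim(string input) =>
            Regex.Replace(input, Pattern, @"$1\n$2");

        // Regular literal: the compiler turns \n into a newline before Regex sees it.
        public static string Regular(string input) =>
            Regex.Replace(input, Pattern, "$1\n$2");
    }
    ```

    If a real line break is wanted in the output, the regular string literal is the one to use for the replacement text.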

    To set the options for the match, use the RegexOptions parameter.

    Regex.Replace(content, pattern, replacement, RegexOptions.IgnoreCase | RegexOptions.Singleline);

    If you want to make sure the RE pattern you give matches the whole line then add the ^ to the front of the pattern and the $ to the end. These are the "beginning of line" and "end of line" anchors. Note that they match at each line only when RegexOptions.Multiline is set; otherwise they anchor to the start and end of the whole input.
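
    As a quick illustration of the anchors, here is a sketch showing ^ with and without RegexOptions.Multiline. RegexOptions.Singleline is a separate switch that only changes whether . matches a newline. AnchorDemo and its method names are hypothetical.

    ```csharp
    using System.Text.RegularExpressions;

    public static class AnchorDemo
    {
        // Without Multiline, ^ matches only at the very start of the input.
        public static string FirstOnly(string input) =>
            Regex.Replace(input, @"^\w+", "X");

        // With Multiline, ^ also matches right after every '\n'.
        public static string EveryLine(string input) =>
            Regex.Replace(input, @"^\w+", "X", RegexOptions.Multiline);
    }
    ```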


    Michael Taylor http://www.michaeltaylorp3.net

    Wednesday, October 11, 2017 5:21 PM
    Moderator
  • Well, I was actually looking for a way to add RegexOptions.IgnoreCase | RegexOptions.Singleline in the anonymous type declaration, like

    var expressions = new[]
    {
        new { FindText = @"(\d+)-(\w)", ReplaceText = @"$1\n$2", RegexOptions.IgnoreCase | RegexOptions.Singleline},
        new { FindText = @"(\w+),\s(\w+)", ReplaceText = @"$1=$2"},
        new { FindText = @"(\d{2})-(\d{3})", ReplaceText = @"$1+$2", RegexOptions.IgnoreCase },
    };

    because I do not want every replace to have the ignore-case feature enabled, only some of them.

    Creating two different anonymous types and doing the regex replace twice, once with ignore case and once without, should work, but it would not be an efficient way of doing it, would it?

    Thursday, October 12, 2017 12:48 PM
  • You can pass the options as part of the anonymous type if you want. Just put Options = in front of each of the values.

    var expressions = new[]
    {
        new { FindText = @"(\d+)-(\w)", ReplaceText = @"$1\n$2", Options = RegexOptions.IgnoreCase | RegexOptions.Singleline},
        new { FindText = @"(\w+),\s(\w+)", ReplaceText = @"$1=$2", Options = RegexOptions.None },
        new { FindText = @"(\d{2})-(\d{3})", ReplaceText = @"$1+$2", Options = RegexOptions.IgnoreCase },
    };

    Then in your Replace call use the Options property. Now you can mix and match options all you want. But if you only care about toggling case sensitivity, then a simple boolean flag may be easier. This is especially true if you want to use some default options for all the expressions (like Singleline).
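
    A minimal sketch of what that Replace call might look like, with the per-expression Options combined with optional defaults shared by all expressions. The class name, the defaults parameter and the two sample patterns here are invented for illustration; substitute your own expressions.

    ```csharp
    using System.Text.RegularExpressions;

    public static class OptionsReplace
    {
        // Hypothetical helper: 'defaults' are flags applied to every expression.
        public static string ApplyAll(string content, RegexOptions defaults = RegexOptions.None)
        {
            // Illustrative patterns only; each entry carries its own options.
            var expressions = new[]
            {
                new { FindText = "hello", ReplaceText = "bye", Options = RegexOptions.IgnoreCase },
                new { FindText = "world", ReplaceText = "all", Options = RegexOptions.None },
            };

            foreach (var expression in expressions)
            {
                // Per-expression flags are OR-ed with the shared defaults.
                content = Regex.Replace(content, expression.FindText,
                                        expression.ReplaceText,
                                        expression.Options | defaults);
            }
            return content;
        }
    }
    ```

    With no defaults, only the first expression ignores case; passing RegexOptions.IgnoreCase as the default makes both case-insensitive.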


    Michael Taylor http://www.michaeltaylorp3.net

    • Marked as answer by Don Bradman Thursday, October 12, 2017 1:52 PM
    Thursday, October 12, 2017 1:32 PM
    Moderator