Regex Problem with Whitespaces

  • Question

  • User1122355199 posted

    Hello everyone, and thanks in advance for your help. I am trying to parse some text extracted from a PDF that looks something like:

    Patient:   TESTCASE, THOMAS T            MRN: 1234567

    The word "Patient" is preceded by one or more spaces, one or more spaces follow "Patient:", and an indeterminate number of spaces separate "THOMAS T" from "MRN". I've tried:

    RegexOptions.IgnorePatternWhitespace

    and:

    string pattern = "Patient:" & "([\s+]+)"

    but neither works. Any help would be appreciated.

    Monday, May 14, 2018 7:45 PM
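    For reference, a regex can absorb the variable spacing directly. A few things stop the attempt above: in C#, strings are joined with + (& is the VB concatenation operator); "\s" has to be written in a verbatim string such as @"\s" (or as "\\s"); and "[\s+]" is a character class that matches one whitespace character or a literal "+". Also, RegexOptions.IgnorePatternWhitespace only ignores unescaped whitespace inside the pattern itself; it does not change how spaces in the input are matched. A minimal sketch, assuming the labels are always "Patient:" and "MRN:" and that the MRN is all digits (the class PatientLineParser, its TryParse method, and the group names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

static class PatientLineParser
{
    // ^\s*              optional leading whitespace
    // Patient:\s+       the label, then one or more whitespace characters
    // (?<patient>.+?)   the name, matched lazily so the spaces before MRN are left for \s+
    // \s+MRN:\s*        whitespace, the MRN label, optional whitespace
    // (?<mrn>\d+)       the record number (assumed to be digits only)
    static readonly Regex LinePattern =
        new Regex(@"^\s*Patient:\s+(?<patient>.+?)\s+MRN:\s*(?<mrn>\d+)");

    // Hypothetical helper: returns false (and null outputs) when the line does not match.
    public static bool TryParse(string line, out string patient, out string mrn)
    {
        Match m = LinePattern.Match(line);
        patient = m.Success ? m.Groups["patient"].Value : null;
        mrn = m.Success ? m.Groups["mrn"].Value : null;
        return m.Success;
    }
}

class Program
{
    static void Main()
    {
        string line = "   Patient:   TESTCASE, THOMAS T            MRN: 1234567";
        if (PatientLineParser.TryParse(line, out string patient, out string mrn))
        {
            Console.WriteLine($"Patient = '{patient}', MRN = '{mrn}'");
        }
    }
}
```

    Because .+? is lazy, the patient group ends where the run of spaces before "MRN:" begins, so the captured name carries no trailing spaces.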

All replies

  • User475983607 posted

    Not a regex solution, but it might work.

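        using System;
        using System.Linq;
        using Newtonsoft.Json; // from the Newtonsoft.Json NuGet package

        class Program
        {
            static void Main(string[] args)
            {
                string line = @"   Patient: TESTCASE, THOMAS T            MRN: 1234567 ";
    
                string[] fields = { "Patient", "MRN" };
    
                // Put a '|' marker in front of each field label so the line splits into "Label: value" pieces.
                foreach (string field in fields)
                {
                    int idx = line.IndexOf(field, StringComparison.Ordinal);
                    if (idx >= 0) // IndexOf returns -1 for a missing label, and Insert(-1, ...) would throw
                    {
                        line = line.Insert(idx, "|");
                    }
                }
    
                // e.g. { "Patient: TESTCASE, THOMAS T", "MRN: 1234567" }
                string[] items = line.Trim().Split(new char[] { '|' }, StringSplitOptions.RemoveEmptyEntries).Select(p => p.Trim()).ToArray();
    
                // Build a one-element JSON array: [{"Patient" : "...", "MRN" : "..."}]
                // Note: values are not JSON-escaped, so a '"' or '\' inside a value would break this.
                string json = "[{";
                for (int i = 0; i < items.Length; i++)
                {
                    string[] nameValue = items[i].Split(new char[] { ':' }, StringSplitOptions.RemoveEmptyEntries).Select(p => p.Trim()).ToArray();
                    json += $"\"{nameValue[0]}\" : \"{nameValue[1]}\"{(i == items.Length - 1 ? "}]" : ", ")}";
                }
    
                //Console.WriteLine(json);
    
                Record[] records = JsonConvert.DeserializeObject<Record[]>(json);
    
                Console.WriteLine("Patient\t\t\tMRN");
                foreach (var r in records)
                {
                    Console.WriteLine("{0}\t{1}", r.Patient, r.MRN);
                }
            }
    
            public class Record
            {
                public string Patient { get; set; }
                public string MRN { get; set; }
            }
        }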
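    The JSON string is only an intermediate step: the same '|'-split pieces can be assigned to Record directly, which drops the Newtonsoft.Json dependency and the escaping problem if a value ever contains a quote. A minimal sketch under the same assumptions as the code above (the RecordParser class and its ParseRecord method are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Record
{
    public string Patient { get; set; }
    public string MRN { get; set; }
}

static class RecordParser
{
    // Hypothetical helper: marks each label with '|', splits into "Label: value"
    // pieces, and maps them onto Record by label name.
    public static Record ParseRecord(string line, string[] fields)
    {
        foreach (string field in fields)
        {
            int idx = line.IndexOf(field, StringComparison.Ordinal);
            if (idx >= 0)
            {
                line = line.Insert(idx, "|");
            }
        }

        // Label -> value, e.g. "Patient" -> "TESTCASE, THOMAS T"
        Dictionary<string, string> values = line
            .Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(p => p.Split(new[] { ':' }, 2))   // split on the first ':' only
            .Where(nv => nv.Length == 2)              // skips the leading-spaces piece
            .ToDictionary(nv => nv[0].Trim(), nv => nv[1].Trim());

        return new Record
        {
            Patient = values.TryGetValue("Patient", out string patient) ? patient : null,
            MRN = values.TryGetValue("MRN", out string mrn) ? mrn : null
        };
    }
}

class Program
{
    static void Main()
    {
        string line = @"   Patient: TESTCASE, THOMAS T            MRN: 1234567 ";
        Record r = RecordParser.ParseRecord(line, new[] { "Patient", "MRN" });
        Console.WriteLine("{0}\t{1}", r.Patient, r.MRN);
    }
}
```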

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Monday, May 14, 2018 8:56 PM
  • User303363814 posted

    I wish I had a dollar for every time I saw the word 'Regex' followed by the word 'Problem' ...

    What if you don't use a Regex?

    var input = "Patient:   TESTCASE, THOMAS T            MRN: 1234567";
    var fields = input.Split(new []{"Patient:", "MRN:"}, StringSplitOptions.RemoveEmptyEntries)
                      .Select(s => s.Trim());

    Use the keywords as separators for string.Split, then trim the leading and trailing spaces from each piece.
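    One thing to watch: RemoveEmptyEntries only drops zero-length pieces. If the line starts with spaces, as the question says it can, the text before "Patient:" is a whitespace-only piece that survives the split and becomes an empty string after Trim(). A minimal sketch that filters those out afterwards (the FieldSplitter class and its SplitFields method are hypothetical; on .NET 5 and later, combining StringSplitOptions.TrimEntries with RemoveEmptyEntries does the same in one step):

```csharp
using System;
using System.Linq;

static class FieldSplitter
{
    // Hypothetical helper: uses the labels as separators, trims each piece,
    // and drops pieces that were only whitespace (e.g. spaces before "Patient:").
    public static string[] SplitFields(string input)
    {
        return input.Split(new[] { "Patient:", "MRN:" }, StringSplitOptions.RemoveEmptyEntries)
                    .Select(s => s.Trim())
                    .Where(s => s.Length > 0)
                    .ToArray();
    }
}

class Program
{
    static void Main()
    {
        string input = "   Patient:   TESTCASE, THOMAS T            MRN: 1234567";
        string[] fields = FieldSplitter.SplitFields(input);
        // fields[0] is the patient name, fields[1] the MRN
        Console.WriteLine($"Patient = '{fields[0]}', MRN = '{fields[1]}'");
    }
}
```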

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Monday, May 14, 2018 10:40 PM
  • User1122355199 posted

    Thanks for the response.  Would you please explain the portion of the code:

    .Select(s => s.Trim());

    I don't work with Linq (probably need to learn).  Thanks again.

    Wednesday, May 16, 2018 2:10 AM
  • User36583972 posted

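    Hi kmcnet,

    > Thanks for the response.  Would you please explain the portion of the code:
    >
    > .Select(s => s.Trim());
    >
    > I don't work with Linq (probably need to learn).  Thanks again.

    Select calls the lambda expression s => s.Trim() once for each string produced by Split and returns the results. You can refer to the following description of String.Trim():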

        // Summary:
        //     Removes all leading and trailing white-space characters from the current System.String
        //     object.
        //
        // Returns:
        //     The string that remains after all white-space characters are removed from the
        //     start and end of the current string. If no characters can be trimmed from the
        //     current instance, the method returns the current instance unchanged.
        public String Trim();


    You can refer to the following links for more details about lambda expressions.

    Lambda Expressions (C# Programming Guide):
    https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/statements-expressions-operators/lambda-expressions

    How to select some specific column in Lambda Expression LINQ in Entity framework
    https://forums.asp.net/t/2011108.aspx?How+to+select+some+specific+column+in+Lambda+Expression+LINQ+in+Entity+framework+
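    For readers new to LINQ, pieces.Select(s => s.Trim()).ToArray() produces the same result as a plain foreach loop that trims each string and collects it. A minimal sketch comparing the two (the SelectDemo class and the WithSelect and WithLoop methods are hypothetical, for illustration only):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SelectDemo
{
    // LINQ version: s => s.Trim() is a lambda that takes a string s and returns s.Trim().
    public static string[] WithSelect(string[] pieces)
    {
        return pieces.Select(s => s.Trim()).ToArray();
    }

    // Equivalent explicit loop.
    public static string[] WithLoop(string[] pieces)
    {
        var result = new List<string>();
        foreach (string s in pieces)
        {
            result.Add(s.Trim());
        }
        return result.ToArray();
    }
}

class Program
{
    static void Main()
    {
        string[] pieces = { "   TESTCASE, THOMAS T            ", " 1234567" };
        Console.WriteLine(string.Join("|", SelectDemo.WithSelect(pieces)));
        Console.WriteLine(string.Join("|", SelectDemo.WithLoop(pieces)));
    }
}
```

    One difference: Select is evaluated lazily, so nothing is trimmed until the result is enumerated, for example by ToArray() or a foreach.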

    Best Regards,

    Yong Lu

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Wednesday, May 16, 2018 8:30 AM
  • User1122355199 posted

    Thanks to everyone for the help.

    Tuesday, May 22, 2018 12:13 AM