Regex Problem with Whitespaces

Question
-
User1122355199 posted
Hello everyone and thanks for your help in advance. I am trying to parse some text extracted from a pdf that looks something like:
Patient: TESTCASE, THOMAS T MRN: 1234567
The word "Patient" is preceded by one or more spaces, there are one or more spaces following "Patient:", and there is an indeterminate number of spaces following "THOMAS T" and preceding "MRN". I've tried:
RegexOptions.IgnorePatternWhitespace
and:
string pattern = "Patient:" & "([\s+]+)"
but neither works. Any help would be appreciated.
Monday, May 14, 2018 7:45 PM
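For readers who do want a regex, here is a minimal sketch of one way to do it. Some background on the two attempts above: RegexOptions.IgnorePatternWhitespace only tells the engine to ignore unescaped whitespace inside the pattern itself, so it has no effect on whitespace in the input. Also, & is the Visual Basic concatenation operator (C# uses +), and "\s" in a regular C# string literal is an unrecognized escape sequence, which is why the pattern below is a verbatim string (@"..."). The class name PatientLineParser, the method TryParse, and the group names are hypothetical, and the pattern assumes the MRN consists of digits only and that the name never contains the text "MRN:".

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatientLineParser
{
    // \s* and \s+ match runs of whitespace in the INPUT text.
    // (?<name>.+?) lazily captures the name up to the whitespace before "MRN:".
    // Assumption: the MRN is made of digits only.
    private static readonly Regex LinePattern = new Regex(
        @"Patient:\s*(?<name>.+?)\s+MRN:\s*(?<mrn>\d+)");

    // Returns true and fills name/mrn when the line matches the pattern;
    // otherwise returns false and sets both outputs to empty strings.
    public static bool TryParse(string line, out string name, out string mrn)
    {
        Match match = LinePattern.Match(line);
        name = match.Success ? match.Groups["name"].Value : string.Empty;
        mrn = match.Success ? match.Groups["mrn"].Value : string.Empty;
        return match.Success;
    }

    public static void Main()
    {
        string line = "   Patient:    TESTCASE, THOMAS T      MRN:   1234567";
        if (TryParse(line, out string name, out string mrn))
        {
            // Prints: Name: 'TESTCASE, THOMAS T', MRN: '1234567'
            Console.WriteLine($"Name: '{name}', MRN: '{mrn}'");
        }
    }
}
```

Because the name group is lazy and is followed by \s+, the trailing spaces before "MRN:" are not included in the captured name, so no separate Trim call is needed here.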
Answers
-
User475983607 posted
Not a regex solution but it might work.
using System;
using System.Linq;
using Newtonsoft.Json; // Json.NET NuGet package

class Program
{
    static void Main(string[] args)
    {
        string line = @" Patient: TESTCASE, THOMAS T MRN: 1234567 ";
        string[] fields = { "Patient", "MRN" };

        // Mark the start of each field with a '|' delimiter.
        foreach (string field in fields)
        {
            int idx = line.IndexOf(field);
            line = line.Insert(idx, "|");
        }

        // Split on the delimiter and trim each "Name: value" item.
        string[] items = line.Trim()
            .Split(new char[] { '|' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(p => p.Trim())
            .ToArray();

        // Build a one-element JSON array from the name/value pairs.
        string json = "[{";
        for (int i = 0; i < items.Count(); i++)
        {
            string[] nameValue = items[i]
                .Split(new char[] { ':' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(p => p.Trim())
                .ToArray();
            json += $"\"{nameValue[0]}\" : \"{nameValue[1]}\"{(i == items.Count() - 1 ? "}]" : ", ")}";
        }
        //Console.WriteLine(json);

        Record[] records = JsonConvert.DeserializeObject<Record[]>(json);
        Console.WriteLine("Patient\t\t\tMRN");
        foreach (var r in records)
        {
            Console.WriteLine("{0}\t{1}", r.Patient, r.MRN);
        }
    }

    public class Record
    {
        public string Patient { get; set; }
        public string MRN { get; set; }
    }
}
- Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
Monday, May 14, 2018 8:56 PM -
User303363814 posted
I wish I had a dollar for every time I saw the word 'Regex' followed by the word 'Problem' ...
What if you don't use a Regex?
var input = "Patient: TESTCASE, THOMAS T MRN: 1234567";
var fields = input.Split(new[] { "Patient:", "MRN:" }, StringSplitOptions.RemoveEmptyEntries)
                  .Select(s => s.Trim());
Use the keywords as separators for string.Split, then trim the leading and trailing spaces from each piece.
- Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
Monday, May 14, 2018 10:40 PM -
User36583972 posted
Hi kmcnet,
Thanks for the response. Would you please explain the portion of the code:
.Select(s => s.Trim());
I don't work with Linq (probably need to learn). Thanks again.
You can refer to the following description.
// Summary:
//     Removes all leading and trailing white-space characters from the current System.String
//     object.
//
// Returns:
//     The string that remains after all white-space characters are removed from the
//     start and end of the current string. If no characters can be trimmed from the
//     current instance, the method returns the current instance unchanged.
public String Trim();
You can refer to the following links for more detail about lambda expressions.
Lambda Expressions (C# Programming Guide):
https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/statements-expressions-operators/lambda-expressions
How to select some specific column in Lambda Expression LINQ in Entity framework
https://forums.asp.net/t/2011108.aspx?How+to+select+some+specific+column+in+Lambda+Expression+LINQ+in+Entity+framework+
Best Regards,
Yong Lu
- Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
Wednesday, May 16, 2018 8:30 AM
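To make the .Select(s => s.Trim()) part more concrete, here is a small sketch that may help. Select comes from System.Linq and applies the given lambda to every element of a sequence; s => s.Trim() reads as "given a string s, return s.Trim()". The class SelectTrimDemo and its two methods are hypothetical names used only for illustration; they show the LINQ form next to an equivalent foreach loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectTrimDemo
{
    // LINQ version: Select runs the lambda s => s.Trim() on each element.
    // ToList() forces evaluation; Select on its own is evaluated lazily.
    public static List<string> TrimWithSelect(IEnumerable<string> pieces)
    {
        return pieces.Select(s => s.Trim()).ToList();
    }

    // Equivalent explicit loop, shown for comparison.
    public static List<string> TrimWithLoop(IEnumerable<string> pieces)
    {
        var result = new List<string>();
        foreach (string s in pieces)
        {
            result.Add(s.Trim());
        }
        return result;
    }

    public static void Main()
    {
        string input = "Patient:   TESTCASE, THOMAS T     MRN:  1234567";

        // Splitting on the keywords leaves the values with surrounding spaces.
        string[] pieces = input.Split(
            new[] { "Patient:", "MRN:" }, StringSplitOptions.RemoveEmptyEntries);

        // Prints [TESTCASE, THOMAS T] and then [1234567].
        foreach (string field in TrimWithSelect(pieces))
        {
            Console.WriteLine($"[{field}]");
        }
    }
}
```

Both methods return the same list. The LINQ form is shorter, while the loop makes each step explicit.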
All replies
-
User1122355199 posted
Thanks for the response. Would you please explain the portion of the code:
.Select(s => s.Trim());
I don't work with Linq (probably need to learn). Thanks again.
Wednesday, May 16, 2018 2:10 AM -
User1122355199 posted
Thanks to everyone for the help.
Tuesday, May 22, 2018 12:13 AM