Looking for help on a replacement regular expression to extract units and apply them to all numeric values in a string

  • Question

  • I need a little help in specifying the regular expression(s) to process the following strings:

    sample input strings: "0 psi", "0 to 100 mmhg", "-4 to 10 inH2O", "30Kpa"

    desired output: "0 psi", "0 mmhg to 100 mmhg", "-4 inH2O to 10 inH2O", "30Kpa"

    What I have so far (which doesn't quite work) is:

    extraction expression:

    (?<num>(\d+(\.\d*)?)(?<units>(?i:(mmhg|psi|bar|kpa|inh2o|cmh2o|kg/cm2|k))))

    replacement expression: (${num}${units})

    I've been searching the internet for how to apply the "units" group to each of the captured "num" groups but haven't had any luck. Any help would be appreciated.

     

     


    Developer Frog Haven Enterprises

    Tuesday, March 26, 2013 3:32 PM

Answers

  • Hi Larry,

    I modified your RE a little:

    Find RE:
    (?<num1>(-?\d+(\.\d*)?))\s*to\s*(?<num2>(-?\d+(\.\d*)?))\s*(?<units>(?i:(mmhg|psi|bar|kpa|inh2o|cmh2o|kg/cm2|k)))
    
    Replace RE
    ${num1} ${units} to ${num2} ${units}

    It produces the requested result:

    >> desired output: "0 psi", "0 mmhg to 100 mmhg", "-4 inH2O to 10 inH2O", "30Kpa"
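
As a quick check, the find and replace expressions above can be run through `Regex.Replace` on the four sample strings. This is only a sketch: the class and method names (`RangeUnits`, `Expand`) are made up for illustration. Strings without "to" do not match the pattern, so they come back unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RangeUnits
{
    // Find and replace expressions copied from the reply above.
    private const string Find =
        @"(?<num1>(-?\d+(\.\d*)?))\s*to\s*(?<num2>(-?\d+(\.\d*)?))\s*" +
        @"(?<units>(?i:(mmhg|psi|bar|kpa|inh2o|cmh2o|kg/cm2|k)))";

    private const string Replace = "${num1} ${units} to ${num2} ${units}";

    // Hypothetical helper: inputs that contain no "N to M units" range are not modified.
    public static string Expand(string measurement)
    {
        return Regex.Replace(measurement, Find, Replace);
    }
}
```

Calling `RangeUnits.Expand` on each sample string in turn should reproduce the desired output list from the question.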

    Best regards,


    Mike Feng
    MSDN Community Support | Feedback to us
    Develop and promote your apps in Windows Store
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    Wednesday, March 27, 2013 3:18 PM
    Moderator
  • Mike: Just a comment. It looks like on Tuesday Elbilo changed his requirements to the following: (${num1}${units},${num2}${units})

    Note that he doesn't have any spaces between the numeric values and the units. I had my code working with the above except for the last case, where I ended up with only one numeric value but the units repeated twice.

    Mike's code only matches the 2nd and 3rd cases, where the word "to" is used. Mike's code will not remove the space between the value and the units in cases 1 & 4. I'm not sure whether the space between the numeric value and the units is important.
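
If changing the calling code is an option, one way to avoid the repeated units in the single-value case is a `MatchEvaluator` that checks whether `num2` was captured. The sketch below is not from the thread: the names `CompactUnits` and `Format` are hypothetical, and it assumes a single value should be written as "(0psi)", which the original poster never confirmed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompactUnits
{
    // The "to <num2>" part is optional; the units list is taken from the thread.
    private static readonly Regex Pattern = new Regex(
        @"(?<num1>-?\d+(\.\d*)?)(\s*to\s*(?<num2>-?\d+(\.\d*)?))?\s*" +
        @"(?<units>(?i:mmhg|psi|bar|kpa|inh2o|cmh2o|kg/cm2|k))");

    public static string Format(string measurement)
    {
        return Pattern.Replace(measurement, m =>
        {
            string units = m.Groups["units"].Value;
            string first = m.Groups["num1"].Value + units;

            // Assumption: a lone value becomes "(0psi)" rather than "(0psi,psi)".
            if (!m.Groups["num2"].Success)
                return "(" + first + ")";

            return "(" + first + "," + m.Groups["num2"].Value + units + ")";
        });
    }
}
```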


    jdweng

    Wednesday, March 27, 2013 4:35 PM

All replies

  • The expression you currently have will produce two sets of units where units already exist. So try it in 3 steps:

    1) Get the unit type

    2) Delete the units

    3) Replace the numbers with the number and the units.
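
The three steps above might look roughly like this in C#. It is a sketch under assumptions: the units are taken to sit at the end of the string, the unit list is the one from the question, and the names `ThreeStep` and `Apply` are invented for the example. The whitespace found in front of the units is reused so that "30Kpa" keeps its original spacing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ThreeStep
{
    private const string Units = @"(?i:mmhg|psi|bar|kpa|inh2o|cmh2o|kg/cm2|k)";
    private const string Number = @"-?\d+(\.\d*)?";

    public static string Apply(string measurement)
    {
        // 1) Get the unit type (assumed to be at the end of the string),
        //    together with whatever whitespace precedes it.
        Match tail = Regex.Match(measurement, @"(?<sep>\s*)(?<units>" + Units + @")\s*$");
        if (!tail.Success)
            return measurement;
        string suffix = tail.Groups["sep"].Value + tail.Groups["units"].Value;

        // 2) Delete the units.
        string bare = measurement.Substring(0, tail.Index);

        // 3) Replace each number with the number followed by the units.
        return Regex.Replace(bare, Number, m => m.Value + suffix);
    }
}
```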


    jdweng

    Tuesday, March 26, 2013 4:12 PM
  • I was hoping to avoid rewriting my code. The expressions are read from an XML file that's part of a much larger process, which doesn't allow for storing intermediates. From what I've read so far it seems what I'm attempting should be achievable in a single regular expression replacement step, if I can figure out the capture/replacement syntax.

    So far I've only found strings with no more than one or two numeric values, and the units (when they exist at all) are always at the end of the string.
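
One possible way to stay within a single replacement step is to match only the numbers that are followed by "to" and capture the trailing units inside a lookahead; .NET keeps groups captured in a positive lookahead, so `${units}` is available in the replacement. This is a sketch, not a tested production pattern: `SinglePass` and `Expand` are hypothetical names, and it assumes the units always end the string, as described above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglePass
{
    // A number followed by " to ", with the units at the end of the string
    // captured inside the lookahead so they are not consumed by the match.
    public const string Find =
        @"(?<num>-?\d+(\.\d*)?)" +
        @"(?=\s+to\s.*?(?<units>(?i:mmhg|psi|bar|kpa|inh2o|cmh2o|kg/cm2|k))\s*$)";

    public const string Replace = "${num} ${units}";

    public static string Expand(string measurement)
    {
        return Regex.Replace(measurement, Find, Replace);
    }
}
```

Because only numbers directly followed by "to" are matched, values that already carry their units, and strings with a single value, are left alone. If the two expressions are stored in an XML file, the `<` characters in the named groups would need to be escaped as `&lt;` or wrapped in a CDATA section.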


    Developer Frog Haven Enterprises

    Tuesday, March 26, 2013 4:25 PM
  • I'm thinking of using Matches, something like the code below.  Each item in the match collection holds the starting position in the original string and the length of the matched text.  When replacing items in a string, you need to start at the end of the string so that the starting position of each earlier item stays valid when the replacements have different lengths.

                // "pattern" is assumed to be a Regex that matches each numeric value; it is not defined in this snippet.
                string[] input = { "0 psi", "0 to 100 mmhg", "-4 to 10 inH2O", "30Kpa" };
                foreach (string measurement in input)
                {
                    MatchCollection matches = pattern.Matches(measurement);
                    // Walk the matches from last to first.
                    for (int index = matches.Count - 1; index >= 0; index--)
                    {
                        // matches[index].Index and matches[index].Length locate the text to replace.
                    }
                }
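
A fuller version of that idea, as a sketch only: the two patterns and the names `ReverseReplace` and `Apply` are assumptions, not part of the post. It inserts the units after each number that is followed by "to", working from the last match to the first so that the `Index` values of the earlier matches are still correct.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class ReverseReplace
{
    // Assumed patterns: numbers that are followed by "to", and the units at the end of the string.
    private static readonly Regex Number = new Regex(@"-?\d+(\.\d*)?(?=\s+to\s)");
    private static readonly Regex Units =
        new Regex(@"(?i:mmhg|psi|bar|kpa|inh2o|cmh2o|kg/cm2|k)\s*$");

    public static string Apply(string measurement)
    {
        Match unitMatch = Units.Match(measurement);
        if (!unitMatch.Success)
            return measurement;
        string units = unitMatch.Value.Trim();

        StringBuilder result = new StringBuilder(measurement);
        MatchCollection matches = Number.Matches(measurement);

        // Insert from the end so earlier match positions are not shifted.
        for (int index = matches.Count - 1; index >= 0; index--)
        {
            Match m = matches[index];
            result.Insert(m.Index + m.Length, " " + units);
        }
        return result.ToString();
    }
}
```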


    jdweng

    Tuesday, March 26, 2013 4:44 PM
  • I greatly appreciate your assistance, but I don't have access to the code to make those types of changes. Here's what I've got so far that looks to be working, although I have a few thousand records to check.

    extraction expression:

    (?<num1>(-?\d+(\.\d*)?))\s*,\s*(?<num2>(-?\d+(\.\d*)?))\s*(?<units>(?i:(mmhg|psi|bar|kpa|inh2o|cmh2o|kg/cm2|k)))

    replacement expression:

    (${num1}${units},${num2}${units})

    -Larry


    Developer Frog Haven Enterprises

    Tuesday, March 26, 2013 5:14 PM
  • I'm really close but can't work on it any more.  See the code below.

           // Requires: using System; using System.Text.RegularExpressions;
           static void Main(string[] args)
            {
                // Three alternatives: "N to M units", "N units to M units", and "N units".
                string replacementPattern = @"(?<num1>[\-\+]?\d+)\s+to\s+(?<num2>[\-\+]?\d+)\s*(?<units>\w+)|" +
                                        @"(?<num1>[\-\+]?\d+)\s*(?<units>\w+)\s+to\s+(?<num2>[\-\+]?\d+)\s*(?<units>\w+)|" +
                                        @"(?<num1>[\-\+]?\d+)\s*(?<units>\w+)";
                Regex pattern = new Regex(replacementPattern);

                string[] input = { "0 psi", "0 to 100 mmhg", "-4 to 10 inH2O", "30Kpa" };

                foreach (string measurement in input)
                {
                    // When only one value is present, num2 is empty and the units still appear twice.
                    string results = pattern.Replace(measurement, @"(${num1}${units},${num2}${units})");
                    Console.WriteLine(results);
                }
                Console.ReadLine();
            }


    jdweng

    Tuesday, March 26, 2013 10:26 PM
  • Hi Jdweng.

    Yes, thanks. Thank you for the reminder.

    The OP's requirements keep changing. Let's wait for the OP.

    Thank you for your contribution on MSDN Forum.

    Have a nice day.

    Best regards,


    Mike Feng

    Thursday, March 28, 2013 12:18 PM
    Moderator