How can I extract numbers only from a string

  • Question

  • Unfortunately my knowledge of regular expressions is very limited.

    Given a string, I want to remove any other characters (spaces, commas, "/", etc.) and return a string containing only the digits 0-9.
    E.g. "1-2345- 2,?:434343 ".

    How can I achieve this?

    thanks a lot
    Thanks for your help
    Thursday, July 17, 2008 7:04 PM

Answers

  • The pattern of interest is simply any digit: \d

    We can use @"\d" directly to match each digit via the Regex.Matches method (I prefer this over the next approach):

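    using System;
    using System.Text;
    using System.Text.RegularExpressions;

    string input = @"1-02345- 2,?:434343";

    // Match only digits
    string pattern = @"\d";

    StringBuilder sb = new StringBuilder();

    // Each match is a single digit; append its text in order
    foreach (Match m in Regex.Matches(input, pattern))
        sb.Append(m.Value);
    Console.WriteLine("{0} {1}", "Final string (matches approach):", sb.ToString());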
     
    // Result: 
    // Final string (matches approach): 1023452434343 


    Alternatively, you can use the Regex.Split method with @"[^\d]" as the pattern to split on. Here the logic is reversed: the input is split on every character that is NOT a digit, and those separators are dropped from the result, so the pieces that remain contain only digits:

    using System;
    using System.Text;
    using System.Text.RegularExpressions;

    string input = @"1-02345- 2,?:434343";

    // Match anything that is NOT a digit
    string splitPattern = @"[^\d]";

    // Split approach: split on the pattern and exclude the match, hence the reverse logic of
    // matching on anything that is NOT a digit
    string[] results = Regex.Split(input, splitPattern);

    StringBuilder sb = new StringBuilder();

    // Adjacent separators yield empty strings; appending those is harmless
    foreach (string s in results)
        sb.Append(s);
    Console.WriteLine("{0} {1}", "Final string (split approach):", sb.ToString());
     
    // Result: 
    // Final string (split approach): 1023452434343 

    As mentioned above, I prefer the former approach over the latter. The former's intention is clearer.
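
    For anyone who wants to paste this into a console project, here is a minimal self-contained sketch. The class name DigitsOnly and the method names ViaMatches and ViaReplace are made up for illustration, and the Regex.Replace variant is a third option, not one of the two approaches described above. It uses [0-9] rather than \d because in .NET \d also matches decimal digits from other scripts unless RegexOptions.ECMAScript is set, and the question asked for 0-9 only.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class DigitsOnly
{
    // Matches approach: collect every ASCII digit the pattern finds, in order.
    public static string ViaMatches(string input)
    {
        StringBuilder sb = new StringBuilder();
        foreach (Match m in Regex.Matches(input, "[0-9]"))
            sb.Append(m.Value);
        return sb.ToString();
    }

    // Replace approach: delete every character that is not 0-9.
    public static string ViaReplace(string input)
    {
        return Regex.Replace(input, "[^0-9]", "");
    }

    public static void Main()
    {
        string input = "1-02345- 2,?:434343";
        Console.WriteLine(ViaMatches(input)); // 1023452434343
        Console.WriteLine(ViaReplace(input)); // 1023452434343
    }
}
```

    Both helpers return an empty string when the input contains no digits at all.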



    Document my code? Why do you think it's called "code"?
    • Edited by Ahmad Mageed Friday, July 18, 2008 3:56 AM added links & reorganized
    • Marked as answer by devBrix Friday, July 18, 2008 11:17 AM
    Friday, July 18, 2008 3:39 AM

All replies

  • Fantastic!!
    That is exactly what I was looking for.

    Do you know of any free tool for building regular expressions in .NET that doesn't have much of a learning curve?

    thanks again!!!
    Thanks for your help
    Friday, July 18, 2008 11:18 AM
  • I suggest looking at the sticky thread at the top of this forum. Here's the link: .Net Regex Resources Reference

    Scroll down to the "Programs" section. The learning curve usually isn't that bad with most of the tools. They tend to include predefined examples to choose from, so you can see the regex pattern, the input, and how the match works.

    Here are the free ones. Perhaps you could visit the sites and look at the screenshots. They all look rather nice.
    Expresso - I currently use this, but at the time of this post the link seems to be down, so try again later.
    Rad Software's Regex Designer
    The Regulator
    RegexDesigner.NET - this is a smaller tool by Chris Sells at Microsoft (he has authored numerous books and articles)

    If you were looking for something to use from within VS, the author of The Regulator has a Regular Expression Visualizers add-on. Look for it in the Programs section of the first link in this post.

    A nice quick tutorial is the 30 minute regex tutorial. You can check it out here: http://www.codeproject.com/KB/dotnet/regextutorial.aspx - you can find others in the sticky I mentioned earlier.

    Enjoy :)

    Document my code? Why do you think it's called "code"?
    Friday, July 18, 2008 3:13 PM
  • Thanks again. Extremely grateful for your time and effort in your replies. THANKS.


    Thanks for your help
    Saturday, July 19, 2008 6:31 AM
  • Hi guys,

    I'm looking to do something similar here.

    I have a collection of strings and I need to order by the first digit, second digit, third digit.... etc...

    For example below there are 3 strings in the collection:

    4/11H19/12

    4/11H3/1

    4/1/11H2

    Now I need to ignore the non-numeric characters, and sort by each digit starting from the left.

    So the above list would become:

    4/1/11H2

    4/11H3/1

    4/11H19/12

    Many thanks for any info you can supply.

    James Hemphill
    Thursday, July 24, 2008 3:56 PM
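
    One possible reading of the example above, sketched under stated assumptions: the expected order is what you get by comparing each run of digits as a whole number (4, 1, 11, 2 versus 4, 11, 3, 1 versus 4, 11, 19, 12). Comparing single digits would instead put 4/11H19/12 before 4/11H3/1. The class name DigitRunSort and its methods are hypothetical, and int.Parse assumes no run of digits is too long to fit in an int.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class DigitRunSort
{
    // Extract each run of ASCII digits and parse it as an integer.
    public static int[] NumberRuns(string s)
    {
        return Regex.Matches(s, "[0-9]+")
                    .Cast<Match>()
                    .Select(m => int.Parse(m.Value))
                    .ToArray();
    }

    // Compare run by run from the left; if all shared runs are equal,
    // the string with fewer runs sorts first.
    public static int CompareByRuns(string a, string b)
    {
        int[] x = NumberRuns(a);
        int[] y = NumberRuns(b);
        for (int i = 0; i < Math.Min(x.Length, y.Length); i++)
        {
            int c = x[i].CompareTo(y[i]);
            if (c != 0) return c;
        }
        return x.Length.CompareTo(y.Length);
    }

    public static void Main()
    {
        List<string> items = new List<string> { "4/11H19/12", "4/11H3/1", "4/1/11H2" };
        items.Sort(CompareByRuns);
        foreach (string s in items)
            Console.WriteLine(s);
        // Prints:
        // 4/1/11H2
        // 4/11H3/1
        // 4/11H19/12
    }
}
```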