Getting an array of numbers from a string

  • Question

  • Hi to all,
    from a sample string like "string1_#0090.string2_#0004" I need to obtain the following array of numbers:
    90
    4

    Using Regex.Split(theString, @"[^\d]") I get the following array (ignoring the empty entries):
    1
    0090
    2
    0004

    Basically, I need to adjust the expression so that I only get the numbers after "#", without any leading zeros. I know it's very easy, but after hours of research I still haven't found a solution.

    Thanks to all
    Friday, June 8, 2012 5:34 PM
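A minimal C# sketch of what the Split call above does, assuming .NET's Regex class and C# top-level statements (the variable names are only illustrative). Splitting on every single non-digit character leaves many empty strings between consecutive letters; only the digit runs survive once those are dropped.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string theString = "string1_#0090.string2_#0004";

// Regex.Split cuts the input at every single non-digit character,
// so runs of letters produce many empty strings between them.
string[] parts = Regex.Split(theString, @"[^\d]");

// Dropping the empty entries leaves the four digit runs from the question.
string[] nonEmpty = parts.Where(p => p.Length > 0).ToArray();

Console.WriteLine(parts.Length > nonEmpty.Length); // True
Console.WriteLine(string.Join(", ", nonEmpty));    // 1, 0090, 2, 0004
```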

All replies

  • (?<=#0*)[1-9]\d*

    Zero-Width Positive Lookbehind Assertion is what you are looking for:

    http://msdn.microsoft.com/en-us/library/bs2twtah#zerowidth_positive_lookbehind_assertion

    The Split function probably won't be able to do what you want, but you can use Regex.Matches and LINQ to get what you need:

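    Imports System.Linq
    Imports System.Text.RegularExpressions

    Dim l_test As String = "string1_#0090.string2_#0004"
    Dim l_pattern As String = "(?<=#0*)[1-9]\d*"
    Dim l_matches = Regex.Matches(l_test, l_pattern)
    ' Project each Match to its text; l_array ends up as {"90", "4"}
    Dim l_array = (From m In l_matches.Cast(Of Match)() Select m.Value).ToArray()
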
    • Edited by Cyborgx372 Friday, June 8, 2012 8:07 PM
    • Marked as answer by mark_555 Saturday, June 9, 2012 12:08 PM
    Friday, June 8, 2012 7:59 PM
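The same lookbehind approach in C#, as a sketch assuming C# top-level statements (the `Extract` helper is a hypothetical name, not part of the post). .NET accepts a variable-length lookbehind such as `(?<=#0*)`, which some other engines (Python's `re`, for example) reject. The second call shows the limitation of starting the match with `[1-9]`: an all-zero group yields no match at all.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same pattern as the VB.NET example: the match must be preceded by '#'
// plus any number of zeros, and must start with a non-zero digit.
const string pattern = @"(?<=#0*)[1-9]\d*";

// Hypothetical helper: return the text of every match.
string[] Extract(string input) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

string[] numbers = Extract("string1_#0090.string2_#0004");
Console.WriteLine(string.Join(", ", numbers)); // 90, 4

// "#0000" contains no digit 1-9, so it produces no match.
string[] withZero = Extract("string1_#0090.string2_#0000");
Console.WriteLine(string.Join(", ", withZero)); // 90
```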
  • Hi Cyborg, and thanks for your suggestion. Unfortunately, it does not seem to work:

    Regex.Split("string1_#0090.string2_#0004", @"(?<=#0*)\d+");
    
    [0] string1_#
    [1] .string2_#
    [2] null
    Thanks

    Friday, June 8, 2012 8:17 PM
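Why the Split call above loses the numbers: Split treats each match as a separator and returns only the text around the matches, while Matches returns the matched text itself. A minimal C# sketch comparing the two on the same pattern (top-level statements assumed, variable names illustrative):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "string1_#0090.string2_#0004";
string pattern = @"(?<=#0*)\d+";

// Split uses every match as a separator and keeps the text around it,
// so the digit runs themselves disappear from the result.
string[] pieces = Regex.Split(input, pattern);

// Matches returns the matched text, which is what is wanted here.
string[] found = Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

// pieces: "string1_#", ".string2_#" and a trailing empty string
Console.WriteLine(pieces.Length);           // 3
Console.WriteLine(string.Join(", ", found)); // 0090, 0004
```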
  • Don't use Split; use Regex.Matches, as in Cyborg's example.

    You can use:

    // requires: using System.Linq; using System.Text.RegularExpressions;
    Regex.Matches(theString, @"(?<=#0*)\d+").Cast<Match>().Select(m => m.Value).ToArray();

    My blog: blog.jessehouwing.nl

    • Marked as answer by mark_555 Saturday, June 9, 2012 12:06 PM
    Friday, June 8, 2012 8:28 PM
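A sketch of the Matches approach, assuming the goal is integers rather than strings. With `(?<=#0*)\d+` the lookbehind is already satisfied immediately after `#`, so each match still includes its leading zeros; converting the values with `int.Parse` is one way to drop them, and it also turns "0000" into 0. The input is just the sample from this thread.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

string theString = "string1_#0090.string2_#0000";

// The lookbehind succeeds right after '#', so \d+ takes the whole digit run,
// leading zeros included.
string[] raw = Regex.Matches(theString, @"(?<=#0*)\d+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Parsing to int drops the leading zeros and turns "0000" into 0.
int[] numbers = raw.Select(s => int.Parse(s, CultureInfo.InvariantCulture)).ToArray();

Console.WriteLine(string.Join(", ", raw));     // 0090, 0000
Console.WriteLine(string.Join(", ", numbers)); // 90, 0
```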
  • Please see my example code for usage. Sorry - it's in VB.NET.
    Friday, June 8, 2012 8:31 PM
  • Sorry, I posted my reply before you edited your message.
    Unfortunately, I need to use the Split function because of my application's internal logic.
    With your new expression I get:
    "string1_#00"
    ".string2_#000"

    Strangely, unlike VS, the Expresso tool reports the expression as invalid. Is there a way to invert the expression so that the Split function returns the numbers directly?
    Friday, June 8, 2012 8:33 PM
  • Jesse's solution, which simulates the Split function, works great. Cyborg's sample also works as expected, except that "0000" is not recognized as the number 0. If I can solve this problem too, we're done! Many thanks!

    "string1_#0090.string2_#0000"
    Expected result:
    90
    0
    Friday, June 8, 2012 8:56 PM
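If the results must stay strings, one possible pattern (my own suggestion, not one posted in this thread) adds a negative lookahead so a match may not start on a zero that is followed by another digit. That skips the leading zeros but still keeps a lone final 0. A sketch assuming C# top-level statements; `Extract` is a hypothetical helper name:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// (?<=#0*)  must be preceded by '#' and any number of zeros
// (?!0\d)   must not start on a zero followed by another digit,
//           so leading zeros are skipped but a final lone "0" is kept
// \d+       then take the rest of the digit run
const string pattern = @"(?<=#0*)(?!0\d)\d+";

string[] Extract(string input) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

string[] a = Extract("string1_#0090.string2_#0004");
string[] b = Extract("string1_#0090.string2_#0000");

Console.WriteLine(string.Join(", ", a)); // 90, 4
Console.WriteLine(string.Join(", ", b)); // 90, 0
```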
  • If you want to use Split, you might try the following expression. It's a bit contrived, but it looks like it works:

    (?!\d).*?#0*(?=\d)

    http://regexhero.net/tester/?id=d7942d69-7466-4caf-a8ba-ae53d03a1ba1


    My blog: blog.jessehouwing.nl

    Friday, June 8, 2012 9:04 PM
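A sketch of using that expression with Regex.Split (C# top-level statements assumed). Because the first match starts at index 0, Split returns an empty first element, so filtering out empty strings is likely needed:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Each match runs from a non-digit up to and including the next '#' and as
// many zeros as possible while still leaving at least one digit after them,
// so what remains between the matches is the numbers.
const string splitPattern = @"(?!\d).*?#0*(?=\d)";

string[] a = Regex.Split("string1_#0090.string2_#0004", splitPattern);
string[] b = Regex.Split("string1_#0090.string2_#0000", splitPattern);

// The first match starts at index 0, so element 0 is an empty string.
string[] numbersA = a.Where(s => s.Length > 0).ToArray();
string[] numbersB = b.Where(s => s.Length > 0).ToArray();

Console.WriteLine(a.Length);                    // 3
Console.WriteLine(string.Join(", ", numbersA)); // 90, 4
Console.WriteLine(string.Join(", ", numbersB)); // 90, 0
```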
  • Jesse,
    you're a genius, and your new expression works like a charm. The Expresso tool still returns wrong results, but in VS I get the expected results.
    I really don't understand this expression, but can you confirm that this code will always return the expected results, at least as long as the # character is followed by digits? Thanks
    Friday, June 8, 2012 9:45 PM
  • I've just thought of two issues with this expression: strings that start with digits, and strings that end with trailing text that doesn't contain a #. Let's try to solve those too.

    (?:(?:^\d+)?(?!\d).*?#0*(?=\d)|(?:.(?<!#0*\d+))+$)

    (Updated after an edit.)

    I can't think of any more inputs that would break this expression, but I can't guarantee there are none, so make sure you test it thoroughly. It has become an unreadable beast, but I'll try to explain it:

    (?:^\d+)? to catch leading digits

    (?:.(?<!#0*\d+))+$ to catch trailing text (characters up to the end of the string, none of which completes a # followed by digits)

    And the rest to catch everything in between.

    The only case I haven't tested yet, I think, is text that doesn't contain your pattern at all. Also, I'm not sure whether this will give you empty elements in the result.

    My blog: blog.jessehouwing.nl

    Friday, June 8, 2012 9:56 PM
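A small C# harness for the updated expression, covering the cases discussed in this thread (top-level statements assumed; the inputs are illustrative and the results in the comments are what I expect .NET's Regex.Split to return, by my reading of the pattern). Trailing text is consumed by a match at the very end, so Split adds an empty last element; input with no # is matched as a whole, which gives two empty strings.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Updated expression from the post above.
const string splitPattern = @"(?:(?:^\d+)?(?!\d).*?#0*(?=\d)|(?:.(?<!#0*\d+))+$)";

string[] SplitNumbers(string input) => Regex.Split(input, splitPattern);

// Plain sample: the first match starts at index 0, so element 0 is "".
string[] plain = SplitNumbers("string1_#0090.string2_#0004");        // "", "90", "4"

// Leading digits: (?:^\d+)? lets the first match swallow "15".
string[] leading = SplitNumbers("15string1_#0090.string2_#0004");    // "", "90", "4"

// Trailing text: the second alternative matches "abc" at the end,
// so Split also adds an empty last element.
string[] trailing = SplitNumbers("string1_#0090.string2_#0004abc");  // "", "90", "4", ""

// No '#' at all: the second alternative matches the whole input.
string[] noHash = SplitNumbers("15string1_0090.string2_0004");       // "", ""

foreach (var parts in new[] { plain, leading, trailing, noHash })
    Console.WriteLine("[" + string.Join(", ", parts.Select(p => "\"" + p + "\"")) + "]");
```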
  • Thanks Jesse, I really appreciate your help. The new expression also works with strings that start with digits. Strings that don't contain the # character seem to return an array of two empty strings (at least with a string like "15string1_0090.string2_0004"), and that's fine. Even if it returned letters it wouldn't matter, because I can detect an invalid result whenever it doesn't contain only numbers.

    Do you think a failure could still return an array of numbers?

    My god, the expression is really hard to understand, and I'm pretty sure that if it ever fails I will never figure out why, but what really matters is that it seems to work in most situations.
    Friday, June 8, 2012 11:08 PM
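One way to do the detection described above, as a hedged sketch: a hypothetical `TryGetNumbers` helper (not from the thread) that drops empty elements and accepts the Split result only if something remains and every element parses as a plain digit string. It cannot catch everything: as the third call shows, trailing text after the last number is removed without any error.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// The updated Split expression from the thread.
const string splitPattern = @"(?:(?:^\d+)?(?!\d).*?#0*(?=\d)|(?:.(?<!#0*\d+))+$)";

// Hypothetical helper: split, drop empty elements, and succeed only if at least
// one element remains and every element is a plain run of ASCII digits.
bool TryGetNumbers(string input, out int[] numbers)
{
    numbers = Array.Empty<int>();
    string[] parts = Regex.Split(input, splitPattern).Where(p => p.Length > 0).ToArray();
    if (parts.Length == 0)
        return false;

    var result = new int[parts.Length];
    for (int i = 0; i < parts.Length; i++)
    {
        // NumberStyles.None allows digits only: no sign, whitespace or separators.
        if (!int.TryParse(parts[i], NumberStyles.None, CultureInfo.InvariantCulture, out result[i]))
            return false;
    }

    numbers = result;
    return true;
}

bool ok1 = TryGetNumbers("string1_#0090.string2_#0000", out int[] n1);
bool ok2 = TryGetNumbers("15string1_0090.string2_0004", out int[] n2);
bool ok3 = TryGetNumbers("string1_#0090.string2_#0004abc", out int[] n3);

Console.WriteLine($"{ok1}: {string.Join(", ", n1)}"); // True: 90, 0
Console.WriteLine(ok2);                                // False
Console.WriteLine($"{ok3}: {string.Join(", ", n3)}"); // True: 90, 4
```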
  • If you ever need to change it, you can always pay me a lot of money ;)

    My blog: blog.jessehouwing.nl

    Saturday, June 9, 2012 7:25 AM
  • You're right, you gave me much more than a simple suggestion :-)
    At this point this thread can be marked as solved.
    Saturday, June 9, 2012 12:06 PM