Regular Expression Ignore Before

  • Question

  • I have been trying to get a regular expression to work but I have not had any luck so far.

    I want to ignore everything up to and including the text that the regular expression matches.  I thought it would be easy but I am just missing it.

    I have a file named "30827_msit-02_feds_jan_2012.xls" and I would like to ignore the first half, "30827_msit-02_".  The file names all start with the same basic format: <digits>_msit-<digits>.

    I am really looking to get the "<feds>" part out of the name.

    Any help would be appreciated.

    Friday, August 10, 2012 10:35 PM

All replies

  • On Fri, 10 Aug 2012 22:35:22 +0000, K. Kazinski wrote:
     
    >
    >
    >I have been trying to get a regular expression to work but I have not had any luck so far.
    >
    >I want to ignore everything preceding and including the regular expression.  I thought it would be easy but I am just missing it.
    >
    >I have a file name "30827_msit-02_feds_jan_2012.xls" and I would like to ignore the first half "30827_msit-02_".  The file names are the same basic format <digits>_msit-<digits>.
    >
    >I am really looking to get the "<feds>" part out of the name.
    >
    >Any help would be appreciated.
     
    This is a VBA forum so here is a VBA routine using regex to output what you describe. 
     
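    Option Explicit
    Sub foo()
        Const str As String = "30827_msit-02_feds_jan_2012.xls"
        Dim re As Object
        Dim Result As String

        ' Late-bound VBScript regex object; no library reference is required
        Set re = CreateObject("vbscript.regexp")
        With re
            .Global = True
            ' Five digits, "_msit-", two digits, then an underscore
            .Pattern = "\d{5}_msit-\d{2}_"
            ' Replace the matched prefix with an empty string
            Result = .Replace(str, "")
        End With
        Debug.Print Result
    End Sub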
     
    If the number of digits before or after _msit- can vary, change the quantifiers after each \d to reflect the allowable digit counts.
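
    As a rough sketch of that adjustment, assuming any number of digits is acceptable on both sides, \d+ can stand in for the fixed counts. The function name StripMsitPrefix and the second sample file name are made up for illustration, and the ^ anchor is an added assumption that the prefix always starts the name.

```vba
Option Explicit

' Hypothetical helper: removes a leading "<digits>_msit-<digits>_" prefix.
Function StripMsitPrefix(ByVal fileName As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")   ' late binding, no reference needed
    With re
        .Global = False                        ' only the leading prefix matters
        .Pattern = "^\d+_msit-\d+_"            ' \d+ = one or more digits
        StripMsitPrefix = .Replace(fileName, "")
    End With
End Function

Sub DemoStripMsitPrefix()
    Debug.Print StripMsitPrefix("30827_msit-02_feds_jan_2012.xls")   ' feds_jan_2012.xls
    Debug.Print StripMsitPrefix("7_msit-123_feds_jan_2012.xls")      ' feds_jan_2012.xls
End Sub
```

    If the counts are bounded instead, a range such as \d{1,5} would be the stricter choice.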
     
     

    Ron
    Saturday, August 11, 2012 12:43 AM
  • Thanks Ron - that is what I did, replaced each part until I was only left with the part I was looking for.

    I thought there was some syntax to ignore the group before, something like "(?<\d{5}_msit-\d{2}_)".

    Saturday, August 11, 2012 12:50 AM
  • On Sat, 11 Aug 2012 00:50:27 +0000, K. Kazinski wrote:
     
    >
    >
    >Thanks Ron - that is what I did, replaced each part until I was only left with the part I was looking for.
    >
    >I thought there was some syntax to ignore the group before, something like "(?<\d{5}_msit-\d{2}_)".
    >
     
    Lookbehinds are not a feature of the Javascript/Vbscript regex flavor, which is the flavor used in VBA. 
     
    You can use Lookbehinds in other flavors; you could also use capturing groups. 
     
    However, the Replace method would probably be faster even if you were working in a flavor that allowed lookbehinds.
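
    For comparison, here is a minimal sketch of the capturing-group route, which needs no lookbehind. It assumes the wanted part is the first underscore-delimited field after the prefix ("feds" in the sample name); the function name NamePart is hypothetical.

```vba
Option Explicit

' Hypothetical helper: returns the field that follows "<digits>_msit-<digits>_".
Function NamePart(ByVal fileName As String) As String
    Dim re As Object, mc As Object
    Set re = CreateObject("VBScript.RegExp")   ' late binding, no reference needed
    ' The prefix is matched but not captured; ([^_]+) captures up to the next underscore
    re.Pattern = "^\d+_msit-\d+_([^_]+)"
    Set mc = re.Execute(fileName)
    If mc.Count > 0 Then
        NamePart = mc(0).SubMatches(0)         ' first match, first capturing group
    End If                                     ' no match: the function returns ""
End Function

Sub DemoNamePart()
    Debug.Print NamePart("30827_msit-02_feds_jan_2012.xls")   ' feds
End Sub
```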
     

    Ron
    Saturday, August 11, 2012 1:07 AM
  • I have never had very good luck with capturing groups.

    Here is some code I have tried to use in a function:

    Dim Matches As MatchCollection
    Dim MatchVal As Match

    Dim Regex As New VBScript_RegExp_55.RegExp
    Dim OutVal As String

    Regex.Pattern = Pattern
    Regex.IgnoreCase = IgnoreCase
    Regex.Global = True

    Set Matches = Regex.Execute(InValue)

    Matches does not seem to have data.

    Do you have any code samples?

    Saturday, August 11, 2012 6:05 PM
  • The capturing group would be a submatch, and you also need to specify the match collection item number.  So, to demo:

    ===================================
    'Set reference to Microsoft VBScript Regular Expressions 5.5
    Option Explicit
    Sub foo()
        Debug.Print FileName("30827_msit-02_feds_jan_2012.xls")
    End Sub

    Function FileName(s As String) As String
        Dim re As RegExp, mc As MatchCollection
        Const sPat As String = "\d{5}_msit-\d{2}_(.*)"
        Set re = New RegExp
        With re
            .IgnoreCase = False
            .Global = True
            .Pattern = sPat
        End With
        If re.Test(s) = True Then
            Set mc = re.Execute(s)
            FileName = mc(0).SubMatches(0)
        End If
    End Function
    =============================

    You may need to alter the regex to fit your data variability as I previously wrote.


    Ron


    Edit:  You don't need the non-capturing group in this instance.  And I have eliminated it in my edit.
    Saturday, August 11, 2012 6:30 PM
  • Thanks for all the help.

    If there are multiple matches how do you get to them?  Is it through the submatches?

    Saturday, August 11, 2012 6:52 PM
  • No, you cycle through the match collection.  Something like:

    Dim m As Match
    ....
    Set mc = re.Execute(s)
    For Each m In mc
        Debug.Print m.SubMatches(0)
    Next m
    ...
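
    A fuller sketch of that loop, filled in so it runs on its own. The semicolon-separated input string and the function name CollectGroups are invented for illustration; Global must be True or only the first match is returned.

```vba
Option Explicit

' Hypothetical helper: joins capturing group 1 of every match with commas.
Function CollectGroups(ByVal source As String) As String
    Dim re As Object, mc As Object, m As Object
    Dim result As String
    Set re = CreateObject("VBScript.RegExp")   ' late binding, no reference needed
    With re
        .Global = True                         ' find all matches, not just the first
        .Pattern = "\d+_msit-\d+_([^_]+)"
    End With
    Set mc = re.Execute(source)
    For Each m In mc                           ' one Match object per occurrence
        If Len(result) > 0 Then result = result & ","
        result = result & m.SubMatches(0)      ' group 1 of this particular match
    Next m
    CollectGroups = result
End Function

Sub DemoCollectGroups()
    ' Prints: feds,state
    Debug.Print CollectGroups("30827_msit-02_feds_jan_2012.xls;30828_msit-03_state_feb_2012.xls")
End Sub
```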


    Ron

    Saturday, August 11, 2012 7:40 PM
  • Hi Ron,

    I keep getting an "Invalid procedure call or argument (Error 5)" on the m.SubMatches(0) line.

    In the VBA editor when I type "m." I see Count and Item.

    Saturday, August 11, 2012 8:24 PM
  • Hi K.,

    Thanks for posting in the MSDN Forum.

    Would you please try this snippet:

    Sub mytest1()
        Const Target As String = "30827_msit-02_feds_jan_2012.xls"
        Const Pattern As String = "\d{5}_msit-\d{2}_(.*)"
        Dim re As RegExp
        Dim mc As MatchCollection
        
        Set re = New VBScript_RegExp_55.RegExp
        With re
            .IgnoreCase = False
            .Global = True
            .Pattern = Pattern
        End With
        If re.Test(Target) = True Then
            Set mc = re.Execute(Target)
            MsgBox mc(0).SubMatches(0)
        End If
        
    End Sub

    Have a good day,

    Tom


    Tom Xu [MSFT]
    MSDN Community Support | Feedback to us

    Friday, August 17, 2012 6:45 AM
    Moderator
  • I am out of town now. Will respond next week when I am back.

    Ron

    Friday, August 17, 2012 5:36 PM
  • On Sat, 11 Aug 2012 20:24:48 +0000, K. Kazinski wrote:
     
    >
    >
    >Hi Ron,
    >
    >I keep getting an "Invalid procedure call or argument (Error 5)" on the m.SubMatches(0) line.
    >
    >In the VBA editor when I type "m." I see Count and Item.
    >
     
    Please post some examples of your data showing the multiple matches, and also of the code I provided as you have adapted it for your purposes.
     

    Ron
    Wednesday, August 22, 2012 1:00 AM
  • InValue = "The lazy dog jumped the brown fox."

    Dim re As RegExp, mc As MatchCollection
    Const sPat As String = "the"
    Set re = New RegExp
    With re
       .IgnoreCase = True
       .Global = True
       .Pattern = sPat
    End With

    If re.Test(InValue) = True Then
       Set mc = re.Execute(InValue)
       Dim m As Match
       For Each m In mc
          Debug.Print m.SubMatches
          ' mc(0).SubMatches(0)
       Next m

    End If

    Wednesday, August 22, 2012 1:13 AM
  • Hi K.

    Debug.Print m.SubMatches

    This statement is wrong. I think it must be "Debug.Print m.SubMatches(0)".

    Have a good day,

    Tom


    Tom Xu [MSFT]
    MSDN Community Support | Feedback to us

    Wednesday, August 22, 2012 6:44 AM
    Moderator
  • The SubMatches property is how you access a capturing group, as I wrote in a message dated Aug 11.  Since the regex in your current routine does not include a capturing group, there is no submatch to return, and attempting to access one will fail with an error.

    In addition, when you access a SubMatch, you must specify an Item number, as I showed in my example; your code omits it.

    If you are not using capturing groups, then your segment should read:

    If re.Test(InValue) = True Then
       Set mc = re.Execute(InValue)
       For Each m In mc
          Debug.Print m
       Next m
    End If
    

    If you are going to use a capturing group, and want to return capturing group 1, then you need to change your regex to:

    Const sPat As String = "(the)"

    and the code snippet to:

    If re.Test(InValue) = True Then
       Set mc = re.Execute(InValue)
       For Each m In mc
          Debug.Print m.SubMatches(0)
       Next m
    End If
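
    If the same routine has to cope with patterns both with and without a capturing group, one possible guard is to check SubMatches.Count before indexing. This is only a sketch: the function name GroupOrValue and the pattern "t(he)" are chosen for illustration.

```vba
Option Explicit

' Hypothetical helper: per match, returns group 1 if the pattern has one,
' otherwise the whole matched text. Results are joined with commas.
Function GroupOrValue(ByVal source As String, ByVal pat As String) As String
    Dim re As Object, m As Object
    Dim result As String
    Set re = CreateObject("VBScript.RegExp")   ' late binding, no reference needed
    With re
        .IgnoreCase = True
        .Global = True
        .Pattern = pat
    End With
    For Each m In re.Execute(source)
        If Len(result) > 0 Then result = result & ","
        If m.SubMatches.Count > 0 Then
            result = result & m.SubMatches(0)  ' pattern contains a capturing group
        Else
            result = result & m.Value          ' no group: fall back to the full match
        End If
    Next m
    GroupOrValue = result
End Function

Sub DemoGroupOrValue()
    Const InValue As String = "The lazy dog jumped the brown fox."
    Debug.Print GroupOrValue(InValue, "the")     ' The,the
    Debug.Print GroupOrValue(InValue, "t(he)")   ' he,he
End Sub
```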



    Ron

    Wednesday, August 22, 2012 10:01 AM