Using Regular Expressions to Find File Addresses

  • Question

  • How would I go about using Regular Expressions to determine if a string points to a file? (A web address of a file)

    An example would be:
    http://www.freewebs.com/programble/ (Does not point to a file)
    http://www.freewebs.com/programble/programs/fireworks/screenshots/screen1.bmp (Does point to a file)

    I would want a function to return from the above list only the second address, because it points to a file.
    Sunday, August 31, 2008 5:40 PM

All replies

  • Or, if anyone has any other way (not Regular Expressions) of getting the addresses that point directly to a file, please post that as well.
    Sunday, August 31, 2008 8:55 PM
  • I don't think this uses regular expressions, but give it a try:
        Dim Url As String = "www.lol.com/lol.bmp"
        Dim count As Integer = 0
        ' Index just before the last four characters of the URL.
        Dim number As Integer = Url.Length - 5
        For Each c As Char In Url
            ' A "." within the last four characters is treated as a file extension.
            If count > number AndAlso c = "."c Then
                MessageBox.Show("Points to a file")
                Exit For
            End If
            count += 1
        Next
    Sunday, August 31, 2008 9:49 PM
  • Thank you, that seems to work.
    Sunday, August 31, 2008 10:19 PM
  • You could also try something like this:

    Dim Url As String = "www.lol.com/lol.bmp"

    If Url.Substring(Url.Length - 4, 1) = "." Then
        ' Found a "." followed by a three-letter extension; should be a file?
    End If

    EDIT:

    Actually, the more I think about it, this could be broken easily: if the URL is something like www.lol.com, it would think it was a file. I'm not sure of the best way to handle this one. There are quite a few things to check to make sure the extension is not .com, .net, or similar. Maybe a list of file extensions to compare the end of the URL against would be best?

     

    Jeff

    Sunday, August 31, 2008 10:21 PM
  • You should use that; it's far more accurate.
    Sunday, August 31, 2008 10:24 PM
  • That would only work on a three letter extension. What about .js, .cs, .vb, etc?
    Sunday, August 31, 2008 10:55 PM
  • You are right, it has too many flaws. It might really be better to have a list of possible file extensions and check that the URL ends with an extension contained in the list. I'm not sure of a better way to do it, though there probably is one.
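
A minimal sketch of that list idea, in case it helps. The extensions in `KnownExtensions` and the name `EndsWithKnownExtension` are made up for illustration, and the list is nowhere near complete:

```vbnet
Imports System

Module ExtensionListSketch
    ' Hypothetical sample list; extend it with the extensions you care about.
    Private ReadOnly KnownExtensions As String() = {".bmp", ".htm", ".js", ".cs", ".vb"}

    Function EndsWithKnownExtension(ByVal url As String) As Boolean
        For Each ext As String In KnownExtensions
            ' Case-insensitive comparison so ".BMP" and ".bmp" are treated alike.
            If url.EndsWith(ext, StringComparison.OrdinalIgnoreCase) Then
                Return True
            End If
        Next
        Return False
    End Function

    Sub Main()
        Console.WriteLine(EndsWithKnownExtension("www.lol.com/lol.bmp")) ' True
        Console.WriteLine(EndsWithKnownExtension("www.lol.com"))         ' False
    End Sub
End Module
```

Because the check only passes for extensions in the list, www.lol.com is not mistaken for a file, but any extension missing from the list is not recognized either.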

     

    Jeff

     

    Sunday, August 31, 2008 10:59 PM
  • How could I use a list? I want it to recognize every file extension, it's not just a set of extensions that I'm looking for.
    Sunday, August 31, 2008 11:32 PM
  • I wonder if it might be easier to rule out the possible domain types, since there are fewer domain extensions. If you create a generic list, you can iterate through it and check whether the URL ends with anything in the list.

    Example:

    Dim url_list As New List(Of String)
    url_list.Add(".com")
    url_list.Add(".net")

    ' Everything from the last "." onward (assumes the URL contains a ".")
    Dim s As String = Url.Substring(Url.LastIndexOf("."))

    If url_list.Contains(s) Then
        MsgBox("not a file")
    End If

    Or you can do this:

    For i As Integer = 0 To url_list.Count - 1
        If s.EndsWith(url_list(i)) Then
            ' Show the message before leaving the loop; code after Exit For never runs.
            MsgBox("not a file")
            Exit For
        End If
    Next

    Here is a good list of the extensions (not sure this is all of them):

    Monday, September 1, 2008 12:12 AM
  • Thank you. There is only one problem: I hear that people will be able to register any domain name soon. What about when people start having www.xxx.yalu? I think I could check if there's a "." in the last few characters and make sure there is a slash before that. "www.google.ca" does have a "." in the last few characters, but no slash before that. "www.freewebs.com/programble/programs.htm" has a "." in the last few characters, and a slash before that.
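
A rough sketch of that "dot after a slash" check written as a regular expression, since that was the original question. The pattern and the name `PointsToFile` are assumptions for illustration, not a tested, complete URL parser. It skips an optional scheme such as http:// first, so the slashes in the scheme are not counted as path slashes. It does not handle URLs with user:password@ in them.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FileUrlSketch
    ' Optional scheme, then a host (optionally with :port), then a "/" and a path
    ' whose last segment ends in ".extension". A trailing ?query or #fragment is ignored.
    Private Const Pattern As String = "^(?:[a-z][a-z0-9+.\-]*://)?[^/?#:]+(?::\d+)?/(?:[^/?#]*/)*[^/?#]*\.[a-z0-9]+(?:[?#].*)?$"
    Private ReadOnly FileUrlRegex As New Regex(Pattern, RegexOptions.IgnoreCase)

    Function PointsToFile(ByVal url As String) As Boolean
        Return FileUrlRegex.IsMatch(url)
    End Function

    Sub Main()
        Console.WriteLine(PointsToFile("http://www.freewebs.com/programble/")) ' False
        Console.WriteLine(PointsToFile("http://www.freewebs.com/programble/programs/fireworks/screenshots/screen1.bmp")) ' True
        Console.WriteLine(PointsToFile("www.google.ca")) ' False
        Console.WriteLine(PointsToFile("www.freewebs.com/programble/programs.htm")) ' True
    End Sub
End Module
```

This does not depend on a list of domain extensions, so a new domain like www.xxx.yalu is still treated as "not a file" as long as nothing follows it after a slash.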
    Monday, September 1, 2008 7:35 PM