Using Regular Expressions to Find File Addresses

Question
-
How would I go about using regular expressions to determine whether a string (a web address) points to a file?
An example would be:
http://www.freewebs.com/programble/ (Does not point to a file)
http://www.freewebs.com/programble/programs/fireworks/screenshots/screen1.bmp (Does point to a file)
I want a function that, given the list above, returns only the second address, because it points to a file.
Sunday, August 31, 2008 5:40 PM
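Since the question asks for a regular expression, here is a minimal VB.NET sketch. It assumes (a heuristic, not a rule) that an address points to a file when its last path segment, after the host and before any ?query or #fragment, contains a dot followed by letters or digits. The module name UrlFileRegex and the helpers PointsToFile and FileAddresses are hypothetical names chosen for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module UrlFileRegex
    ' Heuristic (assumption): optional scheme, a host (with optional port),
    ' then a path whose last segment is "name.ext"; ?query or #fragment may follow.
    Private ReadOnly FilePattern As New Regex(
        "^(?:[a-z][a-z0-9+.-]*://)?[^/?#:]+(?::[0-9]+)?(?:/[^/?#]*)*/[^/?#]*\.[a-z0-9]+(?:[?#].*)?$",
        RegexOptions.IgnoreCase)

    ' True when the address appears to end in a file name with an extension.
    Function PointsToFile(url As String) As Boolean
        Return FilePattern.IsMatch(url)
    End Function

    ' Keeps only the addresses that appear to point to a file.
    Function FileAddresses(urls As IEnumerable(Of String)) As List(Of String)
        Dim result As New List(Of String)
        For Each u As String In urls
            If PointsToFile(u) Then result.Add(u)
        Next
        Return result
    End Function

    Sub Main()
        Dim urls As String() = {
            "http://www.freewebs.com/programble/",
            "http://www.freewebs.com/programble/programs/fireworks/screenshots/screen1.bmp"
        }
        ' Prints only the screen1.bmp address.
        For Each u As String In FileAddresses(urls)
            Console.WriteLine(u)
        Next
    End Sub
End Module
```

With this pattern, http://www.freewebs.com/programble/ and www.google.ca are rejected, while the screen1.bmp address is kept. Note the limits of the heuristic: a dynamic page such as page.php?id=1 also counts as a file, and files served without an extension are not detected.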
All replies
-
Or, if anyone has any other way (not regular expressions) of getting the addresses that point directly to a file, please post that as well.
Sunday, August 31, 2008 8:55 PM
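One non-regex option is to let System.Uri separate the host from the path and then ask System.IO.Path.GetExtension about the path only. This is a minimal sketch under the same assumption (a file is a final path segment with an extension); prepending http:// to bare addresses, and the names UrlFileUri and PointsToFile, are assumptions for illustration.

```vbnet
Imports System
Imports System.IO

Module UrlFileUri
    ' Assumption: an address points to a file when the path part of the URL
    ' (host, query string and fragment excluded) ends in a name with an extension.
    Function PointsToFile(url As String) As Boolean
        ' Uri needs a scheme, so assume http:// for bare addresses like "www.lol.com/lol.bmp".
        If Not url.Contains("://") Then url = "http://" & url

        Dim parsed As Uri = Nothing
        If Not Uri.TryCreate(url, UriKind.Absolute, parsed) Then Return False

        ' AbsolutePath is "/" for "http://www.google.ca", so the ".ca" in the host is never seen.
        Return Path.GetExtension(parsed.AbsolutePath) <> ""
    End Function

    Sub Main()
        Dim samples As String() = {
            "http://www.freewebs.com/programble/",
            "http://www.freewebs.com/programble/programs/fireworks/screenshots/screen1.bmp",
            "www.google.ca",
            "www.freewebs.com/programble/programs.htm"
        }
        For Each u As String In samples
            Console.WriteLine(u & " -> " & PointsToFile(u).ToString())
        Next
    End Sub
End Module
```

Because AbsolutePath never includes the host, a bare domain comes back as "not a file" no matter which top-level domains exist, so no list of domain extensions is needed.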
-
I don't think this counts as regular expressions, but give it a try:
Dim Url As String = "www.lol.com/lol.bmp"
' Look for a "." anywhere in the last four characters of the URL (reported once)
Dim startIndex As Integer = Math.Max(0, Url.Length - 4)
If Url.IndexOf("."c, startIndex) >= 0 Then
    MessageBox.Show("Points to a file")
End If
Sunday, August 31, 2008 9:49 PM
-
Thank you, that seems to work.
Sunday, August 31, 2008 10:19 PM
-
You could also try something like this:
Dim Url As String = "www.lol.com/lol.bmp"
' Guard the length so Substring does not throw on very short strings
If Url.Length >= 4 AndAlso Url.Substring(Url.Length - 4, 1) = "." Then
    'found . with 3 letter extension - should be a file???
End If
EDIT: Actually, the more I think about it, the more I see this could easily be broken: if the URL is something like www.lol.com, it would think it was a file. I'm not sure of the best way to handle this one. There are quite a few things to check to make sure the extension is not .com, .net, or others. Maybe comparing the end of the URL against a list of file extensions would be best???
Jeff
Sunday, August 31, 2008 10:21 PM
-
You should use that; it's far more accurate.
Sunday, August 31, 2008 10:24 PM
-
That would only work on a three-letter extension. What about .js, .cs, .vb, etc.?
Sunday, August 31, 2008 10:55 PM
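To make the two objections concrete, here is the fourth-from-last check from the reply above, with a length guard added, wrapped in a hypothetical DotFourthFromEnd function. It flags the bare domain www.lol.com as a file and misses a two-letter extension such as the hypothetical app.js.

```vbnet
Imports System

Module FourthFromLast
    ' The check from the earlier reply: is the 4th character from the end a "."?
    Function DotFourthFromEnd(url As String) As Boolean
        Return url.Length >= 4 AndAlso url.Substring(url.Length - 4, 1) = "."
    End Function

    Sub Main()
        ' A bare domain: ".com" looks like a three-letter extension (false positive).
        Console.WriteLine(DotFourthFromEnd("www.lol.com"))         ' True
        ' A two-letter extension is missed (false negative).
        Console.WriteLine(DotFourthFromEnd("www.lol.com/app.js"))  ' False
        ' The case the check was written for.
        Console.WriteLine(DotFourthFromEnd("www.lol.com/lol.bmp")) ' True
    End Sub
End Module
```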
-
You are right, it has too many flaws. It might really be better to keep a list of possible file extensions and check whether the URL ends with an extension contained in the list. I'm not sure of a better way to do it, though there probably is one.
Jeff
Sunday, August 31, 2008 10:59 PM
-
How could I use a list? I want it to recognize every file extension; it's not just a set of extensions that I'm looking for.
Sunday, August 31, 2008 11:32 PM
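A list-based check along the lines Jeff describes might look like this minimal sketch. The extension list is a hypothetical sample, and, as the question points out, any extension not in the list (for example the hypothetical .xyz) is reported as not a file.

```vbnet
Imports System

Module KnownExtensions
    ' Hypothetical sample list; only the extensions written here are recognized.
    Private ReadOnly FileExtensions As String() = {".bmp", ".htm", ".html", ".js", ".cs", ".vb", ".zip"}

    ' True when the URL ends with one of the listed extensions (case-insensitive).
    Function HasKnownFileExtension(url As String) As Boolean
        For Each ext As String In FileExtensions
            If url.EndsWith(ext, StringComparison.OrdinalIgnoreCase) Then Return True
        Next
        Return False
    End Function

    Sub Main()
        Console.WriteLine(HasKnownFileExtension("http://www.freewebs.com/programble/programs/fireworks/screenshots/screen1.bmp"))
        Console.WriteLine(HasKnownFileExtension("http://www.freewebs.com/programble/"))
        ' Not in the list, so it is missed even though it is a file.
        Console.WriteLine(HasKnownFileExtension("http://example.com/data.xyz"))
    End Sub
End Module
```

The trade-off is exactly the one raised in the question: the list never produces a false positive for a domain like .com, but it has to be maintained by hand.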
-
I wonder if it might be easier to rule out the possible domain types, since there are fewer domain extensions. If you create a generic list, you can iterate through it and check whether the URL ends with anything in the list.
Example
Dim url_list As New List(Of String)
url_list.Add(".com")
url_list.Add(".net")
' Everything from the last "." onward; empty if the URL has no dot at all
Dim dotIndex As Integer = Url.LastIndexOf(".")
Dim s As String = If(dotIndex >= 0, Url.Substring(dotIndex).ToLowerInvariant(), "")
If url_list.Contains(s) Then
    MsgBox("not a file")
End If
or you can do this:
For i As Integer = 0 To url_list.Count - 1
    If s.EndsWith(url_list(i)) Then
        MsgBox("not a file")
        Exit For
    End If
Next
Here is a good list of the extensions (not sure this is all of them). It groups them as: Generic, Sponsored, Infrastructure, Deleted/retired, Reserved, Pseudo, Proposed, Locations, Language and nationality, Technical, Other.
Monday, September 1, 2008 12:12 AM
-
Thank you. There is only one problem: I hear that people will be able to register any domain name soon. What about when people start having www.xxx.yalu? I think I could check whether there's a "." in the last few characters and make sure there is a slash before it. "www.google.ca" does have a "." in the last few characters, but no slash before it. "www.freewebs.com/programble/programs.htm" has a "." in the last few characters and a slash before it.
Monday, September 1, 2008 7:35 PM
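The slash-then-dot idea above can be sketched in VB.NET as follows. Assumptions: the query string and fragment are ignored, the scheme's "//" is skipped so it is not mistaken for a path slash, and "the last few characters" is taken to mean an extension of one to five characters (the hypothetical maxExtensionLength parameter). LooksLikeFile is an illustrative name.

```vbnet
Imports System

Module SlashThenDot
    ' Assumption: "the last few characters" means an extension of 1 to
    ' maxExtensionLength characters (hypothetical parameter, default 5).
    Function LooksLikeFile(url As String, Optional maxExtensionLength As Integer = 5) As Boolean
        ' Ignore any "?query" or "#fragment".
        Dim cut As Integer = url.IndexOfAny(New Char() {"?"c, "#"c})
        If cut >= 0 Then url = url.Substring(0, cut)

        ' Skip the scheme so the "//" in "http://" is not taken as a path slash.
        Dim schemeEnd As Integer = url.IndexOf("://", StringComparison.Ordinal)
        If schemeEnd >= 0 Then url = url.Substring(schemeEnd + 3)

        ' No slash means a bare host such as "www.google.ca" or "www.xxx.yalu".
        Dim lastSlash As Integer = url.LastIndexOf("/"c)
        If lastSlash < 0 Then Return False

        ' The dot has to be in the final segment, i.e. after the last slash.
        Dim lastDot As Integer = url.LastIndexOf("."c)
        If lastDot < lastSlash Then Return False

        Dim extensionLength As Integer = url.Length - lastDot - 1
        Return extensionLength >= 1 AndAlso extensionLength <= maxExtensionLength
    End Function

    Sub Main()
        Console.WriteLine(LooksLikeFile("www.google.ca"))                            ' False
        Console.WriteLine(LooksLikeFile("www.freewebs.com/programble/programs.htm")) ' True
        Console.WriteLine(LooksLikeFile("http://www.freewebs.com/programble/"))      ' False
    End Sub
End Module
```

This never consults a list of domains, so new top-level domains such as www.xxx.yalu do not affect it; like the other heuristics in this thread, it still cannot recognize files served without an extension.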