Is it possible to retrieve a directory listing from a URL using VB?

    Question

  • I want to create a VB application where the user can query a website (in the format www.somedownloads.net/downloads) and see what files are stored there. My question is more conceptual, because after some research, I'm beginning to think VB may not have the right tools to do this. I've stumbled across plenty of examples for listing files from local directories and file servers, but nothing definitive about a plain URL. It seems to me like it wouldn't be possible, since there's no way to GetFiles() or convert a URI into a DirectoryInfo. Is there something I'm overlooking, or am I better off exploring other solutions?
    Wednesday, April 19, 2017 6:52 AM

Answers

  • Rajada,

    There are no (legal) tools to do this with a plain HTML URL (and that is what you show); no tools means not with VB or any other programming language either.

    What you ask is possible with an FTP website, or if the website has WebDAV features.

    Therefore, first investigate what kind of website you're talking about.


    Success
    Cor

    Wednesday, April 19, 2017 8:39 AM
  • This is a function of the website configuration.  It has nothing to do with the machine making the request (i.e., your computer).

    The ability to list folder contents over HTTP is a setting in the webserver which is almost always turned off for security reasons.  Occasionally you may still find an ancient website which is just a simple static HTML site whose navigation is provided by directory listings.

    A website's directory may contain all kinds of supporting files which are not meant to be exposed to the end user.  Allowing directory viewing would open the site up to serious security risks if the directory contained anything other than raw HTML and images.


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Wednesday, April 19, 2017 12:03 PM
    Moderator
  • Yes, it is my site, and it is in this indexed format.

    http://downloads.nerfarena.net/

    SFTP presents too many issues for my webhost, so I was hoping to just be able to get a listing of that URL somehow.

    OK, so that website has the List Directory option turned on.  You can view and navigate the folder structure of the website via auto-generated HTML pages.  All you have to do is parse the HTML response text to extract the folder and file URLs from the links.

    Something like the following example should work:

    'Simple container class to hold result items
    Public Class RemoteFileInfo
        Public Property Url As String
        Public Property FileName As String
        Public Property FileSize As String
        Public Property LastModified As String
        Public Property Description As String

        Public Overrides Function ToString() As String
            Return $"{FileName} [{Url}], {LastModified}, {FileSize}, ""{Description}"""
        End Function
    End Class

    'Function to get list of items from URL
    Public Async Function GetRemoteFileInfos(remoteAddress As String) As Task(Of IEnumerable(Of RemoteFileInfo))
        Dim results As New List(Of RemoteFileInfo)
        Using client As New Net.Http.HttpClient
            Dim htmlText As String = Await client.GetStringAsync(remoteAddress)
            Dim lines() As String = htmlText.Split(ControlChars.Lf)

            'Skip ahead to the first anchor tag (guarding against pages with none)
            Dim index As Integer = 0
            Do While index < lines.Length AndAlso Not lines(index).Trim.StartsWith("<a")
                index += 1
            Loop

            'Process listing lines until the anchor tags run out
            Do While index < lines.Length
                Dim line As String = lines(index).Trim
                If Not line.StartsWith("<a") Then Exit Do

                'The anchor tag and the detail columns are separated by a run of spaces
                Dim sepIndex As Integer = line.IndexOf("   ")
                If sepIndex > -1 Then
                    Dim a As XElement = XElement.Parse(line.Substring(0, sepIndex))
                    Dim currentInfo As New RemoteFileInfo
                    'Resolve the link against the page address (IO.Path.Combine would insert backslashes)
                    currentInfo.Url = New Uri(New Uri(remoteAddress), a.@href).ToString
                    currentInfo.FileName = a.Value

                    Dim parts() As String = line.Substring(sepIndex).TrimStart.Split({"  "}, StringSplitOptions.RemoveEmptyEntries)
                    If parts.Length > 0 Then currentInfo.LastModified = parts(0).Trim
                    If parts.Length > 1 Then currentInfo.FileSize = parts(1).Trim
                    If parts.Length > 2 Then currentInfo.Description = parts(2).Trim
                    results.Add(currentInfo)
                End If
                index += 1
            Loop
        End Using
        Return results.ToArray
    End Function

    'Example Usage:
    Private Async Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Dim fileList = Await GetRemoteFileInfos("http://downloads.nerfarena.net/")
        For Each entry In fileList
            ListBox1.Items.Add(entry)
        Next
    End Sub
    


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    • Marked as answer by Rajada_NAB Tuesday, May 02, 2017 5:34 PM
    Tuesday, May 02, 2017 3:41 PM
    Moderator

All replies

  • Rajada,

    There are no (legal) tools to do this with a plain HTML URL (and that is what you show); no tools means not with VB or any other programming language either.

    What you ask is possible with an FTP website, or if the website has WebDAV features.

    Therefore, first investigate what kind of website you're talking about.


    Success
    Cor

    Wednesday, April 19, 2017 8:39 AM
  • I think I'm explaining this poorly. I am trying to make something that can list out the contents of a folder on a server, then choose which to download, much like a lightweight syncing program.
    Wednesday, April 19, 2017 9:46 AM
  • I think I'm explaining this poorly. I am trying to make something that can list out the contents of a folder on a server, then choose which to download, much like a lightweight syncing program.

    No, you explained it correctly.

    However, that simply cannot be done if the website does not offer FTP or WebDAV.

    Websites starting with www are not FTP sites.

    First try to find a freeware tool which can do it, and if you find one, post it here, then others like me can use it too.

    :-)


    Success
    Cor

    Wednesday, April 19, 2017 10:39 AM
  • This is a function of the website configuration.  It has nothing to do with the machine making the request (i.e., your computer).

    The ability to list folder contents over HTTP is a setting in the webserver which is almost always turned off for security reasons.  Occasionally you may still find an ancient website which is just a simple static HTML site whose navigation is provided by directory listings.

    A website's directory may contain all kinds of supporting files which are not meant to be exposed to the end user.  Allowing directory viewing would open the site up to serious security risks if the directory contained anything other than raw HTML and images.


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Wednesday, April 19, 2017 12:03 PM
    Moderator
  • Alternatively, say I can set up the folder for anonymous FTP, can I list the contents using VB's FTP functionality, or is it too limited to do what I want to do?
    Wednesday, April 19, 2017 4:47 PM
  • Reed,

    Do you have a sample of such an old website? I've never seen or known of one, so I'm curious about that.

    Not one, of course, with the somewhat newer WebDAV feature:

    https://en.wikipedia.org/wiki/WebDAV


    Success
    Cor

    Wednesday, April 19, 2017 5:00 PM
  • No, you cannot switch an HTTP website over to FTP from the client side; you really have to do it on the server.

    http://www.htmlgoodies.com/beyond/reference/article.php/3472821/So-You-Want-An-FTP-Directory-Huh.htm


    Success
    Cor

    Wednesday, April 19, 2017 5:05 PM
  • My webhost provides the means to set up a domain for anonymous FTP. I'm very well aware I can't do that from within a VB program. I thought it was obvious that I meant setting it up from within my webhost.

    With that in mind, would you recommend using VB's FTP functionality? Is the support for concurrent users decent? Can I list the directory like I wanted to?
    Wednesday, April 19, 2017 5:16 PM
  • If the website is yours, you can simply use the .NET FTP support.

    This is a starting page with more information about that on MSDN:

    https://msdn.microsoft.com/en-us/library/ms229718(v=vs.110).aspx

    FTP supports listing folders (directories).
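
    For illustration, a minimal sketch of listing a folder with the .NET FtpWebRequest class and its ListDirectory method; the address ftp://ftp.example.com/downloads/ and the anonymous credentials are placeholders, not anything from this thread:

    Imports System.Collections.Generic
    Imports System.IO
    Imports System.Net

    Module FtpListingSketch
        'Returns the bare file/folder names reported by the FTP server
        Public Function ListFtpDirectory(ftpAddress As String) As List(Of String)
            Dim request = CType(WebRequest.Create(ftpAddress), FtpWebRequest)
            request.Method = WebRequestMethods.Ftp.ListDirectory
            'Anonymous login; swap in real credentials if the host requires them
            request.Credentials = New NetworkCredential("anonymous", "guest@example.com")

            Dim names As New List(Of String)
            Using response = CType(request.GetResponse(), FtpWebResponse)
                Using reader As New StreamReader(response.GetResponseStream())
                    Dim line As String = reader.ReadLine()
                    Do While line IsNot Nothing
                        names.Add(line)
                        line = reader.ReadLine()
                    Loop
                End Using
            End Using
            Return names
        End Function
    End Module

    'Example: Dim names = ListFtpDirectory("ftp://ftp.example.com/downloads/")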


    Success
    Cor

    Wednesday, April 19, 2017 5:20 PM
  • This will probably also be useful for handling FTP websites with VB:

    https://msdn.microsoft.com/en-us/library/34322t8f(VS.100).aspx


    Success
    Cor

    Wednesday, April 19, 2017 5:33 PM
  • If the website is yours, you can simply use the .NET FTP support.

    This is a starting page with more information about that on MSDN:

    https://msdn.microsoft.com/en-us/library/ms229718(v=vs.110).aspx

    FTP supports listing folders (directories).


    Success
    Cor


    I'd like to be able to distribute the program without setting up FTP accounts for everyone, so as long as it supports anonymous connections, I'll look into this.
    Wednesday, April 19, 2017 5:59 PM
  • This will probably also be useful for handling FTP websites with VB:

    https://msdn.microsoft.com/en-us/library/34322t8f(VS.100).aspx


    Success
    Cor

    Editing the site isn't necessary; I basically want to give some users list- and download-only permissions. The program won't need to support anything more besides the ability to select the files the user wishes to download. I'll check out both of these resources and determine if they fit my needs.
    Wednesday, April 19, 2017 6:04 PM
  • Reed,

    Do you have a sample of such an old website? I've never seen or known of one, so I'm curious about that.

    Not one, of course, with the somewhat newer WebDAV feature:

    https://en.wikipedia.org/wiki/WebDAV


    Success
    Cor

    The IETF still maintains the core collection of RFC documents this way:

    https://www.ietf.org/rfc


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Sunday, April 30, 2017 12:46 PM
    Moderator

  • The IETF still maintains the core collection of RFC documents this way:

    https://www.ietf.org/rfc


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    In my perception, that is not something that retrieves a file directory.

    However, if the OP makes a web service, then he can of course create this as a web document and retrieve that, but that is not done by just using a URL (in the way I guess the OP means).

    But the OP could solve his problem with that if he did.


    Success
    Cor

    Sunday, April 30, 2017 5:51 PM
  • I'd like to be able to distribute the program without setting up FTP accounts for everyone, so as long as it supports anonymous connections, I'll look into this.

    Why are you considering distributing a program?  There are many excellent free FTP clients available, and some subset of FTP is built into almost all browsers that people already have on their own systems.  Distributing another FTP application seems like a waste of time.

    Sunday, April 30, 2017 10:55 PM
  • Editing the site isn't necessary; I basically want to give some users list- and download-only permissions. The program won't need to support anything more besides the ability to select the files the user wishes to download. I'll check out both of these resources and determine if they fit my needs.

    Rajada,

    Is it your site? Everyone here is assuming you mean "just any site" but if it's yours, that's a very different thing.

    If so, you can do a lot more with it since you're in the driver's seat. Have a look around and you'll see many ways to, for example, list the files (like this one) using dotNet.

    If it's yours, you know what's where and that gives you the opportunity to do it differently - dynamically.

    I'll explain more if you're interested but it's all a moot point if it's not your site.

    Give it thought.


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Sunday, April 30, 2017 11:32 PM
  • Yes, it is my site, and it is in this indexed format.

    http://downloads.nerfarena.net/

    SFTP presents too many issues for my webhost, so I was hoping to just be able to get a listing of that URL somehow.
    • Edited by Rajada_NAB Tuesday, May 02, 2017 3:51 AM
    Tuesday, May 02, 2017 3:50 AM
  • Yes, it is my site, and it is in this indexed format.

    http://downloads.nerfarena.net/

    SFTP presents too many issues for my webhost, so I was hoping to just be able to get a listing of that URL somehow.

    I'm assuming that you're talking to me.

    What you show there is about all you'll get unless you do something on your own. If you "do it on your own" with FTP, for example, then you'd have to maintain it - it wouldn't be automatic.

    Have you contacted your webhost to see if they might offer a solution?


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Tuesday, May 02, 2017 9:55 AM
  • Honestly, I'm not sure what I'd ask my webhost to do at this point. When I started this project, I thought for sure it couldn't be that hard to look at a plain HTML index, read the text content, and convert that to a file listing.

    What I didn't want to do is have to create a text file with all the names of the files, download it, and parse it, because then I have to maintain something where I could make a mistake. Since I know I can do it that way, downloading each file individually via normal HTML links, it seems crazy to me that there's no way to look at that kind of metadata on a website. I mean, how does my browser list every file in an index on that page? It certainly doesn't have a text file, or use FTP, and it clearly updates when new files are placed in the appropriate folder. Admittedly, as a bit of a beginner when it comes to web stuff, I'm sitting here, pointing at the site and wondering why I can't emulate that.

    Again, I don't mind using FTP, but I have recommendations against it from my webhost due to the poor security it offers.


    • Edited by Rajada_NAB Tuesday, May 02, 2017 2:14 PM Typo
    Tuesday, May 02, 2017 2:13 PM
  • I mean, how does my browser list every file in an index on that page?

    This is the second time I tried sending this so maybe it'll go through this time...

    The open browsing that you're talking about is controlled with a configuration file that's in the main directory (.htaccess on Apache).

    That's what Reed cautioned about the other day.

    *****

    I wish I had a great answer here but I don't.


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Tuesday, May 02, 2017 3:26 PM
  • Yes, it is my site, and it is in this indexed format.

    http://downloads.nerfarena.net/

    SFTP presents too many issues for my webhost, so I was hoping to just be able to get a listing of that URL somehow.

    OK, so that website has the List Directory option turned on.  You can view and navigate the folder structure of the website via auto-generated HTML pages.  All you have to do is parse the HTML response text to extract the folder and file URLs from the links.

    Something like the following example should work:

    'Simple container class to hold result items
    Public Class RemoteFileInfo
        Public Property Url As String
        Public Property FileName As String
        Public Property FileSize As String
        Public Property LastModified As String
        Public Property Description As String

        Public Overrides Function ToString() As String
            Return $"{FileName} [{Url}], {LastModified}, {FileSize}, ""{Description}"""
        End Function
    End Class

    'Function to get list of items from URL
    Public Async Function GetRemoteFileInfos(remoteAddress As String) As Task(Of IEnumerable(Of RemoteFileInfo))
        Dim results As New List(Of RemoteFileInfo)
        Using client As New Net.Http.HttpClient
            Dim htmlText As String = Await client.GetStringAsync(remoteAddress)
            Dim lines() As String = htmlText.Split(ControlChars.Lf)

            'Skip ahead to the first anchor tag (guarding against pages with none)
            Dim index As Integer = 0
            Do While index < lines.Length AndAlso Not lines(index).Trim.StartsWith("<a")
                index += 1
            Loop

            'Process listing lines until the anchor tags run out
            Do While index < lines.Length
                Dim line As String = lines(index).Trim
                If Not line.StartsWith("<a") Then Exit Do

                'The anchor tag and the detail columns are separated by a run of spaces
                Dim sepIndex As Integer = line.IndexOf("   ")
                If sepIndex > -1 Then
                    Dim a As XElement = XElement.Parse(line.Substring(0, sepIndex))
                    Dim currentInfo As New RemoteFileInfo
                    'Resolve the link against the page address (IO.Path.Combine would insert backslashes)
                    currentInfo.Url = New Uri(New Uri(remoteAddress), a.@href).ToString
                    currentInfo.FileName = a.Value

                    Dim parts() As String = line.Substring(sepIndex).TrimStart.Split({"  "}, StringSplitOptions.RemoveEmptyEntries)
                    If parts.Length > 0 Then currentInfo.LastModified = parts(0).Trim
                    If parts.Length > 1 Then currentInfo.FileSize = parts(1).Trim
                    If parts.Length > 2 Then currentInfo.Description = parts(2).Trim
                    results.Add(currentInfo)
                End If
                index += 1
            Loop
        End Using
        Return results.ToArray
    End Function

    'Example Usage:
    Private Async Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Dim fileList = Await GetRemoteFileInfos("http://downloads.nerfarena.net/")
        For Each entry In fileList
            ListBox1.Items.Add(entry)
        Next
    End Sub
    


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    • Marked as answer by Rajada_NAB Tuesday, May 02, 2017 5:34 PM
    Tuesday, May 02, 2017 3:41 PM
    Moderator
  • Reed,

    That's still open browsing though and it's not automatic.

    If he's willing to maintain it himself, then a third-party library like Chilkat (it's what I use but it's not free) has methods like this one:

    https://www.example-code.com/vbnet/ftp_dirTreeXml.asp

    It will hand you back the directory structure as XML. It works pretty well and he can then close the open browsing.


    "A problem well stated is a problem half solved.” - Charles F. Kettering


    Tuesday, May 02, 2017 4:25 PM
  • This is actually exactly what I was looking for. Once I know what's at the index, I'll have my program and the user handle what they want to get. That particular URL I provided isn't really what the program will look at, but after testing your code (by the way, I'm very impressed that you took the time to write code for me; that was above and beyond), it does report back the files into my listbox. This was what I thought I could do, but wasn't sure how to put it into words.

    Perhaps I wasn't 100% clear on the fact that I'd be maintaining it anyway, sort of like a GitHub project. I update a file, FTP it over, and users can see that it has changed, been updated, and allow it to update on their local machine. Sure, it's not the most efficient solution, but it does afford me the customization and idiot-proof user-friendliness I need.
    Tuesday, May 02, 2017 5:40 PM
  • Reed,

    That's still open browsing though and it's not automatic.

    If he's willing to maintain it himself, then a third-party library like Chilkat (it's what I use but it's not free) has methods like this one:

    https://www.example-code.com/vbnet/ftp_dirTreeXml.asp

    It will hand you back the directory structure as XML. It works pretty well and he can then close the open browsing.


    "A problem well stated is a problem half solved.” - Charles F. Kettering


    Yes, it is, but apparently that doesn't matter.  It looks like it is like the IETF site, where the sole purpose of the virtual web directory is to host files.  In that scenario it's probably OK to allow the directory listing (assuming the underlying folders are locked down security-wise).

    Since he owns the website, the best solution would probably be to write a web page of some sort which outputs the directory listing in the desired format when called.  More of an "API" than a "web page", but truly, it could be written in anything the local webserver can process.  This would facilitate parsing the listed data and would allow the site to be configured more securely.
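
    As a rough sketch of that idea, assuming an ASP.NET host: a generic handler (the name ListFiles.ashx and the ~/downloads folder are hypothetical) could emit one plain-text line per file for a client program to parse:

    Imports System.IO
    Imports System.Web

    'Hypothetical ListFiles.ashx code-behind
    Public Class ListFiles
        Implements IHttpHandler

        Public Sub ProcessRequest(context As HttpContext) Implements IHttpHandler.ProcessRequest
            context.Response.ContentType = "text/plain"
            Dim folder = context.Server.MapPath("~/downloads")
            'One "name|size|lastModifiedUtc" line per file; the client splits on "|"
            For Each filePath In Directory.GetFiles(folder)
                Dim info As New FileInfo(filePath)
                context.Response.Write($"{info.Name}|{info.Length}|{info.LastWriteTimeUtc:o}{vbLf}")
            Next
        End Sub

        Public ReadOnly Property IsReusable As Boolean Implements IHttpHandler.IsReusable
            Get
                Return False
            End Get
        End Property
    End Class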


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Tuesday, May 02, 2017 6:26 PM
    Moderator

  • Yes, it is, but apparently that doesn't matter.  It looks like it is like the IETF site, where the sole purpose of the virtual web directory is to host files.  In that scenario it's probably OK to allow the directory listing (assuming the underlying folders are locked down security-wise).

    Since he owns the website, the best solution would probably be to write a web page of some sort which outputs the directory listing in the desired format when called.  More of an "API" than a "web page", but truly, it could be written in anything the local webserver can process.  This would facilitate parsing the listed data and would allow the site to be configured more securely.


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    I'd go for the API, but better yet, I think that the OP might want to reconsider and look for an FTP site.

    It's not low-cost but it sounds like that would be more appropriate.


    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Tuesday, May 02, 2017 7:40 PM

  • Since he owns the website, 

    Yes, but what kind? To run, for instance, a Microsoft web service you need at least IIS on it. If it is a flat file structure, which is often supported, then there is little to do.

    https://social.technet.microsoft.com/wiki/contents/articles/34351.configure-iis-server-on-azure-virtual-machine-windows-server.aspx


    Success
    Cor

    Tuesday, May 02, 2017 7:51 PM
  • A small detail, but can I modify this to iterate subdirectories? Or must I call this for each subdirectory I want to explore?
    Wednesday, May 03, 2017 12:05 AM
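
    One possible approach, sketched against the GetRemoteFileInfos function from the marked answer: auto-generated index pages typically render folder links with a trailing "/", so a recursive wrapper can descend into any entry whose resolved URL still sits below the starting address (which also skips the parent-directory link). That trailing-slash convention is an assumption about the listing format, not guaranteed behavior:

    'Recursively collects entries from a listing page and every subfolder page beneath it
    Public Async Function GetRemoteFileInfosRecursive(remoteAddress As String) As Task(Of List(Of RemoteFileInfo))
        Dim results As New List(Of RemoteFileInfo)
        For Each item In Await GetRemoteFileInfos(remoteAddress)
            'Assumed: folder links end with "/"; requiring the URL to stay under the
            'starting address skips "Parent Directory" style links
            If item.Url.EndsWith("/") AndAlso
               item.Url.Length > remoteAddress.Length AndAlso
               item.Url.StartsWith(remoteAddress) Then
                results.AddRange(Await GetRemoteFileInfosRecursive(item.Url))
            Else
                results.Add(item)
            End If
        Next
        Return results
    End Function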