Does file size affect the speed of a directory listing?

  • 07 May 2012 5:41
     
     

    I have an app that reads a directory to get the filenames in it:

    for each File in Directory

    (process file)

    next file

    I want to know whether file size makes a difference to the time it takes to get the filenames.

    For example, if one folder contains 10,000 photos of 1 megapixel and another contains 10,000 photos of 8 megapixels, will listing them take the same time?

    Also, is this the fastest way to get the listing?

    Any help is much appreciated.

All Replies

  • 07 May 2012 6:05
     
     Answer

    No, because you don't read the files, only the directory.

    If you want to get the file info, then it is a different case.


    Success
    Cor

    • Marked as answer by x38class 14 May 2012 7:01
  • 07 May 2012 6:05
     
     

    No.

    Yes.

    Possibly, depends how you get the listing.

    Bear in mind that if you get the listing, Windows might cache it so the second and subsequent gets are faster.


    Regards David R
    ---------------------------------------------------------------
    The great thing about Object Oriented code is that it can make small, simple problems look like large, complex ones.
    Object-oriented programming offers a sustainable way to write spaghetti code. - Paul Graham.
    Every program eventually becomes rococo, and then rubble. - Alan Perlis
    The only valid measurement of code quality: WTFs/minute.

  • 07 May 2012 7:05
     
     

    Thanks for replying.

    All I want is the filename, so it should take the same time?

  • 07 May 2012 7:06
     
     

    As I said, the listing is

    for each File in Directory

    (process file)

    next file

    so I am not sure what you are trying to tell me.

  • 07 May 2012 7:37
     
     

    No - file size makes no difference.

    Yes - it will take the same time. That follows from the first answer.

    Possibly - if you use the Directory class, that's probably the fastest way; if you use some home-brew method, it could be faster or slower. You do not show VB code, just pseudocode. I could assume you meant the Directory class, but then again you might not. :)


    Regards David R

  • 08 May 2012 4:34
     
     

    My code is:

    Dim gFile as String

    For Each gFile In System.IO.GetFiles("C:\Xyz", "*.jpg", SearchOption.TopDirectoryOnly)

    lstFiles.items.add(gFile)

    Next

    Is this the class you are referring to, or is there an example you can point me to?

    Thanks

  • 08 May 2012 5:17
     
     

    That code won't compile for me.  Which version of VB are you using?   You haven't nominated the type for gFile, so it is not clear whether you are returning a file, a filename or a fileinfo.

    The Directory class is described here:
    http://msdn.microsoft.com/en-us/library/system.io.directory.aspx
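
    As a rough illustration of the Directory class in use, here is a minimal console sketch. It assumes .NET Framework 2.0 or later; the folder C:\Xyz is taken from the question, and the module name is made up for the example. Directory.GetFiles returns the full paths as strings, so Path.GetFileName is used to keep only the name. If the folder does not exist, the call throws a DirectoryNotFoundException.

```vb
Imports System
Imports System.IO

Module ListJpegNames
    Sub Main()
        ' Folder from the question; replace with a directory that exists on your machine.
        Dim folder As String = "C:\Xyz"

        ' GetFiles returns an array of full path strings, not FileInfo objects.
        Dim paths As String() = Directory.GetFiles(folder, "*.jpg", SearchOption.TopDirectoryOnly)

        For Each fullPath As String In paths
            ' Path.GetFileName strips the directory part, leaving only the file name.
            Console.WriteLine(Path.GetFileName(fullPath))
        Next
    End Sub
End Module
```

    Nothing here opens the files themselves, which is why the size of each photo should not matter for this step.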

  • 08 May 2012 5:21
     
     

    There is no System.IO.GetFiles method.

  • 08 May 2012 6:03
     
      Has Code

    Sorry, I missed out one word:

    System.IO.Directory.GetFiles("C:\Xyz", "*.jpg", SearchOption.TopDirectoryOnly)

    I have found the following figures in this article: http://www.codeproject.com/Articles/38959/A-Faster-Directory-Enumerator

    • Directory.GetFiles method: ~43,860ms
    • DirectoryInfo.GetFiles method: ~44,000ms
    • FastDirectoryEnumerator.GetFiles method: ~55ms
    • FastDirectoryEnumerator.EnumerateFiles method: ~53ms

    That is roughly a 830x increase in performance, and more than 2 orders of magnitude! And, the gap only increases as the latency to the PC containing the files increases.

    The only problem is that it is in C#. Are there any known VB versions?

    For Acamar: VB.NET 2010, Framework 2. My example code had a word missing.

    The documentation at http://msdn.microsoft.com/en-us/library/ms143316%28v=vs.80%29.aspx has:

    'Declaration
    Public Shared Function GetFiles ( _
    	path As String, _
    	searchPattern As String, _
    	searchOption As SearchOption _
    ) As String()
    'Usage
    Dim path As String
    Dim searchPattern As String
    Dim searchOption As SearchOption
    Dim returnValue As String()
    
    returnValue = Directory.GetFiles(path, searchPattern, searchOption)

    How is it used in an application? It seems very similar to my code.

    I thought I might have concluded this thread by now. I have to go to hospital, so I will be back in approximately 48 hours, hopefully.

  • 08 May 2012 6:10
     
     

    For accurate timing, you have to run each method after a reboot of the computer. Change the order in which you run the methods. Are your relative times the same? DirectoryInfo will always be slower because it returns FileInfo objects.
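
    One way to take such timings is sketched below, assuming .NET Framework 2.0 or later and the folder C:\Xyz from the earlier posts; the module name is invented for the example. It uses System.Diagnostics.Stopwatch around each call. Whichever call runs second may benefit from Windows having cached the directory, so swap the order (or restart between runs) before drawing conclusions.

```vb
Imports System
Imports System.Diagnostics
Imports System.IO

Module TimeListings
    Sub Main()
        ' Folder from the earlier posts; replace with a directory that exists on your machine.
        Dim folder As String = "C:\Xyz"

        ' Time Directory.GetFiles, which returns path strings.
        Dim watch As Stopwatch = Stopwatch.StartNew()
        Dim names As String() = Directory.GetFiles(folder, "*.jpg", SearchOption.TopDirectoryOnly)
        watch.Stop()
        Console.WriteLine("Directory.GetFiles: {0} files in {1} ms", names.Length, watch.ElapsedMilliseconds)

        ' Time DirectoryInfo.GetFiles, which builds a FileInfo object per file.
        ' This second call may be served from the cache filled by the first one.
        watch = Stopwatch.StartNew()
        Dim infos As FileInfo() = New DirectoryInfo(folder).GetFiles("*.jpg", SearchOption.TopDirectoryOnly)
        watch.Stop()
        Console.WriteLine("DirectoryInfo.GetFiles: {0} files in {1} ms", infos.Length, watch.ElapsedMilliseconds)
    End Sub
End Module
```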

  • 14 May 2012 7:02
     
     
    Thanks to all who contributed. I still have no answer as to which is the fastest way to get a directory listing.
  • 14 May 2012 7:19
     
     

    I have now found this link:

    http://tom-shelton.net/index.php/2010/01/02/using-extension-methods-and-the-win32-api-to-efficiently-enumerate-the-file-system/

    However, it is in C#. The author writes the following:

    All of the current built in .NET functions - enumerate the entire directory before returning.  I have an article on my website on how to implement methods similar to what are being introduced in 4.0.  The caveat is that the code is in C# - and seriously, would be much more complicated to do in VB.NET because the lack of iterator support.  But, you could take the code and compile it to a dll for use in your vb project - if you don't fancy waiting for 4.0....


    Tom Shelton

    So my question now is: how do I get the C# code compiled to a DLL, and how do I use it in my app?

    I am probably asking too much, as I would need step-by-step instructions. All I know is that there must be a faster way to get a directory listing, but changing my distribution to include .NET 4 is overkill (imagine a user installing an app on XP and finding out they must install .NET 4; it really is excessive).

  • 14 May 2012 7:45
     
     

    If you want all the files in a directory and its subdirectories, there is very little difference in the times for the various methods. The time to get the files before they are cached is orders of magnitude greater than the time to get them after they are cached. What exactly do you want?

  • 15 May 2012 4:32
     
     

    I want a listing of file names from a selected directory.

    Currently on my PC it takes 45 seconds to load 10,000 file names and parse each file name for a specific field.

    My code is:

    Dim gFile as String

    For Each gFile In System.IO.Directory.GetFiles("C:\Xyz", "*.jpg", SearchOption.TopDirectoryOnly)

    ParseFile(gFile)

    Next

    On a CD it takes 8 minutes. I have no idea what my users have in the way of memory, processor, or drive type and speed, so I want the time taken to be an absolute minimum; users should not lose confidence in my app because they have to sit and wait. They may have 35,000 files. What then?

  • 15 May 2012 4:37
     
     Proposed Answer

    The time-consuming method is most likely ParseFile. GetFiles for a single directory of 35,000 files should take less than 10 seconds. Why is the user waiting?
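
    To check where the time actually goes, the listing and the per-file work can be timed separately. The sketch below assumes .NET Framework 2.0 or later and the folder C:\Xyz from the question. The real ParseFile routine is not shown in this thread, so the version here is only a stand-in that extracts the name without its extension; substitute your own routine.

```vb
Imports System
Imports System.Diagnostics
Imports System.IO

Module SplitTiming
    Sub Main()
        ' Folder from the question; replace with a directory that exists on your machine.
        Dim folder As String = "C:\Xyz"

        ' Phase 1: time only the directory listing.
        Dim listWatch As Stopwatch = Stopwatch.StartNew()
        Dim paths As String() = Directory.GetFiles(folder, "*.jpg", SearchOption.TopDirectoryOnly)
        listWatch.Stop()

        ' Phase 2: time only the per-file work.
        Dim parseWatch As Stopwatch = Stopwatch.StartNew()
        For Each fullPath As String In paths
            ParseFile(fullPath)
        Next
        parseWatch.Stop()

        Console.WriteLine("Listing {0} files: {1} ms", paths.Length, listWatch.ElapsedMilliseconds)
        Console.WriteLine("Parsing: {0} ms", parseWatch.ElapsedMilliseconds)
    End Sub

    ' Stand-in for the poster's ParseFile routine, whose real body is not shown in the thread.
    Private Function ParseFile(ByVal fullPath As String) As String
        Return Path.GetFileNameWithoutExtension(fullPath)
    End Function
End Module
```

    If the second figure dominates, a faster directory enumerator will not help much, and the parsing routine is the place to look.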


  • 15 May 2012 5:40
     
     

    Thanks, John, for your suggestion. I had not considered my own routine as the problem; I expected the directory search to be the only cause.

    I will have to put a progress bar in my app to show the progress of the parsing.

    Thanks for taking the time to contribute, for my benefit and that of others on this site.

  • 15 May 2012 6:57
     
     

    By the way, file size does not affect the speed of a listing. NTFS keeps a record for each file in the Master File Table, and a record is usually 1,024 bytes whatever the size of the file.

    Renee


    "MODERN PROGRAMMING is deficient in elementary ways BECAUSE of problems INTRODUCED by MODERN PROGRAMMING." Me

  • 15 May 2012 14:21
     
     

    I constructed a directory containing 35,000 jpg files (64 x 64 portions of screenshots). GetFiles took 100 milliseconds after a restart.
  • 16 May 2012 5:31
     
     

    Thanks to Renee & John for the further comments.

    My searches on the web for fast solutions have found many people interested in this topic.

    I hope the discussion in this thread is of some help to others.

  • 08 June 2012 17:59
     
     

    There is nothing to discuss. File size in no way affects the amount of time needed to get a listing.

    Renee

  • 08 June 2012 18:22
     
     

    The reason it takes so long is that GetFiles reads *all* the names before it returns.  But if you use the EnumerateFiles or EnumerateFileSystemEntries methods, it returns before getting all the names so you can start processing immediately.  From the docs:

    "The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient."

    http://msdn.microsoft.com/en-us/library/dd383458.aspx#Y342
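
    A minimal sketch of the EnumerateFiles approach follows. It assumes .NET Framework 4 or later (the method does not exist in earlier versions) and the folder C:\Xyz from the question; the module name is invented for the example. Each path becomes available as the directory is read, so processing can start before the whole listing is complete.

```vb
Imports System
Imports System.IO

Module StreamNames
    Sub Main()
        ' Folder from the question; replace with a directory that exists on your machine.
        Dim folder As String = "C:\Xyz"
        Dim count As Integer = 0

        ' EnumerateFiles yields paths one at a time instead of building the whole array first.
        For Each fullPath As String In Directory.EnumerateFiles(folder, "*.jpg", SearchOption.TopDirectoryOnly)
            count += 1
            Console.WriteLine(Path.GetFileName(fullPath))
        Next

        Console.WriteLine("{0} files", count)
    End Sub
End Module
```

    This changes when the first name is available, not necessarily the total time to read an uncached directory.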

  • 08 June 2012 19:28
     
     

    This only applies to cached info. The first time EnumerateFiles or GetFiles is run, each will be I/O-limited and take approximately the same time. The quickest way to do anything with I/O is to do it the second time first.