Can I store over 100,000 files in a single folder without problems?

  • Question

  • I have a database with about 6 "file" tables. These files are stored in a SQL Server column of type VARBINARY(MAX). To save or retrieve a file, the code uses a stream to get a binary array, either from a file to store in the column, or from the column to write back out to a file.

    I need to remove these files from the database for many reasons, so I want to store them in the file system on a Windows Server 2016 machine running IIS. The files will then be saved and retrieved by the users via a web browser, so they can be accessed from anywhere.

    The file location required to retrieve the files will be stored in a table instead of the varbinary data. 

    Am I going to run into any issues doing this? Either with C# or the Windows file system itself?

    Monday, January 14, 2019 7:40 PM

Answers

  • NTFS can handle it but you're going to have issues outside the file system itself. With that many files it is going to be slow to enumerate. So, for example, if you happen to open that folder from Windows Explorer it'll be noticeably slow. If you call Directory.EnumerateFiles you'll see the same thing.

    You mention that files are associated by customer. What happens if 2 customers have the same filenames? What if a customer reports they aren't seeing all their files? Debugging through that many files is going to be ugly.

    In general the preference is to use buckets to store files. You can break up the files however you want, but buckets help manage the issues involving that many files and don't really negatively impact the performance or behavior of your app.

    For example, my company deals with files heavily. We have 6 TB of archived documents right now and that grows by gigabytes monthly. We don't store all the files in a single directory because on occasion we may need to go examine the files. We separate the files by customer and then by category. While customers can have hundreds of files, the performance (even in the UI) is good and it has no impact on what the end user sees. We use an API to expose a virtual file structure to users while behind the scenes we store the files in a (customer-based) flat folder structure. If we later decided to break this down further it would simply be a change in how we write out the files.
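
    To make that concrete, here's a rough sketch of what a bucket layout can look like in C# (the folder names and the FileStore class are just illustrative, not our actual code):

    using System.IO;

    public static class FileStore
    {
        // Hypothetical storage root; in practice this would come from configuration.
        private const string StorageRoot = @"D:\FileStore";

        // One folder per customer, then one sub-folder per category, so no single
        // directory ever has to hold the full 100,000+ files.
        public static string GetBucketPath(int customerId, string category)
        {
            string bucket = Path.Combine(StorageRoot, customerId.ToString(), category);
            Directory.CreateDirectory(bucket);   // no-op if the folder already exists
            return bucket;
        }

        public static string SaveFile(int customerId, string category, string storedName, byte[] content)
        {
            string fullPath = Path.Combine(GetBucketPath(customerId, category), storedName);
            File.WriteAllBytes(fullPath, content);
            return fullPath;
        }
    }

    If you later needed a deeper split, only GetBucketPath would have to change.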


    Michael Taylor http://www.michaeltaylorp3.net

    • Marked as answer by Love2Code2Much Wednesday, January 16, 2019 4:38 PM
    Tuesday, January 15, 2019 8:49 PM
    Moderator

All replies

  • I think it is not a problem when you know the full file name. If you don't, and you search for a file by mask with Directory.GetFiles(…), it can take a lot of time to find the file you want.
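
    A quick sketch of the difference (the folder and file names here are hypothetical):

    using System.IO;

    // Hypothetical folder holding 100,000+ files.
    const string storageRoot = @"D:\FileStore";

    // Fast: the full name is known, so nothing in the folder has to be enumerated.
    byte[] data = File.ReadAllBytes(Path.Combine(storageRoot, "3f2a9c1e.bin"));

    // Slow: a wildcard forces every entry in the folder to be walked.
    string[] matches = Directory.GetFiles(storageRoot, "*invoice*.pdf");
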
    Monday, January 14, 2019 8:34 PM
  • Have you considered using the FILESTREAM feature of SQL Server? It stores the files physically on disk folders, but gives you access to the content of the files through a binary field in the table, so they can be indexed and you get the manageability of the database. And it comes essentially for free, since it is included in all editions of SQL Server, even in the free Express Edition.
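
    For reference, reading a FILESTREAM column from C# looks roughly like the sketch below; the Documents table, FileData column and DocumentId key are placeholders for whatever your schema would use, and the column is assumed to be declared with the FILESTREAM attribute:

    using System.Data.SqlClient;
    using System.Data.SqlTypes;
    using System.IO;

    public static class FileStreamReader
    {
        // SqlFileStream only works inside a transaction, paired with PathName()
        // and GET_FILESTREAM_TRANSACTION_CONTEXT().
        public static byte[] ReadDocument(string connectionString, int documentId)
        {
            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();
                using (var transaction = connection.BeginTransaction())
                {
                    string path;
                    byte[] transactionContext;

                    using (var command = new SqlCommand(
                        "SELECT FileData.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() " +
                        "FROM Documents WHERE DocumentId = @id", connection, transaction))
                    {
                        command.Parameters.AddWithValue("@id", documentId);
                        using (var reader = command.ExecuteReader())
                        {
                            reader.Read();
                            path = reader.GetString(0);
                            transactionContext = (byte[])reader[1];
                        }
                    }

                    byte[] result;
                    using (var sqlStream = new SqlFileStream(path, transactionContext, FileAccess.Read))
                    using (var buffer = new MemoryStream())
                    {
                        sqlStream.CopyTo(buffer);
                        result = buffer.ToArray();
                    }

                    transaction.Commit();
                    return result;
                }
            }
        }
    }
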
    Monday, January 14, 2019 8:54 PM
    Moderator
  • Yes, we have considered filestream but would prefer to store the files in the file system.
    Monday, January 14, 2019 9:58 PM
  • Hi Love2Code2Much,

    If you deploy the file folder under IIS, you only need to save the file's IIS URL in the database, and the file can then be retrieved via a web browser.
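
    For example, a simple sketch of serving such a file from its stored location (the FileRecord class, IFileRepository and the "files" route are placeholders, and it assumes an ASP.NET Core site):

    using System.IO;
    using Microsoft.AspNetCore.Mvc;

    // Hypothetical row shape: the table stores the display name the customer
    // sees plus the location of the file on disk.
    public class FileRecord
    {
        public string DisplayName { get; set; }
        public string StoredPath { get; set; }   // relative to the storage root
    }

    public interface IFileRepository
    {
        FileRecord Find(int fileId);
    }

    [Route("files")]
    public class FilesController : Controller
    {
        private const string StorageRoot = @"D:\FileStore";   // assumed storage folder
        private readonly IFileRepository _repository;

        public FilesController(IFileRepository repository) => _repository = repository;

        // GET /files/123 looks the file up in the table, then streams it from disk
        // under the original file name.
        [HttpGet("{fileId}")]
        public IActionResult Download(int fileId)
        {
            var record = _repository.Find(fileId);
            if (record == null)
                return NotFound();

            var fullPath = Path.Combine(StorageRoot, record.StoredPath);
            if (!System.IO.File.Exists(fullPath))
                return NotFound();

            return PhysicalFile(fullPath, "application/octet-stream", record.DisplayName);
        }
    }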

    Best regards,

    Zhanglong


    MSDN Community Support

    Tuesday, January 15, 2019 6:35 AM
    Moderator
  • Hello Zhanglong Wu,

    That is the plan. I just want to make sure that I will not have any issues if I put 100,000 or more files in a single folder. Or do we have to write a routine to store the files in folders of 10,000 or some other number?

    We are not going to access these files using File Explorer or a public FTP client. But I do seem to remember that both File Explorer and FTP clients have issues listing and indexing files over a particular number. So I am wondering how far that issue reaches?

    Will I have problems using C# or an FTP library on a client app to save/rename/delete files if I store over 100,000 files in a single folder? I plan to store the file locations in a table, so there will never be a reason to scan the directory to list files. The table storing the file location will have a customer id as well, so if we ever list the files it will just be a list from the table for one customer id at a time. The most files we have for one customer is around 20, and the average is about 5.

    But really, that probably doesn't matter; I just wanted to say we won't need to scan the directory or anything. Should I expect any problems, and should I just store the files in groups of some smaller number (such as 10,000) across a bunch of folders?


    Tuesday, January 15, 2019 7:53 PM
  • You mention that files are associated by customer. What happens if 2 customers have the same filenames?

    If I stored them all in the same folder, I was going to save each file on disk under a random hash name, then store that name in the database along with the actual file name the customer will see. When listing files or letting a customer download one, I would present them with the correct file name.

    We will store them in buckets then. It's better than running into a problem later.
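
    For reference, the rough shape of what I have in mind (the FileEntry class and the two-character bucket prefix are just a sketch, not a final design):

    using System;
    using System.IO;

    // Hypothetical row shape for the table that replaces the VARBINARY column.
    public class FileEntry
    {
        public int CustomerId { get; set; }
        public string DisplayName { get; set; }   // the name the customer sees
        public string StoredPath { get; set; }    // where the bytes actually live, relative to the root
    }

    public static class HashedFileStore
    {
        private const string StorageRoot = @"D:\FileStore";   // assumed storage folder

        public static FileEntry Save(int customerId, string displayName, byte[] content)
        {
            // Random, collision-safe on-disk name; the original name never touches the disk.
            string storedName = Guid.NewGuid().ToString("N");

            // Bucket by the first two characters of the name: 256 possible folders
            // instead of one folder holding everything.
            string relativePath = Path.Combine(storedName.Substring(0, 2), storedName);

            string fullPath = Path.Combine(StorageRoot, relativePath);
            Directory.CreateDirectory(Path.GetDirectoryName(fullPath));
            File.WriteAllBytes(fullPath, content);

            return new FileEntry
            {
                CustomerId = customerId,
                DisplayName = displayName,
                StoredPath = relativePath
            };
        }
    }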

    Thanks everyone!

    Wednesday, January 16, 2019 4:38 PM