Large scale document storage - best practices?

  • Question

  • I'm working on redesigning an application that stores a large number of documents. We project 1-2 million additional documents per year, mostly small PDFs (around 100 KB each), with retention of 5 years or more. The documents are currently stored and managed in a SQL Server database; it's functional, but it would not have been my choice. The database backups are difficult to manage at best, the nightly jobs are resource-intensive, and the documents add load to a database cluster that can be difficult and expensive to scale (though not impossible). However, I have not worked with file-server document storage at this scale and don't know its scalability limits. I know there are reachable limits on the number of files and folders beyond which performance begins to degrade, but with a well-organized structure, can a file server handle 10 million documents without performance issues? What is the best practice here? Management is generally against researching tools such as Documentum, but given the right studies and metrics, a case can be made if necessary.

     

    Any thoughts are appreciated.

     

    Regards,

    Matt

    Thursday, September 11, 2008 8:00 PM
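A quick back-of-envelope check of the figures in the question, written as a small Python sketch. The 2 million documents/year (upper end of the stated range), 5-year retention and 100 KB average size come from the question; the two-level, 256-way hashed folder layout is an assumption added for illustration and is not something proposed in the thread.

```python
# Back-of-envelope sizing for the figures in the question.
# Assumptions: upper end of 1-2 million docs/year, 5-year retention,
# ~100 KB average PDF, and a hypothetical two-level hashed folder layout
# with 256 subfolders per level (documents spread evenly across leaves).
DOCS_PER_YEAR = 2_000_000
RETENTION_YEARS = 5
AVG_DOC_BYTES = 100 * 1024
FANOUT_PER_LEVEL = 256
LEVELS = 2


def sizing(docs_per_year, years, avg_bytes, fanout, levels):
    """Return (total docs, total bytes, leaf folders, average docs per leaf)."""
    total_docs = docs_per_year * years
    total_bytes = total_docs * avg_bytes
    leaf_folders = fanout ** levels
    docs_per_folder = total_docs / leaf_folders
    return total_docs, total_bytes, leaf_folders, docs_per_folder


total_docs, total_bytes, leaf_folders, per_folder = sizing(
    DOCS_PER_YEAR, RETENTION_YEARS, AVG_DOC_BYTES, FANOUT_PER_LEVEL, LEVELS
)
print(f"{total_docs:,} documents, {total_bytes / 1024**4:.2f} TiB")
# → 10,000,000 documents, 0.93 TiB
print(f"{leaf_folders:,} leaf folders, ~{per_folder:.0f} documents each")
# → 65,536 leaf folders, ~153 documents each
```

Under these assumptions each leaf folder stays small even at 10 million documents; whether a file server is the right home overall still depends on the backup, security and retention concerns raised in the replies.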

Answers

  • Matt,

    The volume you are talking about is best handled by ECM tools like Documentum and FileNET. Having spent a good part of my career on FileNET tools, I can say this cannot be handled by database BLOBs or by folders in a file system. ECM tools are built to optimize the storage and retrieval of large volumes of documents like yours, and you cannot easily replicate the document retention policies these tools provide.

     

    If you want a cheaper alternative, take a look at Oracle Content Management; it is cheaper than Documentum and FileNET. Its overview guide gives a brief picture of what using a specialized tool involves.

     

    Finally, if you want to make the case for buying a tool versus building your own, just call the vendors; they will provide all the information you need.

     

    Hope this helps your quest.

     

    { Gaja; }

    Thursday, September 11, 2008 8:39 PM

All replies

  • Thanks, Gaja - I was afraid that would be the answer.

     

    -m

     

    Friday, September 26, 2008 5:15 PM
  • Hi,
    Storing this many documents in SQL Server as BLOBs is not advisable. It will hurt SQL Server performance, and backups become a nightmare as the database grows rapidly. In my view, the best options are:

    1. Use an EDM (aka DMS, document management system) like FileNet or OmniDocs (http://www.newgensoft.com/omnidocs.asp). Many EDMs are available on the market, depending on your budget.
    2. If you can use SQL Server 2008, try FILESTREAM, which stores the document data directly in the file system. Since it is a new feature, do some R&D on it first.
    3. The cheapest solution is to write your own library that archives the documents into the Windows file system in a folder structure and stores each file path in SQL Server (or any database, or even an XML file if the file list is small).
    The disadvantages here are:
      • Security of the documents. If your PDFs contain sensitive information, anyone with access to the folders can read them, since they are not encrypted.
      • Backups. You have to back up both the SQL Server database and the folders that contain the documents.
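Option 3 can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not Suresh's library: `relative_path_for`, `store_document` and `find_orphans` are names made up here, document IDs are assumed to contain only characters that are safe in file names, and the SHA-1 based two-level folder layout is one common way to keep folder sizes small, chosen here as an assumption. It does not address the encryption point; `find_orphans` targets the backup point by checking, for example after restoring both backups, that database rows and files on disk still agree.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def relative_path_for(doc_id: str) -> Path:
    """Hypothetical layout: two hashed folder levels (256 x 256 = 65,536 leaves)."""
    digest = hashlib.sha1(doc_id.encode("utf-8")).hexdigest()
    return Path(digest[0:2]) / digest[2:4] / f"{doc_id}.pdf"


def store_document(root: Path, doc_id: str, data: bytes) -> str:
    """Write the file under root; return the relative path to save in the DB."""
    rel = relative_path_for(doc_id)
    target = root / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first, then rename, so a crash never
    # leaves a half-written file at the final path.
    tmp = target.with_suffix(".tmp")
    tmp.write_bytes(data)
    os.replace(tmp, target)
    return rel.as_posix()


def find_orphans(root: Path, db_paths: set[str]) -> tuple[set[str], set[str]]:
    """Compare paths recorded in the DB with the PDF files actually on disk."""
    on_disk = {p.relative_to(root).as_posix() for p in root.rglob("*.pdf")}
    missing_files = db_paths - on_disk    # DB rows whose file is gone
    untracked_files = on_disk - db_paths  # files with no DB row
    return missing_files, untracked_files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        # In a real system this path would be inserted into a database table.
        db_paths = {store_document(root, "INV-000123", b"%PDF-1.4 ...")}
        print(find_orphans(root, db_paths))  # → (set(), set())
```

Storing only the relative path in the database keeps the file store relocatable, and running the reconciliation after every restore makes a mismatch between the two backups visible instead of silent.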


    Actually, I am writing a freeware DMS, including encrypted archive and backup features, which may take some time to release as I don't have much time. In the future you may use my DMS framework if you wish.


    Thanks,
    Suresh M

    Tuesday, September 30, 2008 5:25 AM
  • This was very useful.

    Wednesday, October 1, 2008 4:52 AM
  • I would ask a storage vendor such as EMC. Storing the volume of files mentioned on a file share is not the way to go; a product dedicated to document management is the best option.

    Monday, October 6, 2008 11:20 AM