Content Management Application - Architecture Question!

  • Question

  • Content Management Application!

    I'm trying to build a WinForms application in C# to manage over 1000 files stored on my web server. The files are mostly HTML, with some DOCX, Excel, and PDF. To view a file, the user enters the file name and I display it in the .NET WebBrowser control. Currently, I convert all of these files to HTML or PDF so that the WebBrowser control can display them.

    With over 1000 files, there are over 10K images attached to these HTML pages. The total size could reach around 10GB including images. The folder where I store these files has become very bulky and unmanageable.

    I'd like to have these files in a more manageable structure, but I have these open questions:
     - Maybe convert them to a different file format such as *.mht to avoid managing images?
     - Should I bring them to a common format such as *.pdf and HTML, as I've done so far?
     - Storing these HTML files in a SQL database seemed to be a good option until I found that SQL could not store characters such as pi (π) and mathematical formulas.
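    On the last point, a relational database can generally hold text such as π as long as the column and the query path are Unicode. The sketch below is only an illustration under stated assumptions: it uses Python's built-in SQLite module as a stand-in for the actual database, and the `pages` table, its columns, and the `round_trip` helper are hypothetical names. In SQL Server, the comparable approach would be an NVARCHAR column with parameterized queries (or N'...' literals); a plain VARCHAR column with a non-Unicode code page may replace π with a substitute character, which could explain the problem described above.

```python
import sqlite3


def round_trip(text: str) -> str:
    """Store text in an in-memory SQL table and read it back."""
    conn = sqlite3.connect(":memory:")  # stand-in database, not SQL Server
    try:
        # Hypothetical schema: one row per HTML page.
        conn.execute("CREATE TABLE pages (name TEXT PRIMARY KEY, html TEXT)")
        # Parameterized query: the string is passed as Unicode, no manual escaping.
        conn.execute(
            "INSERT INTO pages (name, html) VALUES (?, ?)",
            ("circle.html", text),
        )
        row = conn.execute(
            "SELECT html FROM pages WHERE name = ?", ("circle.html",)
        ).fetchone()
        return row[0]
    finally:
        conn.close()


html = "<p>Area of a circle: A = πr², with symbols such as ∑, √ and ∞.</p>"
stored = round_trip(html)
print(stored == html)
```

    If the comparison prints True, the symbols survived storage unchanged. Running the same kind of check against the real database would be a quick way to confirm whether the column type or the connection is dropping the characters.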

    What are your thoughts?

    Thank you!
    Tuesday, March 17, 2009 10:20 PM

All replies

  • If the HTML files are not required to stay in that format when users are looking at them, then I'd convert to PDF.  Converting to PDF isn't really going to save you much room, but it will allow you to avoid managing images.

    Another bonus of PDFs is that they will look and print the same for everybody.  If the files are basically static, are not going to change, and are used in a knowledge-base type of application, I'd really look into PDFs.  There are lots of options for converting to PDF: batch conversions, .NET controls, etc.

    I think you'll find the PDF route simpler to manage and you'll have plenty of support.

    Wednesday, March 18, 2009 2:35 PM
  • Since most of the files are in HTML, converting them to PDF might not be a good option. Most files are HTML, and only about 2-5% of them have images and math formulas.
    1. Should we keep the HTML files as they are?
    2. The majority of them are text-based files. Should I convert all *.docx to HTML?

    Wednesday, March 18, 2009 3:59 PM
  • If nobody is really going to be printing then yes, you can stay with HTML.  If printing and portability are concerns, I'd still go for PDF.  I'd especially move docx to PDF, not HTML.  With HTML you have the headache of browser versions etc.

    If it were my project I'd convert to PDF.  Again, if the content of these files needs to change then by all means you should leave them as HTML and docx files.  If this is for archiving and information retrieval I'd go the PDF route.

    Have you looked into Microsoft SharePoint Services?

    • Edited by dr_linux Wednesday, March 18, 2009 4:12 PM sharepoint addition
    Wednesday, March 18, 2009 4:11 PM

  • The files are stored on our centralized server. Clients will need a WinForms application to access the files. We have 2 portals:

    - Admin portal (WinForms) for us to edit.
    - Client portal (WinForms) without editing, only for browsing the files.

    Converting everything to PDF would require us to maintain 2 versions of each file: docx/HTML and PDF. I was thinking of a static HTML format, as I could edit it and display it on the client machine more easily and probably faster.

    Any other options?
    Wednesday, March 18, 2009 4:49 PM
  • Yes, for editing by all means keep them as docx and HTML.  You can't edit the current PDF files anyway, so those don't matter.

    For content management, have you looked into SharePoint?  The SharePoint option really does provide what you are trying to do and more.  And you can program against SharePoint.  SharePoint Services is free and works on Windows Server 2003.

    SharePoint is at least worth a look.  You can keep track of file edits, track who does what, manage permissions, and so much more.

    Wednesday, March 18, 2009 4:56 PM