locked
Database design Arhitecture DMS RRS feed

  • Question

  • Hi

    I have a DMS application. My database has a Table where i keep track of all the files.

    The problem is that at some point in time this db Table will get probably very huge, and for the basic operations like returning a list of files from a folder will take some time.

     (Select * from Files order by Date DESC) this will most probably be my retrive rows query. For a low number of rows this works almost instantly but for a large set like (500.000 rows) even a simply select will become problematic.

    My ideea was to make 2 tables.
    • 1 which stores files recently used, uploaded, modified for the last month.
    • 1 which holds older records (Maybe an archiving system).
    So now for fast operations like simple display i will use the "recently used tables" and for advanced retrieval, like a 3 year old document, or a global search, or display all files i will use both of them

    What do you think? How would you do it?

    Very much appreciated!
    Me http://www.zapacila.com
    Thursday, May 28, 2009 7:29 AM

Answers

  • Hello zapacila89,

    if your where-condition is indexed, there will be no problem by filtering. Let me point out some of my experiences.
    Think about your users. What are they doing when they use your application. Try to support them in the work they do most often. If they do ask for 10.000 records once a year they are willing to wait, but they don't want to wait, if they do their daily job. That's the reason why we implemented the two layers in the database. The users don't care about the problems we have, so in our second layer we prepare the data to the users daily requirement. Lets say 98% of all statement are for files created within the last two weeks, try to prepare it for the user (use views or trigger, implement a database-job or whatever). 
     
    We made good experiences with the 2 Layer concept. This allows to store the data in a normalized and optimized form and to provide the data for the user in a way he can handle.
    Another experience we made was. Give the user a chance to do things indivudally. Let him store his search-conditions, maybe use user-profiles. One user is only interested in files of his department. Another user wants allways to see the files of the last 2 weeks. Give the users a chance to do it their way.
    You can also implement something like standard-reports (reports of files within the last 2 weeks). This is data you can prepare in the second layer.

    Sometimes our application can help the user to do the job the right way, when we disable the wrong way, and give him a chance to do right. 


    good luck with your project

    hapevau

    ps: please mark as answered if you have no further questions
    • Marked as answer by zapacila89 Monday, June 8, 2009 9:56 PM
    Friday, May 29, 2009 7:44 AM

All replies

  • Hi,

    in our company we have to manage tables with more than 500.000 Rows. For a database like SQLServer or Oracle that is not really a problem. 
    The problem is always the displaying of data and there is no user out there who is able to anlalyze more than 500 Rows. There will be allways a filter set. If not, we only show the first 500 Rows (set a filter on rownum) and implement paging in the application.
    There is only one reason to retrieve all rows and that is the export-functionallity (csv or something like that). That can be done in a background worker process.

    If you store files, I think, you will have metadata describing your record. That are the filters and candidates for indexes in you db.
    If possible I avoid to change the source where I retrieve my data, because I need a new Statement. Keeping all in one table you only have to set your filter.

    We often use 2 or more layers in the database. The one layer is the normalized datalayer and applications do not have direct access to this data.
    The second layer is the applicationlayer where we merge data to avoid not readable foreign-keys and where we store application data like content of dropdown-boxes and so on.
    I recommend this 2 Layer-Architecture. It makes it easier to change structures without impacting the application and it makes it easier to retrieve data.

    By the way. I paninfully learned NOT TO USE SELECT *.

    Hope, that helps a little
    kind regards
    hapevau
    Thursday, May 28, 2009 1:22 PM
  • Thanks. The paging implementation sounds good.

    The current application (i must re write it) has about 240.000 rows on the tables that holds the file records. Actually if i set the row count limit i get the same time when i retrive. Of cource the db design is very poor i has like 18 columns..

    I was afraid to use Filter since applying a filter to 240.000 records takes some time..

    Me http://www.zapacila.com
    Thursday, May 28, 2009 9:02 PM
  • Hello zapacila89,

    if your where-condition is indexed, there will be no problem by filtering. Let me point out some of my experiences.
    Think about your users. What are they doing when they use your application. Try to support them in the work they do most often. If they do ask for 10.000 records once a year they are willing to wait, but they don't want to wait, if they do their daily job. That's the reason why we implemented the two layers in the database. The users don't care about the problems we have, so in our second layer we prepare the data to the users daily requirement. Lets say 98% of all statement are for files created within the last two weeks, try to prepare it for the user (use views or trigger, implement a database-job or whatever). 
     
    We made good experiences with the 2 Layer concept. This allows to store the data in a normalized and optimized form and to provide the data for the user in a way he can handle.
    Another experience we made was. Give the user a chance to do things indivudally. Let him store his search-conditions, maybe use user-profiles. One user is only interested in files of his department. Another user wants allways to see the files of the last 2 weeks. Give the users a chance to do it their way.
    You can also implement something like standard-reports (reports of files within the last 2 weeks). This is data you can prepare in the second layer.

    Sometimes our application can help the user to do the job the right way, when we disable the wrong way, and give him a chance to do right. 


    good luck with your project

    hapevau

    ps: please mark as answered if you have no further questions
    • Marked as answer by zapacila89 Monday, June 8, 2009 9:56 PM
    Friday, May 29, 2009 7:44 AM