Monday, February 27, 2012 5:54 AM
My application needs to do various things. It creates/opens DBF files as needed and, before moving on to the next step, closes the DBF files that are no longer needed. Among the DBFs to be used there are three big ones: two are nearly 500-600 MB each and one is nearly 1.0-1.3 GB. All three have supporting CDX files as well.
First, a newly created file appends data from a CSV file, grows to around 500-600 MB, and becomes the main file to be processed; then it is closed.
Then another newly created file appends data from a CSV file and grows to around 1.0-1.3 GB; this file is processed and closed.
Then a third newly created file appends data from a CSV file, grows to around 500-600 MB, and is also closed after processing; a sketch of one such step is shown below.
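A minimal sketch of one such step, with hypothetical table, field, and file names:
CREATE TABLE bigtable (cKey C(10), nAmount N(12,2))
APPEND FROM data.csv TYPE CSV   && the first line of a CSV holds field names and is skipped
INDEX ON cKey TAG cKey          && builds a tag in the structural CDX
* ... processing ...
USE                             && close the table; this is where the delay shows up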
The application generally takes 80 minutes to complete its job. In order to optimize the execution time, I coded messages reporting the time taken by the various steps of the application, and found that just closing the 1.0-1.3 GB DBF file was taking up to 15-16 minutes. To confirm this I checked the modified time of the files in Windows Explorer and found that the DBF file's time was 15 minutes later than the CDX file's. So I presume that closing the DBF file is taking 15-16 minutes. A sketch of the timing code follows.
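A minimal sketch of such step timing, using SECONDS(); the alias and message are illustrative:
lnStart = SECONDS()
USE IN bigtable   && the step being measured, here closing the table
? "Close took " + TRANSFORM(SECONDS() - lnStart) + " seconds"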
Another curious thing was observed. Although the file had to be processed sequentially after indexing (a filter could not be used), not all of the processing/calculation needed to be done on all of the records. So I further modified the code to do the calculation on the appropriate records only and to skip the other records one by one. During one such run the DBF file's close finished at the same time as the CDX file's, and nearly 20 minutes were saved from the processing time. Due to some requirements the code then had to be modified again, more records in the table became eligible for the calculations, and the next run did not save similar time. This seems to suggest that doing lots of calculations on lots of records in a table of nearly 1 GB causes the table to take a long time to close.
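The selective pass could look roughly like this; the tag name and the eligibility test are hypothetical placeholders:
SET ORDER TO TAG cKey   && process sequentially in index order
SCAN
    IF nAmount > 0      && only eligible records get the heavy calculation
        * ... calculations and REPLACEs here ...
    ENDIF
ENDSCAN                 && SCAN still visits every record; ineligible ones are just passed over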
The code for closing the table is:
SET RELATION TO   && clear any relation set in this work area
SET ORDER TO      && deactivate the controlling index tag
USE               && the actual close (referenced in the replies below)
I look forward to your guidance on how to save this time when closing the table.
- Edited by S Bans Monday, February 27, 2012 5:59 AM
Monday, February 27, 2012 6:56 AM (Moderator)
Hi S Bans
You should measure similar operations on a SQL Server database of the same size and post the results...
The problems you are describing are most probably caused by caching. VFP keeps the data in memory and saves it to disk when the memory occupied by the cache is needed for another file or when you close the cached file. You can observe this behavior with exclusively opened files. And writing nearly 1 GB of data takes some time...
Windows file buffers can affect this process as well.
If you open your files in shared mode, their closing will be very fast because each change is written to the disk immediately, BUT the file processing will take about four times longer due to the shared file access speed...
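A minimal sketch of the two open modes (the table name is hypothetical):
USE bigtable SHARED      && every change is written through to disk; the close returns quickly
* ... versus ...
USE bigtable EXCLUSIVE   && changes accumulate in the VFP cache; the big write happens at close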
So if you want to reduce the delay in processing your large files, the only option seems to be a faster hard drive...
Monday, February 27, 2012 7:11 AM
Hello Pavel Celba
Thanks for your guidance. Incidentally, the DBF table that is being closed is not needed anymore; it is to be erased immediately after closing. Is there some way to not write this file to the disk while closing it and save that time?
It seems like I should experiment with "ZAP" before "USE":
SET RELATION TO
SET ORDER TO
ZAP   && discard all records (needs exclusive access; SET SAFETY OFF skips the prompt)
USE   && close the now-empty table
- Edited by S Bans Monday, February 27, 2012 7:22 AM
Monday, February 27, 2012 7:25 AM
Pavel already talked about caching; to connect that to the USE that closes the DBF: it's normal that this triggers flushing of the buffers for this DBF, both the FoxPro buffers you control via CURSORSETPROP("Buffering", ...) or via the form property BufferMode, and also the file caching FoxPro does on top of the OS.
So, yes, USE doesn't only close a file handle; before that, it writes back changes.
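To make that write-back explicit, here is a sketch with table buffering; the alias is hypothetical, and buffering mode 5 requires SET MULTILOCKS ON:
SET MULTILOCKS ON
USE bigtable
= CURSORSETPROP("Buffering", 5, "bigtable")   && 5 = optimistic table buffering
* ... changes accumulate in the buffer ...
= TABLEUPDATE(.T., .T., "bigtable")           && write the buffered changes at a time you choose
USE IN bigtable                               && closing now has less left to flush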
Overall, I also did and still do a lot of data crunching in the range of several GB on a daily basis, currently ~8 GB within 2-3 hours. That of course depends not only on the amount of data but also on the operations performed.
What helps reduce the time is to divide the data into smaller chunks and then issue FLUSH IN alias FORCE and SYS(1104) (purge cached memory) in between processing these chunks.
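A minimal sketch of that chunked pattern, assuming a hypothetical alias bigtable and a chunk size of 100,000 records:
USE bigtable EXCLUSIVE
lnChunk = 100000
SCAN
    * ... per-record calculation here ...
    IF MOD(RECNO(), lnChunk) = 0
        FLUSH IN bigtable FORCE   && write this alias' buffered data to disk now
        = SYS(1104)               && purge the data VFP has cached in memory
    ENDIF
ENDSCAN
USE   && little left to write at close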
- Marked As Answer by Mark Liu-lxf (Moderator) Thursday, March 08, 2012 9:13 AM
Monday, February 27, 2012 7:26 AM
If you just need the data temporarily and don't want to write it to disk, then use a cursor.
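A minimal sketch, with a hypothetical cursor name and structure:
CREATE CURSOR csvWork (cKey C(10), nAmount N(12,2))
APPEND FROM data.csv TYPE CSV   && load the CSV into the cursor
* ... process ...
USE   && closing a cursor simply discards it
A cursor can still spill to a temporary file when it outgrows memory, but VFP deletes that file on close, so there is no final write-back of the data.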