locked
Very High Time taken for SQL query on large DBF files using jdbc:odbc:Driver={Microsoft Visual FoxPro Driver} RRS feed

  • Question


  • Hi All,

    I am using a java program to access DBF files using Visual Fox Pro ODBC Driver.

    The database connection has been established using the following code:

    Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");

    databaseConnection = DriverManager.getConnection("jdbcSurprisedbcBig Smileriver={Microsoft Visual FoxPro Driver}; SourceType=DBF; SourceDB=" + databasePath + "; Exclusive=No");

    The SQL query I made on one of the tables (CLINICAL.DBF) was:

     s.executeQuery("SELECT entity_id, readcode, master_id FROM CLINICAL WHERE master_id = '" + entityID + "'"+" AND auditflag = '1' AND eventdate = '"+ currentDate +"'");

    The table contains about 30,000 records.

    The time taken by the query to return results is nearly 8 seconds.

    When I try the same query for larger DBF files , it takes 52 seconds for a query over a large DBF file.

    I may be missing out something obvious here.

    A SQL query clearly cannot take so much time? Is there anything I am missing out?

    There are CDX files and FPT files associated with the DBF files. But  can I mke use of them in any way?

    Any help will be greatly appreciated.

    Thanks in advance,

    Sunny







    Wednesday, August 6, 2008 4:09 PM

Answers

  • My guess is that the application was originally written before FoxPro started asking tables to be marked with a codepage, and thus they never got marked.

    I don't think you'll be able to optimize without having the right codepage.

    The easiest way to mark the table is to open it in VFP. A dialog appears and you can mark it. As an alternative, VFP comes with a utility program (CPZero.PRG) for setting codepages.

    However, neither of those is going to help you, since you don't have a copy of VFP. I'm sure you can do this with a hex editor, but it would require being really careful. Obviously, make a back-up first.

    Maybe someone else here has a better idea or can give you specific steps.

    Tamar
    Friday, August 15, 2008 7:31 PM
    Answerer
  • Hi Tamar,

    I have been running a few experiments.
    I found a software to code mark the tables through google search.
    I also found a way of getting the current active code used by windows by using >chcp on a command prompt. You can set the active code page by specifying chcp "code".

    I ran the same query 100 times in a loop (similar to the way you ran). My experiments gave me the following results:
    Experiment1:
    The active code page of system was 437.
    Unmarked tables Time taken:  36 seconds.
    Experiment 2:
    Active code page of system was set to 1252.
    Tables were code marked with 1252: Time taken: 36 seconds
    Experiment 3:
    Active code page of system was set to 437.
    Tables were code marked with 437: Time taken: 40 seconds
    Experiment 4:
    Active code page of system was set to 850.
    Tables were code marked with 1252: Time taken: 40 seconds

    When I ran the query individually (without a loop), the first iteration took 8 seconds, the subsequent iteration 5 seconds, the next few iterations 2 seconds, and the many iterations that followed took only around 624 milliseconds. It is the first iteration which always takes maximum time.

    Could you please confirm the following for me:

    You said in your earlier mail that "I dropped the query into a loop and ran it 100 times (sending the query results to a cursor and closing all tables each time through). Total time was less than 35 seconds."

    Was this before you marked the table? Could you please tell me exactly the time taken before and after marking table?

    If it took 35 seconds for you after marking the table, then it is roughly taking the same time for me too. So, is there a way to make this response faster?

    Thanks,
    Sunny
    Thursday, September 4, 2008 11:08 AM

All replies

  • You say you have CDX files. Are they structural CDXs (that is, does the CDX have the same name as the table)?

    Do you have tags for the appropriate fields? In this case, you'd want tags for Master_ID, auditflag and EventDate.

    Do you have access to a copy of VFP? If so, what happens when you run the same query from the Command Window?

    Tamar
    Wednesday, August 6, 2008 9:36 PM
    Answerer
  • Dear Tamar,

    Thank you so much for your reply.

    Yes, I do have a CDX file with the same name.

    And they do contain those tags: Master_ID, auditflag and EventDate..

    Unfortunately, I do not have VFP and so cannot run it from the command prompt.

    Could you suggest anything else I could do? Why do you think SQL queries are taking so long?

    Please help.

    Thanks,

    Sunny


    Wednesday, August 6, 2008 10:26 PM
  • It's really hard to say without being able to test in VFP. Doing that would make it clear whether the problem is with the query and data or the JDBC connection.

    How big is the data, if you zip it up?

    Tamar
    Thursday, August 7, 2008 8:49 PM
    Answerer
  • Hi Tamar,

    Thank you very much for your reply.

    I am facing the problem for similar queries on other tables too. The larger the table, the more time it takes for the results.

    The tables take about 6MB, 12 MB etc. when zipped.

    Could I please send you one table (in dbf and cdx formats) in a zipped format to your email?

    Could you then run a sample query similar to mine and let me know how much time it takes?

    The time taken varies in each iteration. It  sometimes takes only 0.5 seconds. But the same query sometimes takes 7 seconds.

    I would be grateful if could give me your email and I can send it.

    My email is sajit@calicojack.co.uk. Could you send me your email there and I can send you a sample file?

    Thanks,
    Sunny
    Friday, August 8, 2008 12:58 PM
  • I don't usually agree to take this stuff offline, but in this case, I think the test needs to be done. Watch for a message from me.

    I probably won't be able to try this until late in the weekend or Monday, though.

    Tamar
    Friday, August 8, 2008 8:27 PM
    Answerer
  • It's taken a while for Sunny to get me the files. (Too large for standard email.) But today I was able to get them and test.

    Initially, I left the table without a code page and that version was slow. Once I marked the table with code page 1252, the result was instant.

    I dropped the query into a loop and ran it 100 times (sending the query results to a cursor and closing all tables each time through). Total time was less than 35 seconds.

    So, I think the first thing you need to do is mark your tables with the right codepage. That's whichever one you're operating in. This makes sense to me because VFP 9 cares whether the current code page and table's code page match. It can't optimize queries where they don't.

    I seem to recall that you don't have a copy of VFP--that's why I'm doing this testing. So, you're going to have to find a way to mark the tables without VFP. If it helps, the code page is stored in (zero-based) byte 29 of the table header.

    Tamar

    Thursday, August 14, 2008 9:19 PM
    Answerer
  • Hi Tamar,

    Many Thanks for testing the files.

    So, I understand that the query takes high time to return results if the tables are used without a code page.

    However, I am not clear about where and how one can mark the code page for a table. I would have thought the VFP database system I am using would already contain a code page. I am have not designed or created this database system. It ia an already existing VFP database system that is being used commercially by various organisations.

    I am basically writng another java application/program that queries the database and processes the results of the query.
    This is done using the JDBCSurpriseDBC driver.

    So, my questions are:

    1. Do I need to modify the table (.DBF) file and mark it with a code page?

    2. Do I need to modify the .CDX file?

    3. Or is there a way I can modify the query appropriately (specify code page parameters)?

    Could I optimise the query without modifying the existing tables?

    Thanks again,

    Sunny
    Friday, August 15, 2008 3:39 PM
  • My guess is that the application was originally written before FoxPro started asking tables to be marked with a codepage, and thus they never got marked.

    I don't think you'll be able to optimize without having the right codepage.

    The easiest way to mark the table is to open it in VFP. A dialog appears and you can mark it. As an alternative, VFP comes with a utility program (CPZero.PRG) for setting codepages.

    However, neither of those is going to help you, since you don't have a copy of VFP. I'm sure you can do this with a hex editor, but it would require being really careful. Obviously, make a back-up first.

    Maybe someone else here has a better idea or can give you specific steps.

    Tamar
    Friday, August 15, 2008 7:31 PM
    Answerer
  • Hi Tamar,

    I have been running a few experiments.
    I found a software to code mark the tables through google search.
    I also found a way of getting the current active code used by windows by using >chcp on a command prompt. You can set the active code page by specifying chcp "code".

    I ran the same query 100 times in a loop (similar to the way you ran). My experiments gave me the following results:
    Experiment1:
    The active code page of system was 437.
    Unmarked tables Time taken:  36 seconds.
    Experiment 2:
    Active code page of system was set to 1252.
    Tables were code marked with 1252: Time taken: 36 seconds
    Experiment 3:
    Active code page of system was set to 437.
    Tables were code marked with 437: Time taken: 40 seconds
    Experiment 4:
    Active code page of system was set to 850.
    Tables were code marked with 1252: Time taken: 40 seconds

    When I ran the query individually (without a loop), the first iteration took 8 seconds, the subsequent iteration 5 seconds, the next few iterations 2 seconds, and the many iterations that followed took only around 624 milliseconds. It is the first iteration which always takes maximum time.

    Could you please confirm the following for me:

    You said in your earlier mail that "I dropped the query into a loop and ran it 100 times (sending the query results to a cursor and closing all tables each time through). Total time was less than 35 seconds."

    Was this before you marked the table? Could you please tell me exactly the time taken before and after marking table?

    If it took 35 seconds for you after marking the table, then it is roughly taking the same time for me too. So, is there a way to make this response faster?

    Thanks,
    Sunny
    Thursday, September 4, 2008 11:08 AM