none
Improve query performance in Access VBA function - copy and sum RRS feed

  • Question

  • Hello,

    I was searching for some help trying to improve the performance of VBA function in access. I have two functions that are taking 40+ minutes to complete against a table that has 300,000+ records. I'm running on a quad i5 processor laptop.

    I'm not sure if my query can be improved, or if this is expected. 

    My first function loops through the table and for each record, queries a lookup table for some mapping data and updates the first table. I had to reset the dbMaxLocksPerFile to make this work since the table has so many records. The code looks like: (for brevity sake I only show the major logic flow)

    ...
    sql = "SELECT * FROM " & stagingTable
    Set db = CurrentDb
    Set rs = db.OpenRecordset(sql, dbOpenDynaset)
    
    On Error GoTo Handler
    DAO.DBEngine.SetOption dbMaxLocksPerFile, 750000 ' temporary increase lock count for large aetna
    
    Do While Not rs.EOF
      ...
      tsql = "SELECT [Plan Code],[Status] FROM " & mapTable & " WHERE [Plan Code] = " & planCode 
      ...
      Set rs2 = db.OpenRecordset(tsql, dbOpenDynaset)
      If Not rs2.EOF Then
         result = rs2![Status]
      End If
      rs2.Close
      ...
      rs.Edit
      rs![Validated] = result
      rs![error] = message
      rs.Update
            
    ResumeLoop:
      rs.MoveNext
    Loop
    rs.Close
    db.Close
    ...

    This next function also takes 40+ minutes to run. Here, I need to back through the first table and run a sum of sub-totals. I insert these results into a second table. The condensed code looks like:

    ...
    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Dim sql As String
    sql = "SELECT [Division],[Plan Code],[Status],Sum([Claim]) AS [SumOfClaim] FROM stagingTable " _
    & "WHERE ((([Recorded Date]) Between #" & startDate & "# And #" & endDate & "#)) GROUP BY [Division],[Plan Code],[Status];"
    Set db = CurrentDb
    Set rs = db.OpenRecordset(sql)
    ...
    DoCmd.SetWarnings False
    Do While Not rs.EOF
      ...
      tsql = "INSERT INTO " & outputTable & " (" & outputColumns & ") VALUES (" _
       & Format(startDate, "\#yyyy-mm-dd\#") & ",'" 
       & rs![Division] & "','" _
       & rs![Plan Code] & "','" _
       & rs![Status] & "','" _
       & rs![SumOfClaim] & "');"
       ...
       DoCmd.RunSQL (tsql)
       ...
    ResumeLoop:
       rs.MoveNext
    Loop
    rs.Close
    db.Close
    ...



    Would anyone see anything obvious that could improve the performance of this please? 

    Thank you


    Software Engineer


    • Edited by SolidFish Monday, December 28, 2015 4:20 PM
    Monday, December 28, 2015 4:19 PM

Answers

  • there's no reason to attempt it all in a single query; plus segmenting it into separate queries allows you to check your logic more easily;

    1. do these tables have a field containing a unique, non repeating value i.e. autonumber or key value?

    2. when you write this:

    mainTable.[column2] = mapTable.[mapping]

    do you imply an update? if so which of these tables has the value that is to be updated in the other table?.... Main? or Map?

    3. when you write this:

    WHERE mapTable.[xcode] = mainTable.[xcode]

    AND mapTable.[ycode] = mainTable.[ycode]

    is xcode and ycode value a unique value in the table (only appears once in table) or is it unique relative to column2/mapping? 

    It might help to provide ~3 to 5 representive records of each table - that may be easier than Q&A approach.....

    Tuesday, December 29, 2015 10:11 PM

All replies

  • Definitely - your issue is using loops; you probably come from a programming perspective rather than a database perspective.....and that's not criticism - it just is a different approach.  Database programming never uses loops.  We have query types: Select (a sub set of the overall table), Append (add new records to a table) , Update (changing values of existing records)  ....

    But using the 3 types one should be able to do anything equivalent to loop.....

    Sometimes one does more than one query nested together - and also in some cases one must write some results to a temp table though that is rare (MakeTable query) as an interim step... for instance if a query itself cannot be designed as an updateable type due to use of aggregate sub queries....

    300k record count is a lot for a PC so horsepower is an issue - primarily RAM more than processing - you would be best to have at least 8G... also relatively benign features such as formatting or sorting, when applied to a large count can really slow down the time and so you want to minimize those requirements....but in general most queries on a record size of 300k should just be a couple minutes.

    If you need assist on a query approach - provide a handful of records samples of the starting data and then those same records as the result you seek....and I'm sure it can be set up as queries....


    Monday, December 28, 2015 10:15 PM
  • Your approach is known by the acronym RBAR: Row By Agonizing Row.

    SQL is a set-based language, where commands operate on a set of records at a time, and the database engine code is optimized to deal with these commands so they execute MUCH faster than RBAR.

    In your case the SINGLE SQL statement would be something like:

    insert into outputtable(columnnames)
    select columnnames from stagingtable

    You would execute this sql statement using:
    currentdb.execute sqlstatement, dbFailOnError

    Also note that unless stagingtable is not using a date field for startdate, there is no need for the Format function. That's just a visual nicety for humans. Internally dates are stored in a 8-byte format you don't really need to know, and only when dates are output (e.g. on a form or report), some formatting may be needed.


    -Tom. Microsoft Access MVP


    Tuesday, December 29, 2015 3:06 AM
  • Thank you both for the responses. I've been reading up a bit and you are right, i seem to be taking a programmatic approach on this and I should be looking into making the query capture everything.

    I think I've figured out how to do the second part but I'm still having troubles on the first part. I think I need an UPDATE query but not exactly sure how to write it.

    I have two tables with following fields:

    mainTable: [column1],[column2],[xcode],[ycode]
    mapTable: [xcode],[ycode],[mapping]

    I need to do a mapping from mapTable but it has the following condition

    If mainTable.[column1] = "000" or mainTable.[column1] = "001" Then
      mainTable.[column2] = mapTable.[mapping]
      WHERE mapTable.[xcode] = mainTable.[xcode] 
      AND mapTable.[ycode] = mainTable.[ycode]
    Else
      mainTable.[column2] = mapTable.[mapping]
      WHERE mapTable.[xcode] = mainTable.[xcode] 
    End If

    Is there a way to capture this in a single SQL query?




    • Edited by SolidFish Tuesday, December 29, 2015 8:57 PM
    Tuesday, December 29, 2015 8:04 PM
  • there's no reason to attempt it all in a single query; plus segmenting it into separate queries allows you to check your logic more easily;

    1. do these tables have a field containing a unique, non repeating value i.e. autonumber or key value?

    2. when you write this:

    mainTable.[column2] = mapTable.[mapping]

    do you imply an update? if so which of these tables has the value that is to be updated in the other table?.... Main? or Map?

    3. when you write this:

    WHERE mapTable.[xcode] = mainTable.[xcode]

    AND mapTable.[ycode] = mainTable.[ycode]

    is xcode and ycode value a unique value in the table (only appears once in table) or is it unique relative to column2/mapping? 

    It might help to provide ~3 to 5 representive records of each table - that may be easier than Q&A approach.....

    Tuesday, December 29, 2015 10:11 PM