none
Excel COM Add-in in C# delete columns/rows re-calcuate all cells

    Question

  • Hi:

    I am not sure whether this is the right forum to post my question as my COM Add-in isn’t written in VSTO, instead it uses IDTExtensibility2. Please let me know if you believe it’s not the right place.
     
    We are currently developing an Excel COM object (a dll written in C#). This dll includes a spreadsheet function like "=fun1(cell1, cell2)" exposed from a COM Add-in. The function is used to retrieve data from a database based on contents of cell1 and cell2. The data retrieval could be a long time process so we programmed to fetch data requests and process results asynchronously. We defined a data structure to associate Requests (Results) with the cell addresses such that we can populate the returned values to their associated cell address after receiving Results.

    We are currently experiencing two problems when a user deletes/adds column/rows.

    1. When a user adds/deletes columns/rows, sometimes (not all the time) all cells contain this function get re-calculated thus re-fetch the requests, even though deleted columns/rows are totally unrelated to these cell formulas. I checked the code and made sure we are not setting Application.Volatile for this function.

    2. How can we capture the cell address change after user deletes/adds columns/rows? As I mentioned previously, the Requests/Results are associated with the cell addresses. If user deletes/adds columns/rows, we have to modify cell addresses which are currently in the process of retrieving data or remove the cell addresses completely if they are deleted. Is there an event handler I can use when a deleting/adding columns/rows event happens?

    I have been struggling quite sometime but wasn't able to find a solution for the problems above. Could someone help me?

    Our development env: Visual Studio 2005/Excel 2003.


    Really appreciate it.
    Monday, January 25, 2010 11:20 PM

Answers

  • Hi Charles,

    You are right, this forum is for VSTO solutions. The sticky note "PLEASE READ FIRST - What is VSTO and non-VSTO resources" (http://social.msdn.microsoft.com/Forums/en-US/vsto/thread/063a23a6-1595-4c83-a25f-6c94658c4649) discusses some of the options for non-VSTO forums and other resources.


    Other sites I would personally recommend for Excel and C# help include:

    Stack Overflow, where you can narrow your search for Excel and C# (http://stackoverflow.com/questions/tagged/c%23%20excel) and get outstanding responses, or

    The Extreme VB Talk .NET Automation forum. (http://www.xtremevbtalk.com/forumdisplay.php?f=105) It's a VB Forum, but I answer all Excel questions there, whether in VB.NET, C# or otherwise.

    That said, your question really is an outstanding, advanced Excel question so I feel compelled to reply...

    For issue #1, I think that the overall problem is that you are are caching the ranges that need updating with a string holding the range address of the cells that are to receive the data. This is a mistake. Instead, maintain a reference to the range object itself. This way, if rows are inserted, then the range reference will automatically move with the insertion -- shifting up or down as required.

    I don't know what kind of serialization that might be going on with your procedure however... If you were previously passing a data request along with a string range address to the database, and then the data is returned with the range address and data pulled from the database, then your range address is essentially acting as a unique identifier -- but one that can be rendered obsolete if a row or column is inserted in the mean time.

    Instead, I would create a Dictionary holding your range references keyed by a unique identifier. The unique identifier could be anything: a GUID, or even just an integer counter that starts at 0 and increments by one for each data request. When the data returns, use this unique id to look up the correct range reference from your Dictionary again so that the data returned can be placed in the correct location. By using this approach, the range references will update correctly if rows or columns are inserted or deleted, because you are no longer storing "dead strings" but are instead using a live, range reference to the cell or cells that is to receive the data.

    Issue #2 actually goes away, if you utilize live, range references instead of dead string addresses, as described above. But to answer your question, picking up deleted rows or columns is handled via the Worksheet.Change event (http://msdn.microsoft.com/en-us/library/microsoft.office.tools.excel.worksheet.change(VS.80).aspx). There are some complexities to consider, however, so you might want to have a read of the article "Capture Deleted Rows" (http://www.dailydoseofexcel.com/archives/2006/08/21/capture-deleted-rows/). But, again, you shouldn't need this approach if you reference your ranges directly instead of using string range addresses as your references.

    Hope this helps Charles,
    Mike

    Mike Rosenblum, Excel MVP
    • Proposed as answer by Mike Rosenblum Tuesday, January 26, 2010 10:54 PM
    • Marked as answer by Bessie Zhao Monday, February 01, 2010 3:02 AM
    Tuesday, January 26, 2010 1:41 AM

All replies

  • Hi Charles,

    You are right, this forum is for VSTO solutions. The sticky note "PLEASE READ FIRST - What is VSTO and non-VSTO resources" (http://social.msdn.microsoft.com/Forums/en-US/vsto/thread/063a23a6-1595-4c83-a25f-6c94658c4649) discusses some of the options for non-VSTO forums and other resources.


    Other sites I would personally recommend for Excel and C# help include:

    Stack Overflow, where you can narrow your search for Excel and C# (http://stackoverflow.com/questions/tagged/c%23%20excel) and get outstanding responses, or

    The Extreme VB Talk .NET Automation forum. (http://www.xtremevbtalk.com/forumdisplay.php?f=105) It's a VB Forum, but I answer all Excel questions there, whether in VB.NET, C# or otherwise.

    That said, your question really is an outstanding, advanced Excel question so I feel compelled to reply...

    For issue #1, I think that the overall problem is that you are are caching the ranges that need updating with a string holding the range address of the cells that are to receive the data. This is a mistake. Instead, maintain a reference to the range object itself. This way, if rows are inserted, then the range reference will automatically move with the insertion -- shifting up or down as required.

    I don't know what kind of serialization that might be going on with your procedure however... If you were previously passing a data request along with a string range address to the database, and then the data is returned with the range address and data pulled from the database, then your range address is essentially acting as a unique identifier -- but one that can be rendered obsolete if a row or column is inserted in the mean time.

    Instead, I would create a Dictionary holding your range references keyed by a unique identifier. The unique identifier could be anything: a GUID, or even just an integer counter that starts at 0 and increments by one for each data request. When the data returns, use this unique id to look up the correct range reference from your Dictionary again so that the data returned can be placed in the correct location. By using this approach, the range references will update correctly if rows or columns are inserted or deleted, because you are no longer storing "dead strings" but are instead using a live, range reference to the cell or cells that is to receive the data.

    Issue #2 actually goes away, if you utilize live, range references instead of dead string addresses, as described above. But to answer your question, picking up deleted rows or columns is handled via the Worksheet.Change event (http://msdn.microsoft.com/en-us/library/microsoft.office.tools.excel.worksheet.change(VS.80).aspx). There are some complexities to consider, however, so you might want to have a read of the article "Capture Deleted Rows" (http://www.dailydoseofexcel.com/archives/2006/08/21/capture-deleted-rows/). But, again, you shouldn't need this approach if you reference your ranges directly instead of using string range addresses as your references.

    Hope this helps Charles,
    Mike

    Mike Rosenblum, Excel MVP
    • Proposed as answer by Mike Rosenblum Tuesday, January 26, 2010 10:54 PM
    • Marked as answer by Bessie Zhao Monday, February 01, 2010 3:02 AM
    Tuesday, January 26, 2010 1:41 AM
  • Mike, Reference to the range makes perfect sense as I don't  to marshal the cell address. This is really helpful since I don't need to  manually store/manipulate cell address. Issue #2 should be fixed by this method. Thank you so much.

    The first issue remains unsolved though. Say I have a spreadsheet with a formula in cell B2 = func1(A1, B1). If I open the spreadsheet, everything is OK, the cell isn't get refreshed. But if I delete one row under it, say row 10, which is un-related to this formula, this formula is re-calculated. Is it possible when I open the spreadsheet, something happened in the COM OnConnection event?

    BTW, stackoverflow and xtremevbtalk are my two favorite websites I frequently visit and get a lot help from. Thanks again.
    Tuesday, January 26, 2010 5:31 PM
  • Hi Charles,

    The rules for when Excel might or might not recalculate a given worksheet formula are complicated. The basic rules you know: if the precedent cells referenced by the formula have their values change or are recalculated, then the dependent cells will be recalculated in turn. In addition, volatile worksheet formulas will recalculate if any value change is made anywhere in the in the application.

    However, there are other times when Excel will need to recalculate the worksheet formulas even if we "know" that the result cannot change. Excel does this in order to keep its calculation tables, trees, and chains in an efficient (and correct) state.

    One such cause for the worksheet formulas to recalculate can be if you insert or delete a row on the worksheet. If a non-volatile worksheet formula does not take any arguments (and therefore does not reference any precedents) then inserting or deleting a row on the worksheet will *not* cause the worksheet formula to recalculate. Similarly, if a non-volatile worksheet formula does reference one or more precedent cells, but all such precedent cells reside on the same worksheet as the worksheet formula, then inserting or deleting a row on the worksheet will, again, *not* cause the worksheet formula to recalculate. If, however, a non-volatile worksheet formula references a precedent cell on *another* worksheet, then inserting or deleting a row on either worksheet *will* cause the worksheet formula to recalculate.

    I do not know what routines are operating internally that require that this update occurs. The best source I've found for Excel calculation processes is Recalculation in Microsoft Excel 2002, which gives a very good feel for the operations involved, but does not address this row-insertion issue, specifically. Another good source is Excel's Calculation Process, which I don't think directly addresses this issue either, but it might.

    In short, this is not an aspect that you are going to be able to control. If the user alters the makeup of the worksheet, such as by inserting or deleting a row, the addresses of the cells and the relationships between them will change. Therefore, it is often the case that Excel will need to update (i.e., recalculate) the worksheet formulas to keep it's calculation engine efficient and up to date.

    If you use live Range objects however, instead  of dead string addresses, to reference the Ranges you are interested in, then the insertion or deletion of rows and columns will not affect your procedures. So, although you might have more executions of your user-defined functions than you would like, by using live Range objects, such calculations will not cause your asynchronous procedures to return results to the wrong locations.

    If your data fetches are very slow, such that even returning correct results is painful because the turn-around time is too long, then you might consider caching the database results locally. This way, when the worksheet functions recalculate again with the same exact queries (because a row has been inserted, but the precedent values, and, hence, the data request itself has not changed), then you can quickly fetch the results from the local cache instead of from the database. The catch here, though, is that you would need a mechanism for your database to notify the local cache that the database has been updated with new values so that your code knows that the local cache has become stale (and therefore should not be used on the next data request). But I would avoid this kind of complexity unless the execution speed you are experiencing really is way too slow.

    Hope this helps!
    Mike

    (P.S.: "BTW, stackoverflow and xtremevbtalk are my two favorite websites I frequently visit and get a lot help from. Thanks again." No problem! And these are my favorite sites too, along with this VSTO forum. :))



    Mike Rosenblum, Excel MVP
    Tuesday, January 26, 2010 10:46 PM
  • Thanks Mike. We did implement cache locally.

    For issue 1, I tried making the function non-volatile explicitly but still the function gets re-calculated after opening the spreadsheet then deleting rows below the cell containing the function.

    I will read the documents you recommended to get a better understanding on how the calculation process is done in Excel - I am pretty new to the Excel plug-in so it's good to learn anyway. Your help is very much appreciated.
    Tuesday, January 26, 2010 11:31 PM
  • > "Thanks Mike. We did implement cache locally."

    Ok, gotcha.


    > "For issue 1, I tried making the function non-volatile explicitly but still the function gets re-calculated after opening the spreadsheet then deleting rows below the cell containing the function."

    Well, as I explained, this is by design. You will not be able to avoid this behavior, not if the worksheet formulas that call your user defined function reference precedent cells that reside on another worksheet.


    > "I will read the documents you recommended to get a better understanding on how the calculation process is done in Excel - I am pretty new to the Excel plug-in so it's good to learn anyway. Your help is very much appreciated."

    They won't solve your current issue, I'm afraid... but they make for interesting reading, which is why I mentioned them.


    Good luck!
    Mike

    Mike Rosenblum, Excel MVP
    Tuesday, January 26, 2010 11:38 PM
  • Hi Mike thanks for your help now I got almost everything done - stuck on other projects recently and now switched back to this one.

    I got another problem though, while I processing and populating data returned from Server into Excel sometimes the cells are not updated if I switch tabs (worksheets) during the process. Do you happen to know what is the reason?

    Basically what I did is:

    1. re-organize cells into unions sheet by sheet then call Union.Dirty() and Union.Calculate().
    2. in exception handler if receiving a 0x800AC472 error, I put these cells into a queue and process them at a later time. I do this is because I understand Excel is a single-thread application thus could be busy when I try to invoke the main thread (containing Excel main) and populate result data.

    Really appreciate your help on the Excel project.
    Tuesday, March 09, 2010 6:36 PM