none
Perform calculation on DataTable or in Excel? RRS feed

  • Question

  • Hello all,

    Not sure whether this is the right forum for this question. Mods will excuse me and redirect, I hope.

    I use VB.Net ('05), an Oracle DB and Excel to implement a time sheet application:
    - Users from the company will enter data relative to their activities during a specified time period.
    - The DB will hold this data for the existing time periods.
    - Excel will display a report on users' activities.

    At the moment, I instantiate a data table (System.Data.DataTable) with the desired data from the DB and send this data to Excel. Once this step is done, I perform calculations on the data from within Excel. Those calculations are trivial but may require some processor time because of the amount of data being dealt with. They also require the creation of two intermediary tables (still in Excel).

    OK, on with the question.

    Would I be better off having those calculations performed before copying the data to Excel?


    Fairly simple question, however, I don't expect the answer to be that simple...

    Since I am deep into this project, I might have missed crucial info in my explanation of the situation. Feel free to ask for precisions.

    Thanks for your time and help,
    Fred
    Thursday, April 3, 2008 3:15 PM

Answers

  • An interesting question..

     

    Some things to consider:

    • Working with Excel from .Net code means using COM.. Always dangerous, Better to keep this to an absolute minimum. Create as few COM objects as possible, track them very closely, and minimize cross-process calls.
    • Excel has a bunch of built-in functionality for manipulating data, generating aggregates, pivoting, etc.. The functionality available in Excel as a data-processing engine might be hard to replicate with your own code in .Net
    • The DataTable itself has very little built-in code for manipulating data.
    • Putting the data processing code in the Excel file means that you can detach it and use it anywhere, completely unconnected from the database and the .Net program. It means that within the portable Excel document, users can have access to the business logic used to create those decision, for modifications, or to re-run against the source data again. They also have access to the original un-processed source data.
    • There is a limit to the amount of data that can be contained in an Excel file, and a definate performance issue when those numbers get high. Processing the data on the Sql Server or in the .Net code might avoid those limitations.
    • Perfomance issues should be determined via testing. Don't assume either one is faster, because it can be deceptive.

    So, I see it as a list of questions:

    1. Would I rather risk the possible loss of stability by doing more work across COM, so that I can do my data processing within Excel?
    2. Does the Excel data processing functionality have everything I need in it to do the kinds of processing that I need to do, now and in the future? Or will I eventually have to write my own code to do specialized data processing logic?
    3. Do I have the time and ability to replicate all of the built-in functionality that I'm already using in Excel?
    4. Will I ever need to encapsulate the data processing logic within a standalone Excel file? Will my users need to get to the un-processed data, within the Excel file, without having access to the database or my program?
    5. In what priority order are the following qualities of my solution; performance, stability, portability, scalability, flexibility?

     

    Also, have you considered processing the data on the Sql Server itself? Consider using stored procedures, views, and possibly Analysis Services to do the heavy data processing work, then just pull a finished product out of the database and put it in the Excel file.... A SQL Server is designed to crunch data. It's faster, more powerful, and more scalable than Excel or anything you'll write in .Net against a DataTable.

     

    Hope that helps,

    Troy

    Friday, April 4, 2008 12:15 AM

All replies

  • "Would I be better off having those calculations performed before copying the data to Excel?"

    Any reason not to try it and see? If the calculations are trivial, implementing them in code shouldn't take too long, and you can run some empirical side-by-side tests. That seems to me to be the simplest and most reliable way to find out which is faster.


    Thursday, April 3, 2008 3:27 PM
  • Hello sushi_cw,

    Thanks for your reply.

    You are absolutely right, I could run a test.
    However, I would like to benefit from other people's experience and avoid pitfalls that I may see after, once I have chosen one of these two paths. Also, there's the obvious reason that I could spend time focusing on other concerns while waiting for experts to enlighten me.

    So I guess my question should be reformulated:
    In your experience, which is the best of the two methods? Not just in terms of performance, but also in terms of implementation, maintenance...

    Have a good day,
    Fred
    Thursday, April 3, 2008 3:38 PM
  •  

    If I understand correctly, Excel will be used for output display to the user?

    Most often, best practice is to separate code from UI. 

    Therefore, perform your calculations within the datatable. 

    Thursday, April 3, 2008 3:47 PM
  • Hello Thomas,

    Thank you for your point. I think this is a very valid one, especially when the user interface relies on an external application.

    As I mentionned earlier, I have not tried yet to perform the calculations within the DataTable. I tend to agree with you, Thomas, that it seems to be best practice.

    Does anyone know of specific pitfalls relative to performing calculations on DTs?

    Fred
    Thursday, April 3, 2008 5:00 PM
  • An interesting question..

     

    Some things to consider:

    • Working with Excel from .Net code means using COM.. Always dangerous, Better to keep this to an absolute minimum. Create as few COM objects as possible, track them very closely, and minimize cross-process calls.
    • Excel has a bunch of built-in functionality for manipulating data, generating aggregates, pivoting, etc.. The functionality available in Excel as a data-processing engine might be hard to replicate with your own code in .Net
    • The DataTable itself has very little built-in code for manipulating data.
    • Putting the data processing code in the Excel file means that you can detach it and use it anywhere, completely unconnected from the database and the .Net program. It means that within the portable Excel document, users can have access to the business logic used to create those decision, for modifications, or to re-run against the source data again. They also have access to the original un-processed source data.
    • There is a limit to the amount of data that can be contained in an Excel file, and a definate performance issue when those numbers get high. Processing the data on the Sql Server or in the .Net code might avoid those limitations.
    • Perfomance issues should be determined via testing. Don't assume either one is faster, because it can be deceptive.

    So, I see it as a list of questions:

    1. Would I rather risk the possible loss of stability by doing more work across COM, so that I can do my data processing within Excel?
    2. Does the Excel data processing functionality have everything I need in it to do the kinds of processing that I need to do, now and in the future? Or will I eventually have to write my own code to do specialized data processing logic?
    3. Do I have the time and ability to replicate all of the built-in functionality that I'm already using in Excel?
    4. Will I ever need to encapsulate the data processing logic within a standalone Excel file? Will my users need to get to the un-processed data, within the Excel file, without having access to the database or my program?
    5. In what priority order are the following qualities of my solution; performance, stability, portability, scalability, flexibility?

     

    Also, have you considered processing the data on the Sql Server itself? Consider using stored procedures, views, and possibly Analysis Services to do the heavy data processing work, then just pull a finished product out of the database and put it in the Excel file.... A SQL Server is designed to crunch data. It's faster, more powerful, and more scalable than Excel or anything you'll write in .Net against a DataTable.

     

    Hope that helps,

    Troy

    Friday, April 4, 2008 12:15 AM
  • Troy, it definitely does help. Thanks a lot for your answer. This is exactly what I was hoping for.

    Fred
    Friday, April 4, 2008 3:18 AM