none
Open XML SDK v2 Handling Multiple Threads RRS feed

  • Question

  • Hi

    I have read a post or two that suggests the open XML sdk is not thread safe but I haven't found any official MS document that states this in fact to the contrary most online docs demonstrate how to use the SDK for serverside creation of office documents. I'm prototyping a windows service that reads MSMQ and dispatches a thread to export some data to excel using open xml sdk v2.  The thread is created by calling BeginInvoke on a delegate and ultimately calls this code:

    using

    (SpreadsheetDocument workBook = SpreadsheetDocument.Open(exportFileInfo.FullName, true))

    {

     

    WorkbookHelper workBookHelper = new WorkbookHelper(workBook.WorkbookPart, tableName);

    exportRowCount =

    this.DataBind(workBookHelper, datasource);

    workBook.WorkbookPart.Workbook.Save();

    workBook.Close();

     

     

    return exportRowCount;

    }

    If I allow 2 or more threads to run concurrently:

    • The code never finishes when I try to export 10000 rows. (I suspect I'm getting deadlock but not sure how to verify this). 
    • It does work if I try to export 5000 rows or less.

    If I comment out the calls to save() and close() the code always finishes no matter how many threads I allow to run.  If I allow just one thread to run the exports always works.

    Is there a threading problem in the SDK related to saving and closing workbooks and if so is there a plan to fix it soon?

    Other than synchronising the entire block of code above is there a way I can I fix the problem myself?

    Regards

    

    

    

     

    Tuesday, April 12, 2011 3:26 PM

Answers

  • Hi

    Thanks for responding.  I have gone through an MSDN support ticket on this issue and can provide an update.

    Despite what you might read in the official document the open xml SDK version 2 is only thread safe if your spreadsheet don’t contain any parts that exceed 10MB.  I have read something similar to this on another post but dismissed it because my work books were less that 10MB in size.  The important nugget I missed was that this limit applies to a part’s size when unzipped.

    For background:  The issue is not inside the SDK itself and originates from the SDK’s implementation which relies on the class System.IO.Packaging.Package.  Internally this class uses a memory stream for parts up to 10MB which is thread safe, for bigger parts it switches to IsolatedStorage which is not threadsafe.... Oops.

    Full details are here:  http://support.microsoft.com/kb/951731

    Naturally I have asked if there is a plan to fix this and when I can expect to get it.  No answer yet..

    Here is the confirmation email from my MSDN ticket:

    Hello Phil,

    Thanks  for your time to discuss the issue today.

    I am providing you a copy of our scope agreement for your issue.

    Issue Definition: You have a windows service which monitors a MSMQ queue. Once a request arrives another component of the application creates the Excel document package document by pulling data from the database.
    When a single thread is spawned the application woks fine but when more than one thread are spawned ..the application hangs at the workbook.close call

    Scope Agreement: We will help you find why the open xml application hangs at the workbook.close call and try to look for a workaround for it, if possible.

    Troubleshooting/Progress-

    -          We had a live meeting call

    -          You demonstrated the issue to me

    -          We debugged and saw when you just spawned a single thread, the application works fine

    -          But when more than one thread is spawned it hangs on workbook.close

    -          I looked up and found an issue reported around Open xml packaging multithreaded application when the xml part exceeded 10MB

    -          We checked your generated files and they were around 3.5 mb

    -          But when we decompressed them the sheet size was about 13 MB which met the criteria for the above issue

    -          We also discussed that the reason for the issue is that till the size if 10mb the parts are handled in-memory (which is thread safe) but once the size exceeds it is handled as physical file(which is thread unsafe) and is known to have issues.

    So the following option are suggested for your case -

    1) Limit the size of an xml part to be less than 10MB

    2) You can have the rest of the operations as multithreaded but allow only a single thread to access the operation of the packaging code/api

    However, I will check on whether this is bug or expected limitation and also see if there is any workaround available.

    If you have any questions or concerns, please don't hesitate to contact me.

    • Proposed as answer by cjatmsModerator Monday, May 9, 2011 2:40 PM
    • Marked as answer by Bruce Song Tuesday, May 10, 2011 11:27 AM
    Wednesday, April 27, 2011 10:17 AM

All replies

  • Hi,

    As far as I know, there exist the Official document on MSDN library which says the Open XML SDK supports muti-thread. You can check this article:

    http://msdn.microsoft.com/en-us/library/dd440953(office.12).aspx

    Take a look at Why Use the Open XML Format SDK:

    The major advantage to using the Open XML Format SDK is that it is fully supported on the server, unlike automating Microsoft Office applications. That means you can create managed code solutions that are scalable and stable on the server. Imagine being able to write multi-threaded solutions that build on top of the SDK.

     As for the problem, I think there should be something wrong with your code. I think you need to set more time for the method to run. I have following suggestions for you to take:

    1. Try to let the main thread sleep for a while

    2. Try to use Invoke method

    Hope this can help you.

    Regards,


    Be happy.
    Wednesday, April 13, 2011 5:25 AM
  • Hi and thanks for taking the time to respond.  I'm not really sure why making the main thread sleep will help.  I'm trying to export as many spreadsheets as possible concurrently.  Also this is running as a windows service and so there is no GUI.  Think that rules out Invoke.

    I have done some more testing and now I am convinced there is a problem with the SDK.  Just to be clear, I'm trying to export concurrently but each thread is working on its own spreadsheet ie a differrent physical file on the disk.  Specifically, two threads never work on the same workbook.  I changed the code slightly to this:

    public UInt32 Export1(String tableName, IEnumerable<T> datasource, System.IO.FileInfo exportFileInfo)
    
    {
    
     UInt32 exportRowCount = 0;
    
     try 
    
     {
    
      SpreadsheetDocument workBook = SpreadsheetDocument.Open(exportFileInfo.FullName, true)
    
      WorkbookHelper workBookHelper = new WorkbookHelper(workBook.WorkbookPart, tableName);
    
    
    
      exportRowCount = this.DataBind(workBookHelper, datasource);
    
      //workBook.WorkbookPart.Workbook.Save();
    
      workBook.Close();
    
      // Break point here is never reached 
    
      return exportRowCount;
    
     }
    
     catch(Exception e)
    
     {
    
      // Break point here is never reached
    
      String s = e.Message;
    
     }
    
     return 0;
    
    }
    
    

    If I run one thread at a time this always works.  I even managed to create a workbook with 600,000 rows across 4 worksheets.  ie my code is working;

    A 10,000 line export takes 10 seconds.  If I start one then count to 5 then start another both work.  If I start two at the same time the workbook.Close() method call never returns nor does it throw an exception.  When it hits this call the CPUs go to 0% and everything stops.  If I run another single export after this it works fine.

    I'm running .net 3.5 SP1 and I've got the same behaviour on W2003 and Vista Enterprise.

    Has anyone out there managed to create spreadsheets simultaneously on separate threads?  Any other info also greatly appreciated.

    Regards

     


    Thursday, April 14, 2011 6:24 AM
  • Hello Phil,

    There is no reported bug for the issue you describe, whether in the  Open XML SDK or in the components e.g. System.IO.Packaging (suspect in terms of locking) or race conditions in the type of multi-threading you are using.

    This Forum has many browsers, but none so far report experience similar to yours.

    If you could build a repro including data sources for use in populating the XML someone could try reproducing the issue and then provide a workaround.

    Alternatively,  your question falls into the paid support category which requires a more in-depth level of support.  In that case the Microsoft engineer will need a repro before he/she can see the problem. Please visit the below link to see the various paid support options that are available to better meet your needs.

    http://support.microsoft.com/default.aspx?id=fh;en-us;offerprophone

    Regards,

    Chris Jensen
    Senior Technical Support Lead

    Tuesday, April 19, 2011 7:44 PM
    Moderator
  • Hi

    Thanks for responding.  I have gone through an MSDN support ticket on this issue and can provide an update.

    Despite what you might read in the official document the open xml SDK version 2 is only thread safe if your spreadsheet don’t contain any parts that exceed 10MB.  I have read something similar to this on another post but dismissed it because my work books were less that 10MB in size.  The important nugget I missed was that this limit applies to a part’s size when unzipped.

    For background:  The issue is not inside the SDK itself and originates from the SDK’s implementation which relies on the class System.IO.Packaging.Package.  Internally this class uses a memory stream for parts up to 10MB which is thread safe, for bigger parts it switches to IsolatedStorage which is not threadsafe.... Oops.

    Full details are here:  http://support.microsoft.com/kb/951731

    Naturally I have asked if there is a plan to fix this and when I can expect to get it.  No answer yet..

    Here is the confirmation email from my MSDN ticket:

    Hello Phil,

    Thanks  for your time to discuss the issue today.

    I am providing you a copy of our scope agreement for your issue.

    Issue Definition: You have a windows service which monitors a MSMQ queue. Once a request arrives another component of the application creates the Excel document package document by pulling data from the database.
    When a single thread is spawned the application woks fine but when more than one thread are spawned ..the application hangs at the workbook.close call

    Scope Agreement: We will help you find why the open xml application hangs at the workbook.close call and try to look for a workaround for it, if possible.

    Troubleshooting/Progress-

    -          We had a live meeting call

    -          You demonstrated the issue to me

    -          We debugged and saw when you just spawned a single thread, the application works fine

    -          But when more than one thread is spawned it hangs on workbook.close

    -          I looked up and found an issue reported around Open xml packaging multithreaded application when the xml part exceeded 10MB

    -          We checked your generated files and they were around 3.5 mb

    -          But when we decompressed them the sheet size was about 13 MB which met the criteria for the above issue

    -          We also discussed that the reason for the issue is that till the size if 10mb the parts are handled in-memory (which is thread safe) but once the size exceeds it is handled as physical file(which is thread unsafe) and is known to have issues.

    So the following option are suggested for your case -

    1) Limit the size of an xml part to be less than 10MB

    2) You can have the rest of the operations as multithreaded but allow only a single thread to access the operation of the packaging code/api

    However, I will check on whether this is bug or expected limitation and also see if there is any workaround available.

    If you have any questions or concerns, please don't hesitate to contact me.

    • Proposed as answer by cjatmsModerator Monday, May 9, 2011 2:40 PM
    • Marked as answer by Bruce Song Tuesday, May 10, 2011 11:27 AM
    Wednesday, April 27, 2011 10:17 AM
  • Hello Phil,

    Thanks for sharing what you learned about Thread-Safe attributes of the SDK, uncompressed part size and multithreading with the SDK. This will be seen by other users of the forum and so will enable them to move on with less delay.

    Regards,
    Chris Jensen
    Senior Technical Support Lead

    Wednesday, April 27, 2011 6:47 PM
    Moderator
  • I am having the exact same issue but while exporting large Word documents.  I know this post is somewhat old but has a solution or workaround been found other than the suggested options above?
    Monday, January 7, 2013 7:06 PM
  • Guys, there is a nice post

    http://blogs.msdn.com/b/brian_jones/archive/2010/06/22/writing-large-excel-files-with-the-open-xml-sdk.aspx

    which explains how to write large excel files using SAX.

    We have a big BI application which generated excel files from interactive OLAP reports, and it suffered from these problems when we had DOM. There were 3 problems:

    - parallel generation was impossible:  we got this crash with isolated storage

    - it consumed too much memory

    - it was slow.

    When we moved to SAX, we've solved problems with memory and with speed - everything became much faster.

    The only thing which we don't know is next:   will the same problem appear with multiple threads if we use SAX (but not DOM)?

    Is there anyone here who moved to SAX and faced with same problem with isolatedstorage?

    <style type="text/css">img.imageResizerActiveClass{cursor:nw-resize !important;outline:1px dashed black !important;} img.imageResizerChangedClass{z-index:300 !important;max-width:none !important;max-height:none !important;} img.imageResizerBoxClass{margin:auto; z-index:99999 !important; ; top:0; left:0; right:0; bottom:0; border:1px solid white; outline:1px solid black;} </style>
    • Edited by Ihor Bobak Thursday, August 22, 2013 10:39 AM
    Thursday, August 22, 2013 10:38 AM
  • Hi,

    I am also facing thread deadlock issue if i try to create large Word Documents on multiple threads. Is there any latest update on this issue other than putting a big lock around the document generation code.

    Thanks,

    Mantu


    VS 2005 Developer

    Monday, September 29, 2014 9:34 AM
  • We are using SAX and getting the same problem when have multiple threads reading an excel file. 
    Friday, August 14, 2015 6:13 AM
  • We have the same problem. There is a fix for this (KB2996568) but it applies only to .NET 4.5. Unfortunately for reasons unrelated to this we  need to stay with 3.5 at this time. I tried many different solutions, ultimately setting on a SAX solution for speed and memory management, but the issue persists if multiple threards are used. 

    MS, any updates?

    Wednesday, October 14, 2015 1:01 PM