none
Worksheet causes OutOfMemory exeption RRS feed

  • Question

  • I have to edit and save after it big xlsx document which has 10 sheets with more then 1000 cells in each.
    I am trying to save it using next code (even without any changes):

                For Each WorksheetPart In WorksheetParts
                    Try
                        If WorksheetPart.Worksheet Is Nothing Then Continue For
                        ' Save the changes
                        WorksheetPart.Worksheet.Save()
                    Catch
                    Finally
                        WorksheetPart = Nothing
                        GC.Collect()
                    End Try
                Next

    each call of WorksheetPart.Worksheet  costs at least 20 mb whereas size of original document is just 10 mb. memory is not released and GC. Collect does not help. By th end my CPU usage is about 1GB.
    Does anyone know some way to avoid such a problem?
    Friday, August 7, 2009 1:33 PM

All replies

  • Hi,
    The reason for memory ran-out is the DOM tree in the memory is too big. A workaround you may have a try: use Open XML reader to load the specific elememt you want to manipulate in stream, after that use Open XML writer to write back your changes.
    Besides, I'm curious what's the content in your file? what are you going to do with the files? Thanks!

    Shuangshuang (MS)
    Monday, August 10, 2009 7:41 AM
  • I am trying to force recalculating all formulas in the document. Therefore from each worksheet I get cells with formulas, set for them Cell.CellFormula.CalculateCell.Value = True and save worksheet. This was just a test document with a lot of cells in different worksheets.
    Tuesday, August 11, 2009 6:34 AM
  • Hello akstanya

    Have you checked this FAQ?

    -----
    How to automatically recalculate the formula value for cells in Spreadsheet?

    A cell with a formula will store the cached value in the markup and show the cached value as the result. If the cell value referenced by the formula is changed programmatically, the cell with the formula has no idea of that change. The cached value of this formula is then obsolete, and should be updated. The automatic recalculation requires runtime layout functionality, which is not supported in the SDK. The workaround for this situation is to: delete the cached value of the formula cell, and then invoke Excel to recalculate the value for you at backend. The Excel service could help here
    -----

    I tried to remove the relevant element <v/>, and saw Excel re-compute automatically when it is reopened. Worked fine. That is, you can do that programatically, using XmlDocument or something.

    Tuesday, August 11, 2009 8:38 AM
  • Thanks for your reply. For small documents CalculateCell.Value = True also works fine.  My problem is that I can't get the collection of the cells - OutOfMemory exeption is thrown when I am trying to get Worksheet. It is not really a problem of recalculation - what to do if I need to change value of the cell in this document? The document is not too big and it can be successfully opened by exel. The code which causes this exeption is just a simple example to show where is the problem. The idea of working without Worksheets is not appropriate.
    Tuesday, August 11, 2009 12:21 PM
  • I think I am not properly understood. Beauty of open-xml is the use of XML instead of old VBA. Still the SDK implements some VBA like ways such as Worksheet or Cellvalue. Case by case, you can choose any way, XML or VBA like ways. For some bulk update-remove-insert works I may choose XML. I tried to edit more than 100,000 cell values with XML. It does within 2 seconds without noticeable memory change. With XSLT or LINQ, it should be even more efficent.

     

                    using (SpreadsheetDocument spdoc = SpreadsheetDocument.Open("c:/scraps.xlsx", true))
                    {
                        WorkbookPart wbp = spdoc.WorkbookPart;
                        Console.WriteLine(DateTime.Now.ToString());

                        foreach (WorksheetPart wsp in wbp.WorksheetParts)
                        {
                            XmlDocument doc = new XmlDocument();
                            const string namespace1 =
                                "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
                            XmlNamespaceManager nsm = new XmlNamespaceManager(doc.NameTable);
                            nsm.AddNamespace("m", namespace1);

                            doc.Load(
                                wsp.GetStream(FileMode.Open, FileAccess.ReadWrite)
                                );
                            XmlNodeList lst = doc.SelectNodes("descendant::m:c/m:v", nsm);

                            foreach (XmlNode v in lst)
                            {
                                v.InnerText = "1";
                            }

                            Console.WriteLine(lst.Count.ToString());
                        }

                        Console.WriteLine(DateTime.Now.ToString());
                    }

    Wednesday, August 12, 2009 4:01 AM
  • Because refreshness of formula has been my own issue either, I tested one more code. Redundant or not.

    1. make an Excel file, with data, a1: =sum(b1:c1), b1: 1, c1: 2
    2. run following code. open the file as read-write mode, edit b2 cell to 9, and remove a1 cell v node.
    3. re-open the file manually and see the a1 cell is changed into 11.

                    using (SpreadsheetDocument spdoc = SpreadsheetDocument.Open("c:/x.xlsx", true))
                    {
                        WorkbookPart wbp = spdoc.WorkbookPart;
                        WorksheetPart wsp = (WorksheetPart)wbp.GetPartById("rId1");

                        XmlDocument doc = new XmlDocument();
                        const string namespace1 =
                            "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
                        XmlNamespaceManager nsm = new XmlNamespaceManager(doc.NameTable);
                        nsm.AddNamespace("m", namespace1);

                        doc.Load(wsp.GetStream());
                        XmlNode node = doc.SelectSingleNode("/descendant::m:row/m:c/m:v", nsm);
                        XmlNode node2 = doc.SelectSingleNode("/descendant::m:row/m:c[2]/m:v", nsm);
                        node2.InnerText = "9";
                        node.ParentNode.RemoveChild(node);

                        using (Stream outs =
                            wsp.GetStream(FileMode.Open, FileAccess.ReadWrite)
                            )
                        {
                            doc.Save(outs);
                        }

                        Console.WriteLine(doc.OuterXml);
                    }

    Wednesday, August 12, 2009 6:43 AM
  • I does not work :(. It takes nearly 40 sec only to perform doc.Load( wsp.GetStream(FileMode.Open, FileAccess.ReadWrite)); and after it memory usage is about 1Gb. Worksheet has 143746 rows connected to external data.
    Wednesday, August 12, 2009 9:08 AM
  • Don't give up anyway. ^_^ Actually xmldocument is the worst object for such a huge documents. Try xmlReader, xelement, or openXmlReader instead. Let me know what happens with this code.

                    using (SpreadsheetDocument spdoc = SpreadsheetDocument.Open("c:/x.xlsx", true))
                    {
                        WorkbookPart wbp = spdoc.WorkbookPart;

                        foreach (WorksheetPart spart in wbp.WorksheetParts)
                        {
                            using (XmlReader xrd = XmlReader.Create(spart.GetStream()))
                            {
                                //XElement ele = XElement.Load(xrd);
                            }
                        }

    }

    Just with xmlreader, xelement better, you can do a couple of things using XSLT or LINQ. Try even the xelement, commented part, if xmlReader loads. I attach usage of Xelement below. If you can't use even Xelement, the last option can be xmlreaer/openxmlReader + XSLT, of which you may have no knowledge yet.

                    using (SpreadsheetDocument spdoc = SpreadsheetDocument.Open("c:/x.xlsx", true))
                    {
            const string namespace1 =
                "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
                        WorkbookPart wbp = spdoc.WorkbookPart;
                        WorksheetPart wsp = (WorksheetPart)wbp.GetPartById("rId1");

                        XmlReader xrd = XmlReader.Create(wsp.GetStream());
                        XElement ele = XElement.Load(xrd);
                        xrd.Close();

                        XmlNamespaceManager mgr = new XmlNamespaceManager(xrd.NameTable);
                        mgr.AddNamespace("m", namespace1);

                        XElement ele1 = ele.XPathSelectElement("/descendant::m:row/m:c[1]/m:v", mgr);
                        XElement ele2 = ele.XPathSelectElement("/descendant::m:row/m:c[2]/m:v", mgr);
                        ele1.Remove();
                        ele2.Value = "100";

                        XmlWriterSettings sett = new XmlWriterSettings();
                        sett.Encoding = Encoding.UTF8;
                        sett.Indent = true;

                        using (Stream outs =
                            wsp.GetStream(FileMode.Open, FileAccess.ReadWrite)
                            )
                        {
                            using (XmlWriter xwr = XmlWriter.Create(outs, sett))
                            {
                                ele.Save(xwr);
                            }
                        }

                        Console.WriteLine(ele.ToString());
                    }

    • Edited by Pohwan Wednesday, August 12, 2009 10:49 AM
    Wednesday, August 12, 2009 10:35 AM