none
Found for fastest, most efficient way to handle large Excel files. Need to make it even faster.

    Question

  • Hello:

    This is a follow up to the following thread:

      https://social.msdn.microsoft.com/Forums/en-US/4418a7ae-3269-4697-86e1-1cc9250f94a6/looking-for-fastest-most-efficient-way-to-handle-large-excel-files?forum=vsto

    I found this code segment which is the answer to my prayers:

    // The SAX approach.
    static void ReadExcelFileSAX(string fileName)
    {
        using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
        {
            WorkbookPart  workbookPart  = spreadsheetDocument.WorkbookPart;
            WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
    
            OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
            string text;
            while (reader.Read())
            {
                if (reader.ElementType == typeof(CellValue))
                {
                    text = reader.GetText();
                    Console.Write(text + " ");
                }
            }
            Console.WriteLine();
            Console.ReadKey();
        }
    }

       https://docs.microsoft.com/en-us/office/open-xml/how-to-parse-and-read-a-large-spreadsheet

    That is exactly what is needed to traverse a spreadsheet, row by row. My only question is this: the code above seems to retrieve the values from all cells? Regardless of location? Needless to say, we developers use 2-dimensional indexing.

    How to convert that optimized code to retrieval by worksheet name and coordinates? Preferably numerical.

    TIA





    jeudi 12 juillet 2018 16:29

Toutes les réponses

  • Hello Travis Banger,

    I tried to test the code shared and I think it does not work to return cell value for me. It could successfully to return value in number format. However, for a cell with text value, it could only return the text's sharedstring value rather the text.

    Besides, it seems that you are try to get a value from specific cell. So I think there is no need to get all the cell values. T think you could try to use the 2-dimensional to get the value directly. Does you do have the need to read all cell values?

    Best Regards,

    Terry


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    vendredi 13 juillet 2018 07:19
    Modérateur
  • Terry:

    My concern is that at that spreadsheet size, SAX must be used. We know that the code provided above (found in a Microsoft website) is based on the recommended SAX approach. However, I have no idea whether my current code -posted below- uses SAX or not. Perhaps it is based on DOM? I would like to make sure that SAX is the engine. IOW: My goal is to merge:

     (a) The speed, efficiency of SAX (above)

     (b) The convenience of indexing provided by the code below.

    Thanks.

    private string GetCellValue(WorkbookPart wbPart, WorksheetPart wsPart, string coordinates)
    {
        string value = null;
    
        Cell theCell = wsPart.Worksheet.Descendants<Cell>().Where(c => c.CellReference == coordinates).FirstOrDefault();
    
        if (theCell != null)
        {
            value = theCell.InnerText;
    
            if (theCell.DataType != null)
            {
                switch (theCell.DataType.Value)
                {
                    case CellValues.SharedString :
                        SharedStringTablePart stringTable = wbPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
                        if (stringTable != null)
                        {
                            value = stringTable.SharedStringTable.ElementAt(int.Parse(value)).InnerText;
                        }
                    break;
    
                    case CellValues.Boolean:
                        switch (value)
                        {
                            case "0":
                                value = "False";
                                break;
                            default:
                                value = "True";
                                break;
                        }
                    break;
                }
            }
        }
        return value;
    }


    vendredi 13 juillet 2018 07:29
  • Incidentally, this has become a question for the OpenXML forum. Are you able to move it? Or, I can repost it there.

    vendredi 13 juillet 2018 07:32
  • Hello Travis Banger,

    Could GetCellValue return cell value for specific coordinates? Will it spent much time that you could not accept? I did not see any necessary to use the SAX code. 

    What are you doing now? What makes you think it is slow to read cell values?

    Best Regards,

    Terry


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    vendredi 13 juillet 2018 07:46
    Modérateur
  • Hello Travis Banger,

    Could GetCellValue return cell value for specific coordinates? Will it spent much time that you could not accept? I did not see any necessary to use the SAX code. 

    What are you doing now? What makes you think it is slow to read cell values?

    Terry:

    The GetCellValue approach is my current one. It is working fine, but since my spreadsheets are huge, and our computers underpowered, I was trying to see whether using SAX "under the hood" would make my application run faster.

    However, this has become a moot point. Instead of having 2 copies of the Excel file in RAM (one managed by my Excel Addin, the other opened read-only by my external WPF app) I am now placing a very short message in shared memory. It contains the results of the spreadsheet's inspection, to be subsequently displayed by a DataGrid.

    That way I am having my proverbial cake and eating it, too. I am leveraging:

    (a) The access to the live data (cell values) provided by interop-based Addins.

    (b) The superior look and feel provided by WPF (as opposed to inferior Windows Forms).

    My problem has now become: Why is the reading application failing to cast the received message, after it is deserialized? I can see it with the debugger and it looks perfect. All fields are present. That will be the subject of my next post, in an appropriate forum.

    Thanks again!



    samedi 14 juillet 2018 14:30
  • For those interested in this saga, the followup is here:

    https://social.msdn.microsoft.com/Forums/en-US/a21a9297-4747-4c7c-a7b2-4087a6bd4aaa/why-is-my-cast-crashing-according-to-the-vs-debugger-the-object-seems-to-be-in-perfect-shape?forum=csharpgeneral

    dimanche 15 juillet 2018 00:21
  • Hello Travis Banger,

    I'm a little wondering what you want to do now. If you want to get some cell value from a workbook, I think GetCellValue should be enough to work. I think the workbook size should have little effects on the speed to read a cell unless you have a lot of tasks to call the GetCellValue much times.

    Besides, I'm wondering why you need open the file twice. This work should also be able to be finished by Excel AddIns.

    Best Regards,

    Terry


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    lundi 16 juillet 2018 09:53
    Modérateur