none
c# Filter Excel using OpenXML RRS feed

  • Question

  • Hello,

    I am writing a c# WPF application that reads an Excel sheet with 900,000 + rows and more added daily. I am only interested in the bottom few hundred rows which correspond to today's date (Date is stored in column A).

    I know with a sheet this size that a database would be preferable, but the Client is resisting this, nor do they want to provide a workbook with only today's values in it, so I have to work with what I get.

    Currently I am reading the entire sheet in to a DataTable, which not only takes a long time, but is heavy on resources.is there a way, using OpenXML, to only read the data for today?

    Thanks in advance

    Arthur

    Friday, July 7, 2017 3:19 PM

All replies

  • Hello Arthur,

    Sorry for the delay.

    To parse large Excel file, you could visit 

    How to: Parse and read a large spreadsheet document (Open XML SDK)

    Parsing and Reading Large Excel Files with the Open XML SDK

    According to your description, you only want to get data at the bottom few hundred rows. If the row count is fixed or the row index is regular for every day, you could use similar code like 

       static void readExcelDataDOM(string filename)
            {
                using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open(filename, false))
                {
                    WorkbookPart workbookPart = myDoc.WorkbookPart;
                    WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
                    SheetData sheetData =
                    worksheetPart.Worksheet.Elements<SheetData>().First();
                    Row r = sheetData.Elements<Row>().ElementAt(1);
                    //foreach (Row r in sheetData.Elements<Row>())
                    //{
                        foreach (Cell c in r.Elements<Cell>())
                        {
                            string text = c.CellValue.Text;
                            Console.WriteLine(text);
                        }
                    //}
                }
            }
    Please notice that different datatypes of cell values need different methods. You could visit 

    How to: Retrieve the values of cells in a spreadsheet document (Open XML SDK)

    // If the cell represents an integer number, you are done. 
                // For dates, this code returns the serialized value that 
                // represents the date. The code handles strings and 
                // Booleans individually. For shared strings, the code 
                // looks up the corresponding value in the shared string 
                // table. For Booleans, the code converts the value into 
                // the words TRUE or FALSE.
                if (theCell.DataType != null)
                {
                    switch (theCell.DataType.Value)
                    {
                        case CellValues.SharedString:
                            
                            // For shared strings, look up the value in the
                            // shared strings table.
                            var stringTable = 
                                wbPart.GetPartsOfType<SharedStringTablePart>()
                                .FirstOrDefault();
                            
                            // If the shared string table is missing, something 
                            // is wrong. Return the index that is in
                            // the cell. Otherwise, look up the correct text in 
                            // the table.
                            if (stringTable != null)
                            {
                                value = 
                                    stringTable.SharedStringTable
                                    .ElementAt(int.Parse(value)).InnerText;
                            }
                            break;
    
                        case CellValues.Boolean:
                            switch (value)
                            {
                                case "0":
                                    value = "FALSE";
                                    break;
                                default:
                                    value = "TRUE";
                                    break;
                            }
                            break;
                    }

    For a date like 2017/7/18, the cell value would 42934 as an integer number stored in cell. 

    For some strings like "test", the cell value would be an integer number which represents the index of SharedStringTable. So to get cell value which datatype is sharedstring, you need to get the index and find it in the SharedStringTable.

    If your sheet is large, i think it may take long time as DataTable.

    Regards,

    Celeste


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Tuesday, July 18, 2017 5:35 AM
    Moderator