none
Read a very large Excel file using SAX parser RRS feed

  • Question

  • Hi All,

    I have a very large Excel file (Column A - By) anf around 1 Lac rows.

    My objective is to read this is Asp.net in a fastest way without impacting the performance.

    For that, I created a sample console app and used SAX based approach and I was able to read pretty fast.

    Problem is that, when I added the code in my asp.net, it is taking ages to even read the values.

    I would be interested to know: in the fastest way to read an Excel file and convert the data into XML format from an asp.net app. Any readymade sample would really help.

    Here is my code: (Please note that the following code works from console app  great but performance really degrades when moved to asp.net app)

    using

    System;


    using

    System.Collections.Generic;


    using

    System.Linq;


    using

    System.Web;


    using

    System.Web.UI;


    using

    System.Web.UI.WebControls;


    using

    DocumentFormat.OpenXml;


    using

    DocumentFormat.OpenXml.Packaging;


    using

    OO=DocumentFormat.OpenXml.Spreadsheet;


    using

    System.Diagnostics;



    namespace

    WebApplication1

    {

       

    publicpartialclass_Default: Page


        {

           

    protectedvoidPage_Load(objectsender, EventArgse)

            {

               

    Stopwatchwatch = newStopwatch();

                watch.Start();

               

    StringfileName = @"C:\Users\ankushb\Desktop\TestData_Eligibility_File.xlsx";

               

    // Comment one of the fo    llowing lines to test the method separately.


               

    //ReadExcelFileDOM(fileName);    // DOM


                ReadExcelFileSAX1(fileName);   

    // SAX


                watch.Stop();

            }

           

    publicvoidReadExcelFileSAX1(stringfileName)

            {

               

    List<string> listShared = newList<string>();

               

    using(SpreadsheetDocumentxl = SpreadsheetDocument.Open(fileName, false))

                {

                    OO.

    SharedStringItemssi;

                   

    using(OpenXmlReaderoxrShared = OpenXmlReader.Create(xl.WorkbookPart.SharedStringTablePart))

                    {

                       

    while(oxrShared.Read())

                        {

                           

    if(oxrShared.ElementType == typeof(OO.SharedStringItem))

                            {

                                ssi = (OO.

    SharedStringItem)oxrShared.LoadCurrentElement();

                               

    // this assumes the shared string is a simple text format, instead of rich text.


                                listShared.Add(ssi.Text.Text);

                            }

                        }

                    }

                   

    WorksheetPartwsp = xl.WorkbookPart.WorksheetParts.First();

                    OO.

    Cellc;

                   

    using(OpenXmlReaderoxrCells = OpenXmlReader.Create(wsp))

                    {

                       

    while(oxrCells.Read())

                        {

                           

    if(oxrCells.ElementType == typeof(OO.Cell))

                            {

                                c = (OO.

    Cell)oxrCells.LoadCurrentElement();

                             

    //  c = (OO.Cell)oxrCells.LoadCurrentElement();


                               

    // c.CellReference holds a string such as "A1"


                               

    if(c.DataType != null)

                                {

                                   

    if(c.DataType == OO.CellValues.SharedString)

                                    {

                                       

    // use whichever from-string-to-number conversion


                                       

    // you like.


                                       

    Console.Write(listShared[Convert.ToInt32(c.CellValue.Text)]);

                                       

    //listShared[Convert.ToInt32(c.CellValue.Text)];


                                    }

                                   

    elseif(c.DataType == OO.CellValues.Number)

                                    {

                                       

    // "normal" value


                                       

    //c.CellValue.Text;


                                    }

                                   

    // there's also boolean, which you might be interested


                                   

    // as well as other types


                                }

                               

    else


                                {

                                   

    // is by default a Number. Use this:


                                   

    //c.CellValue.Text;


                                }

                            }

                        }

                    }

                }

            }

        }

    }

    Tuesday, February 25, 2014 7:29 PM

All replies

  • Hi Ankush

    It's an interesting question, but I fear you may not get much help here as this forum specifically supports the Open XML SDK tool, so there's not much experience for discussing the SAX approach. You may have better luck at OpenXMLDeveloper.org on this. Or perhaps a forum where SAX, generally, is discussed as this would appear to be an issue with Sax + ASP.NET?

    Beyond that, if the Excel data is in a table format you could look into using a database connection method (OLE DB) to work with the content of the workbook. I have no idea how fast that is compared to XML. Nor am I conversant enough with ASP.NET to know whether performance would be better, generally.


    Cindy Meister, VSTO/Word MVP, my blog

    Wednesday, February 26, 2014 7:20 AM
    Moderator
  • Hi Ankush,

    >> Problem is that, when I added the code in my asp.net, it is taking ages to even read the values <<

    I want to confirm that the performance issue is caused by IO or computing.

    As I see from the code you post, the excel file is located in local disk. Since you moved the code to an ASP.NET application, do you have grant sufficient permission to ASP.NET pool to read that file?

    If the performance issue is caused by IO, I think you could also consider to post a question to ASP.NET forum for more efficient response.


    We are trying to better understand customer views on social support experience, so your participation in this interview project would be greatly appreciated if you have time. Thanks for helping make community forums a great place.
    Click HERE to participate the survey.

    Wednesday, February 26, 2014 8:45 AM
    Moderator