none
Bad Performance of Open XML while reading .xlsx files RRS feed

  • Question

  • Hi Folks,

    We are planning to use Open XML SDK to read and parse through .xlsx files but we are getting bad results as far as performance is concerned.

    For a 17 MB file having 346K records, it is taking around 1.5 hours to complete.

    The following is the code:

    class Program
        {
            static void Main(string[] args)
            {
                var finalWriter = new StreamWriter(@"D:\resultfinal_analysis_large_1.txt", true);
                var outputFolder = @"D:\outputfilemedium";
                var directory = new DirectoryInfo(@"D:\CheckFiles5");
                foreach (var fileInfo in directory.GetFiles())
                {
                    long rowCount = 0;
                    long actualRowCount = 0;
                    if (!(fileInfo.Extension == ".xlsx"))
                    {
                        continue;
                    }
                    //@"D:\DemoExcel.xlsx"
                    var fileSize = fileInfo.Length;
                    Stopwatch sw = null;
                    try
                    {
                        //var fileName = Path.GetFileNameWithoutExtension(fileInfo.Name);
                        //var path = Path.Combine(outputFolder, string.Concat(fileName, ".csv"));
                        //var writer = new StreamWriter(path, true);
                        sw = new Stopwatch();
                        sw.Start();
                        using (var fileStream = new FileStream(fileInfo.FullName, FileMode.Open, FileAccess.Read))
                        {
                            using (var spreadSheetDocument = SpreadsheetDocument.Open(fileStream, isEditable: false))
                            {
                                var workBookPart = spreadSheetDocument.WorkbookPart;
                                var sharedStringTable = workBookPart.SharedStringTablePart.SharedStringTable;
                                foreach (var workSheetPart in workBookPart.WorksheetParts)
                                {
                                    if (workSheetPart.Worksheet == null)
                                    {
                                        continue;
                                    }
                                    if (workSheetPart.Worksheet != null &&
                                        string.IsNullOrEmpty(workSheetPart.Worksheet.InnerText))
                                    {
                                        continue;
                                    }
                                    using (var reader = OpenXmlReader.Create(workSheetPart))
                                    {
                                        var str = new StringBuilder();
                                        while (reader.Read())
                                        {
                                            if (reader.ElementType == typeof (Row))
                                            {
                                                actualRowCount++;
                                                reader.ReadFirstChild();
                                                do
                                                {
                                                    Cell cell = reader.LoadCurrentElement() as Cell;
                                                    if (cell != null)
                                                    {
                                                        var dataType = cell.DataType;
                                                        if (dataType == null)
                                                        {
                                                            var cellValue = cell.CellValue;
                                                            if (cellValue != null)
                                                            {
                                                               str.AppendFormat("{0},",cellValue.InnerText);
                                                            }
                                                            continue;
                                                        }
                                                        if (dataType == CellValues.SharedString)
                                                        {
                                                            var index = int.Parse(cell.InnerText);
                                                            str.AppendFormat("{0},", sharedStringTable.ChildElements[index].InnerText);
                                                        }
                                                        else
                                                        {
                                                            str.AppendFormat("{0},", cell.InnerText);
                                                        }
                                                    }
                                                } while (reader.ReadNextSibling());
                                            }
                                            //writer.WriteLine(str);
                                            //writer.Flush();
                                            str.Clear();
                                        }
                                    }
                                }
                            }
                        }
                        sw.Stop();
                        finalWriter.WriteLine("{0},{1},{2},{3}", actualRowCount, sw.ElapsedMilliseconds, fileSize, fileInfo.Name);
                        finalWriter.Flush();
                        //writer.Close();
                        sw = null;
                    }
                    catch (Exception ex)
                    {
                        sw.Stop();
                        sw.Reset();
                        sw = null;
                        //finalWriter.WriteLine("File is corrupted! ex: {0}, filename:{1}", ex, fileInfo.FullName);
                        Console.WriteLine("File is corrupted! ex:{0}", ex);
                    }
                }
               
                finalWriter.Close();
                Console.WriteLine("Completed!");
                Console.ReadLine();
            }
        }

    Karan Vohra

    Tuesday, August 23, 2016 11:38 PM

Answers

All replies

  • >>>We are planning to use Open XML SDK to read and parse through .xlsx files but we are getting bad results as far as performance is concerned.

    According to your description, if you want to retrieve and modify data in an Excel, you also could consider to use ADO.Net.

    For more information, click here to How To Use ADO.NET to Retrieve and Modify Records in an Excel Workbook With Visual Basic .NET

    Wednesday, August 24, 2016 2:44 AM
  • Hi!

    Thanks for the response. We want to move away from ADO.Net. We currently have implementing the solution using ADO.net only but there are many issues with it:

    1. It truncates the data. If the data in first 8 columns have 255 characters and 9th column have more than that, it truncates the value of 9th column to 255 characters.

    2. It does not read the .xlsx files which are downloaded from internet and locked. The users have to save as their file which is a huge pain.

    3. No good support to navigate through different work-sheets.

    4. We have to install this component on our production machines. We plan to move to Azure and we want to get rid of the step to install this component before getting a role working.

    I was hoping you guys can help us out in giving some suggestions on how we can improve performance here. Is Open XML SDK not mature enough? Not supported? not performant enough?


    Karan Vohra

    Wednesday, August 24, 2016 2:59 AM
  • >>>I was hoping you guys can help us out in giving some suggestions on how we can improve performance here. Is Open XML SDK not mature enough? Not supported? not performant enough?

    According to your description, you could refet to How to: Parse and read a large spreadsheet document (Open XML SDK)

    Thanks for your understanding.
    • Proposed as answer by David_JunFeng Friday, September 2, 2016 8:21 AM
    • Marked as answer by David_JunFeng Monday, September 5, 2016 5:28 AM
    Thursday, August 25, 2016 7:47 AM