none
How to optimize fetching of data from a Excel file which has more than 15k rows and only two columns. RRS feed

  • Question

  • Hi,

    I am trying to create a dictionary from a Excel file with column 1 as Key and column 2 as Value.

    I have used Open XML format,DOM method  to achieve the same but it's taking too much around 5 mins for creating a dictionary of length 15k.

    Here is the code for the same.

    private void refreshDictionaryOpenXML()
            {
                lblRefreshMessage.Text="Refreshing Dictionary,Please Wait...";
                btnTranslate.Enabled = false;
                
                string strDoc = Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + "\\dict1.xlsx";
               
                using (SpreadsheetDocument spreadsheetDocument =
                            SpreadsheetDocument.Open(strDoc, false))
                {
                    WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
                    WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
                    
                    SheetData sheetData = worksheetPart.Worksheet.GetFirstChild<SheetData>();
                    int str;
                    int keyVal;
                    
                    try
                    {
                       
                        foreach (Row r in sheetData.Elements<Row>())
                        {
                            int colNo = 1;
                            string strString="";
                            string keyString = "";
                            foreach (Cell c in r.Elements<Cell>())
                            {
                                if (colNo==1)
                                {
                                    if (c.CellValue != null && Int32.TryParse(c.InnerText, out str))
                                    {
                                    }
                                    else
                                    {
                                        str = -1;
    
                                    }
    
                                    if (str != -1)
                                    {
                                        
                                        strString = GetSharedStringItemById(workbookPart, str);
                                        byte[] bytes = Encoding.Default.GetBytes(strString);
                                        byte[] devBytes = Encoding.Convert(Encoding.Default, encoding, bytes);
                                        strString = encoding.GetString(devBytes);
    
                                    }
                                    
                                }
                                else if(colNo==2)
                                {
                                    if (c.CellValue != null && Int32.TryParse(c.InnerText, out keyVal))
                                    {
                                    }
                                    else
                                    {
                                        keyVal = -1;
    
                                    }
    
                                    if (keyVal != -1)
                                    {
                                        keyString = GetSharedStringItemById(workbookPart, keyVal);
                                        
                                    }
                                }
                                colNo += 1;
                            }
                            
                            if (!translatorDictionary.ContainsKey(strString))
                            {
                                translatorDictionary.Add(strString, keyString);
                                
                            }
    
                        }
                    }
                    catch(Exception ex)
                    {
                        MessageBox.Show(ex.Message);
                    }
                }
                lblRefreshMessage.Text = "Dictionary Refreshed";
                btnTranslate.Enabled = true;
            }

    Please let me know how can this be optimized more for faster operation.


    Point5Nyble

    Friday, February 26, 2016 6:16 AM

Answers

  • Hi Point5Nyble,

    Based on your code, it seems you do not need loop while colNo is larger than 3, so I think try something like below:

        if (colNo >= 3)
                                {
                                    break;
                                }
                                else if (colNo == 1)

    In addition, if you do not mind use ADO.net to get data from excel, I suggest you refer the link below:

    # How To Use ADO.NET to Retrieve and Modify Records in an Excel Workbook With Visual Basic .NET
    https://support.microsoft.com/en-us/kb/316934

    Best Regards,

    Edward


    We are trying to better understand customer views on social support experience, so your participation in this interview project would be greatly appreciated if you have time. Thanks for helping make community forums a great place.
    Click HERE to participate the survey.


    Monday, February 29, 2016 6:37 AM
  • Before you switch, you could try changing from For Each cell in row, to explicitly referencing only columns 1 and 2. Even though only 2 columns have data, you may be looping over a very large number of columns. At least you can check the row.Elements<cell>.Count.

    Paul

    Thursday, March 3, 2016 11:58 AM

All replies

  • Hi Point5Nyble,

    Based on your code, it seems you do not need loop while colNo is larger than 3, so I think try something like below:

        if (colNo >= 3)
                                {
                                    break;
                                }
                                else if (colNo == 1)

    In addition, if you do not mind use ADO.net to get data from excel, I suggest you refer the link below:

    # How To Use ADO.NET to Retrieve and Modify Records in an Excel Workbook With Visual Basic .NET
    https://support.microsoft.com/en-us/kb/316934

    Best Regards,

    Edward


    We are trying to better understand customer views on social support experience, so your participation in this interview project would be greatly appreciated if you have time. Thanks for helping make community forums a great place.
    Click HERE to participate the survey.


    Monday, February 29, 2016 6:37 AM
  • I have added check on column,but no significant improvement..

    I guess ,I will have to use ADO.NET approach now since I need it to be really fast..

    Thanks


    Point5Nyble

    Wednesday, March 2, 2016 7:14 AM
  • Before you switch, you could try changing from For Each cell in row, to explicitly referencing only columns 1 and 2. Even though only 2 columns have data, you may be looping over a very large number of columns. At least you can check the row.Elements<cell>.Count.

    Paul

    Thursday, March 3, 2016 11:58 AM