xlsx Faster processing with progress bar

    Question

  • Hey All,

    I have an extremely large .xlsx worksheet: 311,000 rows with 33 columns per row. I am trying to import this data into a DataTable. I have tried various methods and have one that works, but it is extremely slow. I have tried using Interop filters and SpecialCells to filter the data, but the import through Interop is very slow. I have also tried using OLE. However, I am unaware of a method to track the progress of the fill with a progress bar, or to filter the data before it is imported. For example, if I filter the .xlsx workbook by, say, "1072" in column 21, the total row count drops from 311,000 to roughly 53,000 rows, which is a much smaller amount to work with. Code below.

    INTEROP METHOD.

       For i As Integer = 2 To counta
                    If CStr(xlWorkSheet2.Cells(i, 21).value) = Form3.Label1.Text Then
                        t1 = CStr(xlWorkSheet2.Cells(i, 1).value)
                        t2 = CStr(xlWorkSheet2.Cells(i, 2).value)
                        t3 = CStr(xlWorkSheet2.Cells(i, 3).value)
                        t4 = CStr(xlWorkSheet2.Cells(i, 4).value)
                        t5 = CStr(xlWorkSheet2.Cells(i, 5).value)
                        t6 = CStr(xlWorkSheet2.Cells(i, 6).value)
                        t7 = CStr(xlWorkSheet2.Cells(i, 7).value)
                        t8 = CStr(xlWorkSheet2.Cells(i, 8).value)
                        t9 = CStr(xlWorkSheet2.Cells(i, 9).value)
                        t10 = CStr(xlWorkSheet2.Cells(i, 10).value)
                        t11 = CStr(xlWorkSheet2.Cells(i, 11).value)
                        t12 = CStr(xlWorkSheet2.Cells(i, 12).value)
                        t13 = CStr(xlWorkSheet2.Cells(i, 13).value)
                        t14 = CStr(xlWorkSheet2.Cells(i, 14).value)
                        t15 = CStr(xlWorkSheet2.Cells(i, 15).value)
                        t16 = CStr(xlWorkSheet2.Cells(i, 16).value)
                        t17 = CStr(xlWorkSheet2.Cells(i, 17).value)
                        t18 = CStr(xlWorkSheet2.Cells(i, 18).value)
                        t19 = CStr(xlWorkSheet2.Cells(i, 19).value)
                        t20 = CStr(xlWorkSheet2.Cells(i, 20).value)
                        t21 = CStr(xlWorkSheet2.Cells(i, 21).value)
                        t22 = CStr(xlWorkSheet2.Cells(i, 22).value)
                        t23 = CStr(xlWorkSheet2.Cells(i, 23).value)
                        t24 = CStr(xlWorkSheet2.Cells(i, 24).value)
                        t25 = CStr(xlWorkSheet2.Cells(i, 25).value)
                        t26 = CStr(xlWorkSheet2.Cells(i, 26).value)
                        t27 = CStr(xlWorkSheet2.Cells(i, 27).value)
                        t28 = CStr(xlWorkSheet2.Cells(i, 28).value)
                        t29 = CStr(xlWorkSheet2.Cells(i, 29).value)
                        t30 = CStr(xlWorkSheet2.Cells(i, 30).value)
                        t31 = CStr(xlWorkSheet2.Cells(i, 31).value)
                        t32 = CStr(xlWorkSheet2.Cells(i, 32).value)
                        t33 = CStr(xlWorkSheet2.Cells(i, 33).value)
                        TextBox2.Text = "Row " & i
                        RAW.Rows.Add(t1, t2, t3, t4, t5, t6, t7, t8, t9, t10, t11, t12, t13, t14, t15, t16, t17, t18, t19, t20, t21, t22, t23, t24, t25, t26, t27, t28, t29, t30, t31, t32, t33)
                        ProgressBar1.Value = ProgressBar1.Value + 1
                    Else
                        ProgressBar1.Value = ProgressBar1.Value + 1
                    End If
                Next

    OLE METHOD (FASTEST BUT NO FILTER OR PROGRESS)

     Public Sub OXER()
         Dim MyConnection As System.Data.OleDb.OleDbConnection
         Dim RAW As New DataTable
         Dim MyCommand As System.Data.OleDb.OleDbDataAdapter
         ' Note: the Extended Properties value needs its own quotes so that HDR=YES
         ' is read as part of it rather than as a separate connection keyword.
         MyConnection = New System.Data.OleDb.OleDbConnection("provider=Microsoft.ACE.OLEDB.12.0;Data Source='" & IO.Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "2018RAW.xlsx") & "';Extended Properties='Excel 12.0;HDR=YES'")
         MyCommand = New System.Data.OleDb.OleDbDataAdapter("select * from [2018RAW$]", MyConnection)
         MyCommand.Fill(RAW)
         MyConnection.Close()
         Form3.DataGridView1.DataSource = RAW
     End Sub

    Any help would be greatly appreciated!

    Saturday, January 12, 2019 9:44 PM

Answers

  • There is a known way to increase the performance: reading a range of values instead of separate cells.

    The next example reads the first 10 000 rows:

       Dim top_left = xlWorkSheet2.Cells(2, 1)
       Dim bottom_right = xlWorkSheet2.Cells(10_001, 33)
       Dim range = xlWorkSheet2.Range(top_left, bottom_right)
       Dim values As Object(,) = CType(range.Value2, Object(,))

       Dim t(32) As Object
       For row = 1 To 10_000
          For col = 1 To 33
             t(col - 1) = values(row, col)
          Next
          RAW.Rows.Add(t)
       Next

    Here values is a matrix of 10 000 rows by 33 columns containing the worksheet’s values.

    You can try adjusting the constants above to read all 311 000 rows at once. If that is not possible due to memory limitations, you can repeat the operation for groups of rows (such as 10 000 rows or more), including the last, incomplete group. You can also add the filter (If CStr(values(row, 21)) = …) just after the outer For. The progress bar can be updated before moving on to the next group.
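    Putting those pieces together, a minimal sketch of the chunked approach might look like this (reusing xlWorkSheet2, RAW, ProgressBar1 and the filter value from Form3.Label1.Text as in the question; lastRow and chunkSize are illustrative values):

       ' Minimal sketch: read the sheet in chunks, filter on column 21 and
       ' update the progress bar once per chunk. Reuses xlWorkSheet2, RAW and
       ' ProgressBar1 from the question; lastRow/chunkSize are illustrative.
       Dim lastRow As Integer = 311_001          ' last data row (row 1 is the header)
       Dim chunkSize As Integer = 10_000
       Dim filterValue As String = Form3.Label1.Text

       ProgressBar1.Minimum = 0
       ProgressBar1.Maximum = lastRow

       Dim startRow As Integer = 2
       While startRow <= lastRow
           Dim endRow As Integer = Math.Min(startRow + chunkSize - 1, lastRow)
           Dim rng = xlWorkSheet2.Range(xlWorkSheet2.Cells(startRow, 1),
                                        xlWorkSheet2.Cells(endRow, 33))
           Dim values As Object(,) = CType(rng.Value2, Object(,))   ' 1-based (row, col)

           For row = 1 To endRow - startRow + 1
               If CStr(values(row, 21)) = filterValue Then
                   Dim t(32) As Object
                   For col = 1 To 33
                       t(col - 1) = values(row, col)
                   Next
                   RAW.Rows.Add(t)
               End If
           Next

           ProgressBar1.Value = endRow           ' one UI update per chunk, not per row
           startRow = endRow + 1
       End While

    Updating the progress bar once per chunk rather than once per row also avoids much of the per-row UI overhead seen in the Interop loop in the question.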

    • Edited by Viorel_MVP Sunday, January 13, 2019 9:10 AM
    • Marked as answer by k7s41gx Sunday, January 13, 2019 1:46 PM
    Sunday, January 13, 2019 9:08 AM

All replies

  • Hi

    Shifting huge amounts of data is inherently time-consuming regardless of the method used.

    The one thing that could help ease the whole process is to make the application remain responsive to the user - i.e. other tasks could be performed while the data is being moved. This could be accomplished by using a BackgroundWorker and its associated events. A ProgressBar could be shown in the UI in such a setup too.
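    For what it's worth, a minimal sketch of that setup might look like the following. It assumes a form with a BackgroundWorker1 and ProgressBar1 dropped on from the toolbox and a Button1 to start the work; the import itself is only indicated by a comment:

       ' Minimal BackgroundWorker sketch: the work runs off the UI thread and
       ' reports progress back to the ProgressBar via ReportProgress.
       Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
           ProgressBar1.Minimum = 0
           ProgressBar1.Maximum = 100
           BackgroundWorker1.WorkerReportsProgress = True
           BackgroundWorker1.RunWorkerAsync()
       End Sub

       Private Sub BackgroundWorker1_DoWork(sender As Object,
               e As System.ComponentModel.DoWorkEventArgs) Handles BackgroundWorker1.DoWork
           ' Long-running work; report a percentage as each slice completes.
           For pct As Integer = 1 To 100
               ' ... read/import one slice of the data here ...
               BackgroundWorker1.ReportProgress(pct)
           Next
       End Sub

       Private Sub BackgroundWorker1_ProgressChanged(sender As Object,
               e As System.ComponentModel.ProgressChangedEventArgs) Handles BackgroundWorker1.ProgressChanged
           ' Runs on the UI thread, so it is safe to touch the ProgressBar here.
           ProgressBar1.Value = e.ProgressPercentage
       End Sub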

    There may be some time saving to be had with certain methods, but I would imagine that a bit of testing would be needed there.

    *

    I don't know if doing this would be any quicker, but I have an idea it *may* be.

    Export the data from the Excel application (via a macro, perhaps) and import it into the VB application directly into a DataTable.

    If any of that looks like it could help, then an example for a BGW could be provided (I don't use Excel so can't show that side of things).


    Regards Les, Livingston, Scotland

    Saturday, January 12, 2019 10:11 PM
  • Hello,

    I'm going to address the incorrect method first, which is attempting to read that many records and columns from Excel into a DataGridView. If you were working with a database table this would be completely wrong, but depending on what the intent is after the data is loaded, things can change.

    As for BackgroundWorker components or asynchronous methods: reading in this much data that way will only keep your user interface responsive, and will actually slow down reading the data from either Excel or a database table.

    The correct way to handle this (and you may not care for this method), where some of the parts depend on your requirements and resources, is:

    • Create a database e.g. SQL-Server Express edition (it's free) with a table (or more if needed) to receive the data from Excel.
    • Download SSMS (SQL-Server Management Studio), import Excel via the import process (it's a wizard).

    Once the data resides in SQL-Server you can load it into a Windows Forms application using paging. Paging loads a subset of the data, so it's quick (see a simple example).
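    As a rough illustration of the paging idea (not the linked example), assuming the rows have already been imported into a SQL-Server table named dbo.Raw2018 with an Id key column, and using a placeholder connection string:

       ' Minimal paging sketch using OFFSET/FETCH (SQL Server 2012 or later).
       ' Table name, key column and connection string are assumptions.
       Public Function GetPage(pageIndex As Integer, pageSize As Integer) As DataTable
           Dim dt As New DataTable()
           Using cn As New System.Data.SqlClient.SqlConnection(
                   "Data Source=.\SQLEXPRESS;Initial Catalog=RawData;Integrated Security=True")
               Dim sql As String = "SELECT * FROM dbo.Raw2018 ORDER BY Id " &
                                   "OFFSET @Offset ROWS FETCH NEXT @PageSize ROWS ONLY"
               Using cmd As New System.Data.SqlClient.SqlCommand(sql, cn)
                   cmd.Parameters.AddWithValue("@Offset", pageIndex * pageSize)
                   cmd.Parameters.AddWithValue("@PageSize", pageSize)
                   Using da As New System.Data.SqlClient.SqlDataAdapter(cmd)
                       da.Fill(dt)          ' Fill opens and closes the connection itself
                   End Using
               End Using
           End Using
           Return dt
       End Function

       ' Usage: DataGridView1.DataSource = GetPage(0, 100)   ' first page of 100 rows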

    Alternatively, MS-Access can be used, but I would recommend avoiding MS-Access with that many rows and columns.

    Challenges: although paging is quick, it makes other tasks such as searching and filtering more complex.

    In closing, if we explored various methods to read that much data: Open XML would be an option, but it is far from simple; with OleDb there can be memory issues and there is no way to show progress; and Excel automation would be a last-ditch method, as it's slower than any other approach.

    I've worked with larger WorkSheets than this and my solution was to import into SQL-Server in an automated process during the evening hours.


    Please remember to mark the replies as answers if they help and unmark them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile but will not answer coding question on either.

    Sunday, January 13, 2019 1:23 AM
    Moderator
  • Thanks Viorel!

    Your solution, once adapted, is significantly faster. I think this could be made even faster by filtering first and then importing just the filtered rows using AutoFilter.

    Thanks again!

    • Edited by k7s41gx Sunday, January 13, 2019 1:46 PM update
    Sunday, January 13, 2019 1:06 PM