checking imported data one character at a time RRS feed

  • Question

  • User-750935311 posted

    My application allows users to import csv data through an import data wizard using a file upload control.  After the file upload process, I read the file using the code below one character at a time and pluck out any unwanted characters.  If there are 50 or less records it works just fine.  But if there are more the server times out.

    This is working for small recordsets:

    ' strip off any characters that are not accepted
    Dim strNew As String = ""
    Using sr As New StreamReader(strSavePath)
        While sr.Peek() >= 0
        strNew += Regex.Replace(ChrW(sr.Read()), "(?![a-zA-Z0-9 '/@#,.-]).*", "")
        End While
    End Using

    For 3,000 records this takes 3min40sec and the server times out.  How can I speed this up?

    Wednesday, February 12, 2014 5:17 PM


  • User-750935311 posted

    Ok, figured it out.  The cause of the of the server timeout ended up being the fact that I was iterating through a file of 3,000 records one character at a time and concaenating a string one character at a time.  That probably equates to hunderds of thousands of concatenations.

    I fixed the problem by converting the csv file to a datatable and iterating through each cell of each row and handling the validation one cell value at a time.  This cut my total rendering time from 3:40 to 0:05.

    Here is how I handled the conversion of the csv file to a datatable:

        Private Function ProcessCSVFile(ByVal inputPath As String) As DataTable
            Dim dtresult As DataTable = Nothing
            Dim objTextparser As Microsoft.VisualBasic.FileIO.TextFieldParser = Nothing
                Console.WriteLine(" In ProcessCSVfile " & DateTime.Now)
                objTextparser = New TextFieldParser(inputPath)
                objTextparser.TextFieldType = FieldType.Delimited
                'convert to Data table
                dtresult = New DataTable()
                Dim stringRow As String() = objTextparser.ReadFields()
                Dim intCnt As Integer = 0
                For intCnt = 0 To stringRow.Length - 1
                    dtresult.Columns.Add(stringRow(intCnt), Type.[GetType]("System.String"))
                'populate with data:
                While Not objTextparser.EndOfData
                    stringRow = objTextparser.ReadFields()
                End While
                Console.WriteLine(" Exiting ProcessCSVfile " & DateTime.Now)
            Catch ex_process_csv As Exception
                dtresult = Nothing
                If Not IsNothing(objTextparser) Then
                    objTextparser = Nothing
                End If
            End Try
            Return dtresult
        End Function

    That function takes the file path and fine name: strSavePath = ImportUploadPath + strFileName

    Here is how I looped through each cell value:

                    Dim dt As DataTable = New DataTable
                    dt = ProcessCSVFile(strSavePath)
                    Response.Write(dt.Rows.Count.ToString() & "<br />")
                    Dim o As String = ""
                    Dim n As String = ""
                    For Each row As DataRow In dt.Rows
                        For Each col As DataColumn In dt.Columns
                            n = ""
                            o = row(col.ColumnName).ToString() 'gets the 'cell' value
                            For Each c As Char In o
                                n += Regex.Replace(c, "(?![a-zA-Z0-9 '/@#,.-]).*", "") 'the regular expression defines the acceptable characters
                            If col.DataType = System.Type.GetType("System.String") Then
                                row(col.ColumnName) = n.ToString
                            End If
                            If col.DataType = System.Type.GetType("System.Int32") Then
                                row(col.ColumnName) = CInt(n)
                            End If
                            If col.DataType = System.Type.GetType("System.Boolean") Then
                                row(col.ColumnName) = CBool(n)
                            End If
                            If col.DataType = System.Type.GetType("System.DateTime") Then
                                row(col.ColumnName) = CDate(n)
                            End If
                    GridView1.DataSource = dt
                    GridView1.PageSize = lstRecordPerPage.SelectedValue

    I discovered after doing my coding that the data being imported is all strings, so the code may not handle some of the other datatypes.

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Wednesday, February 12, 2014 7:17 PM