locked
TextFieldParser problem with Double Quotes with Quotes RRS feed

  • General discussion

  • Hi, I'm basically trying to import a CSV into an ACCESS database.  Sample date is shown below:

     

    "",10173,"Development Manager - Social Economy Sector","Trust Bank",10153,,"Lolalll Pudd","Meet the requirements of Structured Finance & Relationship Teams for "transaction support" at all times, ensuring appropriate analysis and clarity in respect of required/request","",09-Sep-2008,42000,38000,"+ LTA + bonus",,,,41000,45000,,,"","",,"","","",,,"",,"A","AUK,AUL,AUN,CBE,RBD"

     

    The CSV data is malformed, as each field requires double quotes enclosed around them.  Unfortunately, one of the fields contain double quotes within.  So I'm trying to use the TextFieldParser to read the file and replace the double quotes with single quotes for that particular instance.

     

    If I set the HasFieldsEnclosedInQuotes to True, an error is thrown at the line.  If I use False the line is read, but the field data is spilt incorrectly.

     

    Using csvFileReader As New TextFieldParser(App_Path() & csvFileFullPath)

    csvFileReader.TextFieldType = FieldType.Delimited

    csvFileReader.Delimiters = New String() {","}

    'csvFileReader.HasFieldsEnclosedInQuotes = True

    csvFileReader.HasFieldsEnclosedInQuotes = False

    csvFileReader.TrimWhiteSpace = True

     

    Dim currentRow As String()

    While Not csvFileReader.EndOfData

    Try

    Dim i As Int32 = 1

    Dim outputRow As New Text.StringBuilder()

    currentRow = csvFileReader.ReadFields()

    For Each currentField As String In currentRow

        currentField = currentField.Replace(Chr(34), Chr(39)) 'replace double quote with single quote if needed

        OutputRow.Append(currentField)

        If i < currentRow.Length Then

            outputRow.Append(Chr(9)) 'add a tab for each field except last one

        End If

        i = i + 1

    Next

    Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException

        TextBox1.Text = ex.ToString

    End Try

    End While

     

    Please can someone help, as I'm sure this type of issue with importing CSV files is a problem for many developers.

     

     

     

    Thursday, October 30, 2008 11:53 AM

All replies

  • Trev,

     

    You might have better luck with an oledb connection and reading it with a datareader.  this example uses a dataadapter but you can modify it to use a datareader.  it will read the values as they are with single or double quotes and then you can replace them if needed.

     

    Dim dt As New DataTable

    Dim mySelectQuery As String = "SELECT * FROM test.csv"

    Dim myconnection As OleDb.OleDbConnection = New OleDb.OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & "c:\" & ";Extended Properties=""text;HDR=Yes;FMT=CSVDelimited""")

    Dim dsCmd As OleDb.OleDbDataAdapter = New OleDb.OleDbDataAdapter(mySelectQuery, myconnection)

    dsCmd.Fill(dt)

    Me.DataGridView1.DataSource = dt

    myconnection.Close()

     

     

    Jeff

    Friday, October 31, 2008 1:10 AM
  • Thanks for your reply Jeff.

    Unfortunately, I began my code by using the OleDb method.  This seemed to work well, until someone noticed not all information was being displayed for a particular record. 
    After further investigation I discovered the import was stopping once it reached a double quote within a line of data and disregarding the rest of the data. 

    I was hoping the Text Reader would read all the data and allow me to check for the double quotes.  The big problem is how to escape the double quotes once the data has read.


    Friday, October 31, 2008 10:06 AM
  • If you need to extract the information from the malformed CSV file in this scenario, one common way is that you need to use the regular expression. Of course, the CSV file should be descripted by one fixed pattern, then you can retrieve the required data as your expectation. Have a look at the regular expression, when you have the specific issues, feel free to tell us. Also post one small whole snippet about the CSV file in this scenario.
    Monday, November 3, 2008 7:33 AM
    Moderator
  • We are changing the issue type to “Comment” because you have not followed up. If you have more time to look at the issue and provide more information, please feel free to change the issue type back to “Question” by editing your initial post and changing the radio button at the top of the post editor window. If the issue is resolved, we will appreciate it if you can share the solution so that the answer can be found and used by other community members having similar questions. Thank you!

    Wednesday, November 5, 2008 4:10 AM
    Moderator
  • Hello!

    I'm having the same problem described above.
    I'm trying to read from a .csv file that has the following content:

    field 1;field 2;field 3;field 4;field 5
    field 1;field "2";field 3;field 4;field 5
    field 1;field 2;field 3;field 4;field 5

    Everything goes well except for the second line where the connection is unnable to continue beyond the first double quote, so my table ends up like this:

    +---------+---------+---------+---------+---------+
    | field 1 | field 2 | field 3 | field 4 | field 5 |
    +---------+---------+---------+---------+---------+
    | field 1 | field   |         |         |         |
    +---------+---------+---------+---------+---------+
    | field 1 | field 2 | field 3 | field 4 | field 5 |
    +---------+---------+---------+---------+---------+


    The code is the following:

    OleDbConnection connection = new OleDbConnection("Provider=Microsoft.Jet.OleDb.4.0; Data Source = " + folder_text_box.Text + "; Extended Properties = \"Text;HDR=YES;FMT=CSVDelimited\"");

    connection.Open();

    OleDbDataAdapter
    adapter = new OleDbDataAdapter("SELECT * FROM " + file, connection);
    DataSet data_set = new DataSet("Temp");
    adapter.Fill(data_set);
    DataTable data_table = data_set.Tables[0];


    Is there any Extended Property that allows us to tell the connection to ignore the double quotes or to treat them as simple text.

    Thank you.

    Best regards,

    Jorge

    Thursday, December 11, 2008 6:53 PM
  • The columns look to be delimited by a semi-colon and not a comma. In this case you will probably need to define the structure using a schema.ini file.
    Paul ~~~~ Microsoft MVP (Visual Basic)
    Friday, December 12, 2008 6:11 AM
  •  Hello Paul,

    Thank you for your reply.
    The columns are in deed delimited by a semi-colon and not by a comma, but that's not the problem, since you can see by the table that I showed that the first line is parsed correctly.
    The problem is when there is a double-quote, then the remaining content is not read (and it should be).

    Best regards,

    Jorge
    Friday, December 12, 2008 10:26 AM
  • If using a schema.ini file does not resolve the problem then you will probably need to use standard file i/o to read the file in line by line and then split each line on the semi-colon delimiter to get your column values. I seem to recall that the Text driver (when using Jet OLEDB) always considers the double quote an end of line terminator if there isn't a preceding column delimiter.
    Paul ~~~~ Microsoft MVP (Visual Basic)
    Friday, December 12, 2008 3:09 PM
  • Hello Paul,

    You're right, a schema.ini solves it.
    The code for it is here.

    Thank you.

    Best regards,

    Jorge
    Friday, December 12, 2008 5:50 PM