none
Problem reading RFC4180 compliant CSV file using JET driver. Cant handle apostrophes between double quotes. RRS feed

  • Question

  • The RFC4180 standard can be found here: http://www.rfc-editor.org/rfc/rfc4180.txt

    The input CSV file I am trying to process has all it's fields wrapped in double quotes but these fields can sometimes contain carrige return/line feeds. (this was the reason for using the RFC standard which allows this).

    I researched how to read a file of this type and came accross a suggestion to use the Jet driver. Below is the code I do this with:

    /// <summary>
    /// Read the CSV data from the file into a datatable
    /// </summary>
    /// <remarks>
    /// This method is compliant with the RFC standard of CSV files 
    /// (http://www.rfc-editor.org/rfc/rfc4180.txt)
    /// </remarks>
    /// <param name="rawData">The empty datatable to be populated with the data from the file</param>
    /// <param name="filePath">The path to the csv file</param>
    private void FillRawDataTable(DataTable rawData, string filePath)
    {
        try
        {
            string full = Path.GetFullPath(filePath);
            string file = Path.GetFileName(full);
            string dir = Path.GetDirectoryName(full);
    
            //create the "database" connection string which will actually be the file
            string connString = "Provider=Microsoft.Jet.OLEDB.4.0;"
              + "Data Source=\"" + dir + "\\\";"
              + "Extended Properties=\"text;HDR=No;FMT=Delimited\"";
    
            //create the database query
            string query = "SELECT * FROM " + file;
    
            // Create the connection
            using (OleDbDataAdapter dAdapter = new OleDbDataAdapter(query, connString))
            {
                //fill the DataTable
                dAdapter.Fill(rawData);
            }
        }
        catch (Exception ex)
        {
            throw new MHRException("Error detected reading the CSV file using OleDB Adapter", ex);
        }
    }


    My problem is that if I use the line below as an example
    "DCI","Fred","O'Brian","999 Lets Be Avenue"
    I will get an exception generated when I fill the rawData table.
    However if I escape the apostrophe in the file with this line:
    "DCI","Fred","O\'Brian","999 Lets Be Avenue"
    it all works.

    Should the Microsoft JET driver handle apostrophes without having to escape them, especially when reading in a string between double quotes?

    So why not just use the escape characters in the file I hear you all scream... well, the file being sent by the customer can't be modified before sending.

    I have the option of opening the file, reading it into memory in blocks, do a character replace on the apostrophes, before re-writing it back out. I would then keep my existing code in place but it doesn't seem very elegant and doesn't address the problem with the driver.

    Can anyone suggest a better method?


    There are only 10 types of people. Those who understand binary and those who don't
    Wednesday, May 13, 2009 2:54 PM

All replies

  • I can't repro your issue. As a matter of fact if I copy your example line into a text file and read it with Jet/Text ISAM it parses just fine. I'm kind of curious whether this has something to do with C# since I'm using VB to test this.

    What is the actual exception message (ex.Message) that is being generated when you run your code?
    Paul ~~~~ Microsoft MVP (Visual Basic)
    Wednesday, May 13, 2009 5:31 PM