none
Using Oledb to read a .csv with double quotes in it RRS feed

  • Question

  • Hello all!

    I'm trying to read from a .csv file that has the following content:

    field 1;field 2;field 3;field 4;field 5
    field 1;field "2";field 3;field 4;field 5
    field 1;field 2;field 3;field 4;field 5

    Everything goes well except for the second line where the connection is unnable to continue beyond the first double quote, so my table ends up like this:

    +---------+---------+---------+---------+---------+
    | field 1 | field 2 | field 3 | field 4 | field 5 |
    +---------+---------+---------+---------+---------+
    | field 1 | field   |         |         |         |
    +---------+---------+---------+---------+---------+
    | field 1 | field 2 | field 3 | field 4 | field 5 |
    +---------+---------+---------+---------+---------+


    The code is the following:

    OleDbConnection connection = new OleDbConnection("Provider=Microsoft.Jet.OleDb.4.0; Data Source = " + folder + "; Extended Properties = \"Text;HDR=YES;FMT=CSVDelimited\"");

    connection.Open();

    OleDbDataAdapter
    adapter = new OleDbDataAdapter("SELECT * FROM " + file, connection);
    DataSet
    data_set = new DataSet("Temp");
    adapter.Fill(data_set);
    DataTable data_table = data_set.Tables[0];


    Is there any Extended Property that allows us to tell the connection to ignore the double quotes or to treat them as simple text.

    Thank you for your time.

    Best regards,

    Jorge

    Thursday, December 11, 2008 7:02 PM

Answers

  • Unfortunately, you're going to have to create a schema.ini to pull from the file via OLE.  To do this, you create a schema.ini file, and place it in the same directory as the files you're pulling the data from.  In it, you're going to have to specify the "TextDelimiter" and the "Format" to which the file complies.  The following code should replace the code you've written above, and should work for the purposes you're trying to accomplish, as long as single quotes aren't used anywhere in the text file.

    using (FileStream stream = File.Open(Path.Combine(folder, "schema.ini"), FileMode.Create, FileAccess.Write, FileShare.None))  
    {  
        using (StreamWriter writer = new StreamWriter(stream))  
        {  
            writer.WriteLine(string.Format("[{0}]", file));  
            writer.WriteLine("Format=Delimited(;)");  
            writer.WriteLine("TextDelimiter='");  
            writer.WriteLine("ColNameHeader=False");  
        }  
    }  
     
    OleDbConnection connection = new OleDbConnection("Provider=Microsoft.Jet.OleDb.4.0; Data Source = " + folder + "; Extended Properties = \"Text;\"");  
     
    connection.Open();  
     
    OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM " + file, connection);  
    DataSet data_set = new DataSet("Temp");  
    adapter.Fill(data_set);  
    DataTable data_table = data_set.Tables[0]; 

    David Morton - http://blog.davemorton.net/
    • Marked as answer by Jorge Raimundo Friday, December 12, 2008 5:47 PM
    Thursday, December 11, 2008 10:29 PM
    Moderator

All replies

  • Unfortunately, you're going to have to create a schema.ini to pull from the file via OLE.  To do this, you create a schema.ini file, and place it in the same directory as the files you're pulling the data from.  In it, you're going to have to specify the "TextDelimiter" and the "Format" to which the file complies.  The following code should replace the code you've written above, and should work for the purposes you're trying to accomplish, as long as single quotes aren't used anywhere in the text file.

    using (FileStream stream = File.Open(Path.Combine(folder, "schema.ini"), FileMode.Create, FileAccess.Write, FileShare.None))  
    {  
        using (StreamWriter writer = new StreamWriter(stream))  
        {  
            writer.WriteLine(string.Format("[{0}]", file));  
            writer.WriteLine("Format=Delimited(;)");  
            writer.WriteLine("TextDelimiter='");  
            writer.WriteLine("ColNameHeader=False");  
        }  
    }  
     
    OleDbConnection connection = new OleDbConnection("Provider=Microsoft.Jet.OleDb.4.0; Data Source = " + folder + "; Extended Properties = \"Text;\"");  
     
    connection.Open();  
     
    OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM " + file, connection);  
    DataSet data_set = new DataSet("Temp");  
    adapter.Fill(data_set);  
    DataTable data_table = data_set.Tables[0]; 

    David Morton - http://blog.davemorton.net/
    • Marked as answer by Jorge Raimundo Friday, December 12, 2008 5:47 PM
    Thursday, December 11, 2008 10:29 PM
    Moderator
  • Hi Jorge Raimundo:

    Try this:

    1. Use a schema.ini File... U can find more information about schema.ini file in

    http://support.microsoft.com/kb/149090


    2. Use  IMEX y MAXSCANROWS Extended properties. For more information I can vist this Web (in spanish)

    http://www.mvp-access.es/softjaen/articulos/excel/ado_dao_excel.htm#Index3

    However I have a problem with boot properties when I try to read an Excel File similar at your File

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;189897

    You can modify this registry key to determine how many rows to scan in your data:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text\MAXSCANROWS

    Take a look  to this too:

    http://www.connectionstrings.com/textfile


    Saludos
    Jack


    Thursday, December 11, 2008 10:30 PM
  • Hello Ronald!

    Thank you for your answer but removing the double-quotes from the file is not an option. The content must be exactly as it is in the original file.
    Also it is not necessary to change de semi-colons to commas since the code I posted is perfectly able to read the csv file delimited by semi-colons.

    The problem is the it considers the double-quote as a special character of some kind when it shouldn't.
    How can this be changed?

    Best regards,

    Jorge
    Friday, December 12, 2008 10:35 AM
  • Hello David,

    Thank you for your reply.
    However, allow me to point out that I do not need to specify the delimiter since the code I showed is able to import the csv with the semi-colon as a delimiter.
    My problem is that when the connection encounters a double-quote, it stops reading the remaining content on that line.
    I'm not sure if that is solved by using the schema.ini file you're proposing.
    Are there any other options regarding the occurrence of double-quotes on the text file?

    Best regards,

    Jorge
    Friday, December 12, 2008 10:40 AM
  • Hello Jack!

    Thank you for your reply.

    I don't think the schema.ini file will solve this as mentioned on my other answers, unless there are some options regarding the occurrence of double quotes on the text file.

    I tried to use the Extended properties you mentioned, but I had no success. Wherever there is a line containing a double-quote it is always truncated.

    Changing the registry is also not an options, since this is a program that must run in any PC without needing any configuration.

    Best regards,

    Jorge
    Friday, December 12, 2008 10:54 AM
  • Jorge Raimundo said:

    Hello David,

    Thank you for your reply.
    However, allow me to point out that I do not need to specify the delimiter since the code I showed is able to import the csv with the semi-colon as a delimiter.
    My problem is that when the connection encounters a double-quote, it stops reading the remaining content on that line.
    I'm not sure if that is solved by using the schema.ini file you're proposing.
    Are there any other options regarding the occurrence of double-quotes on the text file?

    Best regards,

    Jorge


    Jorge.  The problem is solved with the schema.ini I'm proposing.  Please try the code example I posted above. I believe you'll find it works perfectly.

    David Morton - http://blog.davemorton.net/
    Friday, December 12, 2008 12:25 PM
    Moderator
  • Hello David,

    You were right.
    It worked.

    Thank you very much...

    Best regards,

    Jorge
    Friday, December 12, 2008 5:47 PM
  •  

    Hi David,

     

    I had the same problem but i notice that my syntax of the connection string was not correct until i saw your example.

    So, i changed it from:

    Extended Properties=Text
    to
    Extended Properties=\"Text;\"
    and that did work.

     

    Thank you,


    Michiel

    • Proposed as answer by Alex Resnik Wednesday, December 14, 2011 4:16 PM
    Thursday, May 6, 2010 1:00 PM
  • TextDelimiter=none

    • Proposed as answer by vikrantislav Monday, June 25, 2012 5:01 PM
    Wednesday, December 14, 2011 4:21 PM
  • TextDelimiter=none

    this solved the problem, thanks
    Monday, June 18, 2012 3:21 PM