bulk insert quote (") qualified fields

  • Question

  • Let me first describe what I mean by "quote qualified" fields. Let's say I have a comma-delimited file where each value between commas is wrapped in double quotes ("). So, let's say the file has 3 columns, Name, Address, and Phone, and you have 2 rows that look like this.

    "John Smith","5 Broadway New York NY 10001", "212-555-1212"

    "Paker,Sara","6 Main St White Plains, NY 10586", "914-222-0000"

    Notice that the second row has a comma appearing as part of the actual column data for Name and Address. But since the fields are quote (") qualified, those commas won't be seen as field delimiters.

    So, that's the kind of file I have, and I'm trying to load it into a table. I can easily do this with an SSIS data flow because SSIS implements the concept of a text qualifier. But I want to do this using Bulk Insert, which doesn't offer a text qualifier as one of its options. And without that, the data in the second row obviously gets messed up because the extra commas shift the columns.

    The reason I want to do this with a Bulk Insert command is that I want one generic stored proc that takes 2 basic parameters, a file name and a table name, matches the file to the table, and just loads it. I can't do this with one generic SSIS data flow because a data flow is fixed on the number of columns, so I would have to have a data flow for each table I want to load. I don't want to do that for all the different files we have. So, having a parameterized Bulk Insert would be ideal for me.
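    The kind of parameterized wrapper I have in mind would be something like the sketch below. All names and paths are placeholders, and the statement has to be built as dynamic SQL because BULK INSERT does not accept variables for the table or file name:

```sql
-- Sketch only: proc name, table, and file path are placeholders.
CREATE PROCEDURE dbo.usp_GenericBulkLoad
    @TableName sysname,
    @FileName  nvarchar(260)
AS
BEGIN
    -- QUOTENAME guards the table name; doubled quotes guard the file path.
    DECLARE @sql nvarchar(max) =
        N'BULK INSERT ' + QUOTENAME(@TableName) +
        N' FROM ''' + REPLACE(@FileName, N'''', N'''''') + N'''' +
        N' WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'');';
    EXEC sys.sp_executesql @sql;
END;
```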

    So one way I've thought about handling this is some file pre-processing where I clean up the files so that the above two rows end up looking something like this.

    John Smithæ5 Broadway New York NY 10001æ212-555-1212

    Paker,Saraæ6 Main St White Plains, NY 10586æ914-222-0000

    Basically, the clean-up I did above is:

    (1) get rid of the first and the last quote, and

    (2) replace "," with some "weird" character like æ (alt-145). The point of the "weird" character is to minimize the chance that it actually occurs in the real data.

    Once the file is cleaned up this way, I can use Bulk Insert where I use æ as the field delimiter so that the commas can be taken in as part of the column data.
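    With that, the Bulk Insert would look something like this (the table and file names are just placeholders):

```sql
-- Placeholder names; æ (alt-145) is the substituted field terminator.
BULK INSERT dbo.Contacts
FROM 'C:\data\contacts_clean.txt'
WITH
(
    FIELDTERMINATOR = 'æ',   -- the "weird" delimiter from the clean-up step
    ROWTERMINATOR   = '\n',
    CODEPAGE        = 'ACP'  -- æ is not ASCII, so the code page matters
);
```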

    Does this sound like a typical approach? Any suggestions on how I can take in the original file and use Bulk Insert without the clean up?


    Tuesday, March 5, 2013 7:50 PM

Answers

  • My gut reaction is to try the last method: bulk insert into a one-column table and do things to parse it. I looked into format files, but you still have to jump through hoops to deal with the first and the last qualifier value. Not sure about the SSIS Import/Export Wizard. I'm trying to avoid SSIS on this particular task.

    Tuesday, March 5, 2013 8:33 PM
  • There is not a good way to do this with BULK INSERT or BCP.   The simplest way to import this type of file is to use SSIS, which will allow you to specify the text qualifier as ".

    If you must do it in BULK INSERT/BCP, I always import it as comma delimited and then run an update to remove the leading/trailing double-quotes.
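    That update can be sketched like this (staging table and column names are hypothetical; each imported column carries one leading and one trailing quote, plus the stray space after the comma in the sample rows):

```sql
-- Hypothetical staging table and columns. LTRIM drops the space that
-- follows some commas in the sample; SUBSTRING trims the two quotes.
UPDATE dbo.Staging
SET Name    = SUBSTRING(LTRIM(Name),    2, LEN(LTRIM(Name))    - 2),
    Address = SUBSTRING(LTRIM(Address), 2, LEN(LTRIM(Address)) - 2),
    Phone   = SUBSTRING(LTRIM(Phone),   2, LEN(LTRIM(Phone))   - 2);
```

    Note that this only works for rows without embedded commas; a row like the second sample would already have shifted columns during the comma-delimited import.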

    If you have the ability to change the source, the better option is to have it changed to TAB delimited, without a text qualifier.  It is unlikely you will have TABs in your text unless you have free-form type fields.

    Tuesday, March 5, 2013 8:37 PM
    Moderator

All replies

  • Have you tried FORMAT file?

    http://jessesql.blogspot.com/2010/05/bulk-insert-csv-with-text-qualifiers.html

    Have you tried the SSIS Import/Export Wizard?

    http://www.sqlusa.com/bestpractices/ssis-wizard/

    You can also upload it to a staging table with BULK INSERT as one line per row, parse it with a T-SQL stored procedure, then move it to the final destination:

    http://www.sqlusa.com/bestpractices2005/notepad/
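    A sketch of that approach (table and file names are hypothetical; the REPLACE first normalizes the stray space after the comma in the sample rows so every field boundary is the three-character sequence ","):

```sql
-- Hypothetical names. Load whole lines into a one-column table,
-- then split each line on the "," sequence in T-SQL.
CREATE TABLE dbo.RawLines (Line nvarchar(max));

BULK INSERT dbo.RawLines
FROM 'C:\data\contacts.csv'
WITH (ROWTERMINATOR = '\n');   -- no FIELDTERMINATOR: one column per line

INSERT INTO dbo.Contacts (Name, Address, Phone)
SELECT SUBSTRING(L, 2, p1 - 2),                -- after the opening quote
       SUBSTRING(L, p1 + 3, p2 - p1 - 3),
       SUBSTRING(L, p2 + 3, LEN(L) - p2 - 3)   -- before the closing quote
FROM dbo.RawLines
CROSS APPLY (SELECT REPLACE(Line, '", "', '","') AS L) n  -- drop stray spaces
CROSS APPLY (SELECT CHARINDEX('","', L) AS p1) a
CROSS APPLY (SELECT CHARINDEX('","', L, p1 + 1) AS p2) b;
```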


    Kalman Toth Database & OLAP Architect
    Paperback / Kindle: Windows Azure SQL Database Programming & Design




    Tuesday, March 5, 2013 8:06 PM
    Moderator
  • There is not a good way to do this with BULK INSERT or BCP.  


    Thanks for verifying that.
    Tuesday, March 5, 2013 9:34 PM
  • Yet another possibility is to write a CLR stored procedure. There is a class in the VisualBasic namespace for reading CSV files that you could use.

    Format files are not too difficult, once you have learnt the tricks. For the example file you posted the format file would read:

    9.0
    4
    1 SQLCHAR 0 0 "\""     0 ""   ""
    2 SQLCHAR 0 0 "\",\""  1 col1 Latin_General_CI_AS
    3 SQLCHAR 0 0 "\",\""  2 col2 Latin_General_CI_AS
    4 SQLCHAR 0 0 "\"\r\n" 3 col3 Latin_General_CI_AS

    The file has three fields, but according to the format file there are four; it adds a zero-length field, terminated by the initial quote, before the first real field. This extra field is not imported to the database (its server column is 0).
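    You would then point BULK INSERT at the format file (the paths and table name here are hypothetical):

```sql
-- Hypothetical paths; the .fmt file holds the format lines above.
BULK INSERT dbo.Contacts
FROM 'C:\data\contacts.csv'
WITH (FORMATFILE = 'C:\data\contacts.fmt');
```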


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Tuesday, March 5, 2013 11:07 PM