Importing CSV file problem

  • Question

  • Hello all,

    This is a rather simple procedure that I think has got a bit out of hand. We get some files from a supplier dumped onto an FTP server every morning. I was tasked with pulling the files off the FTP into a directory, then importing each file from the directory into a table, with a flag for the source of each file.

    I wrote a batch script to pull the files off the FTP. It then gets the directory listing and iterates through a loop, inserting each file into its respective table; the tables then get pushed into one big table with an s1 field at the end telling me which file each row came from.

    DATA DEFINITION:

    CREATE TABLE DBO.FILESOURCE1 (
    TITLE VARCHAR(5)
    ,FORENAME VARCHAR(50)
    ,SURNAME VARCHAR(50)
    ,AD1 NVARCHAR(150)
    ,AD2 NVARCHAR(150)
    ,AD3 NVARCHAR(150)
    ,AD4 NVARCHAR(150)
    ,AD5 NVARCHAR(150)
    ,POSTCODE NVARCHAR(10)
    ,PHONE1 NVARCHAR(20)
    ,PHONE2 NVARCHAR(20)
    ,MOBILE NVARCHAR(20)
    ,EMAIL NVARCHAR(150)
    ,SOURCE NVARCHAR(50)
    )

    PROBLEM: 

    FILESOURCE1 is just a representation of one of what could be up to six files that come in each morning, each with a different source. My problem is that sometimes a file comes in with the Email or Mobile column missing, effectively changing the format of the file and causing a conversion error when the process tries to import the data into its respective table. I'm using a standard BULK INSERT statement to insert the data, so if anything doesn't match up it will complain.

    I have had a thought about how to solve this, and that's to read the first line of the CSV (the column names) and parse it to create a dynamic select statement (in before Celko tells me off), then use the select statement to insert with OPENROWSET. Is that a possibility?

    Basically the gist of it is: I have a table to insert records into from a file, and I need it automated, but the file comes in with differing formats each morning, so I need to account for this. Any possible ways to do this?
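    As an illustration of the header-driven idea described above, here is a minimal Python sketch (not T-SQL; the column list and function name are hypothetical, mirroring the table definition earlier in the post) that reads the CSV's first line and builds a dynamic column list for the insert statement:

```python
import csv

# Columns of the target table, in table order (mirrors DBO.FILESOURCE1;
# SOURCE is excluded because it is populated later in the process).
TABLE_COLUMNS = ["TITLE", "FORENAME", "SURNAME", "AD1", "AD2", "AD3", "AD4",
                 "AD5", "POSTCODE", "PHONE1", "PHONE2", "MOBILE", "EMAIL"]

def build_insert_sql(csv_path, table="DBO.FILESOURCE1"):
    """Read only the header row of a CSV and build an INSERT statement
    whose column list matches whatever columns the file actually has."""
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))
    cols = [c.strip().upper() for c in header]
    unknown = [c for c in cols if c not in TABLE_COLUMNS]
    if unknown:
        raise ValueError(f"Unexpected columns in {csv_path}: {unknown}")
    placeholders = ", ".join("?" for _ in cols)
    return f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
```

    The same statement-building step could instead emit an INSERT ... SELECT over OPENROWSET(BULK ...), as suggested above; the point is only that the column list is derived from the file's header rather than hard-coded.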

    Wednesday, January 2, 2013 12:53 PM

Answers

  • The best solution, of course, is to speak to whoever supplies the files and come to an agreement on getting consistent files delivered.  In practice this isn't always easy or even possible, but it should always be the first avenue to take.

    Personally, if the CSV files aren't too large, I would write a simple PowerShell script to parse the files and add in any missing fields. Then you can run your import process without issues. However, for larger files running into the hundreds of MB and above, this solution won't run too quickly. At that point you could think about auto-generating a format file and using bcp to import the data.
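    The reply suggests PowerShell; the same idea can be sketched in Python (the column names are hypothetical, mirroring the asker's table). It rewrites a file so every row carries the full column set, padding columns the file lacks with empty strings, so a fixed-format BULK INSERT can then load it:

```python
import csv

# Full expected header for the feed (hypothetical, mirroring DBO.FILESOURCE1;
# SOURCE is excluded because the import process adds it separately).
EXPECTED = ["TITLE", "FORENAME", "SURNAME", "AD1", "AD2", "AD3", "AD4",
            "AD5", "POSTCODE", "PHONE1", "PHONE2", "MOBILE", "EMAIL"]

def normalize_csv(src_path, dst_path):
    """Rewrite src_path to dst_path so every row has the full EXPECTED
    column set, filling any columns the source file lacks with ''."""
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=EXPECTED)
        writer.writeheader()
        for row in reader:
            # Match the file's header case-insensitively, then pad.
            up = {k.strip().upper(): v for k, v in row.items() if k}
            writer.writerow({col: up.get(col, "") for col in EXPECTED})
```

    Running this once per incoming file, before the import step, means the downstream tables and the BULK INSERT format never need to change.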

    Wednesday, January 2, 2013 2:51 PM

All replies

    Is it possible? Yes. The problem is that T-SQL isn't the appropriate tool for this. In addition, I'd guess that you either need now (or eventually will) much more intelligent ways to interrogate the source files and determine what to do. Error handling capabilities at the T-SQL level are inconsistent at best.

    I suggest you simply review the thousands of posts on this topic to get a sampling of how others have approached this problem. Differing formats are an enormous problem, and simple scripting will not provide a robust solution; it will be impossible in T-SQL alone. Below is a link that more fully describes error handling in T-SQL, in case you are interested. Lastly, ALWAYS indicate which version of SQL Server you are using - no sense in discussing features that you cannot use.

    Error handling - Erland

    Wednesday, January 2, 2013 2:12 PM
  • Thanks for the replies. I actually contacted the supplier and ensured that files are received in a consistent format, which allowed me to complete the automated process. No more moving files manually from FTPs or running through long SQL files ensuring the data is formatted properly!

    Just so you know, the table has keys and constraints built on it further on down the line.

    Thursday, January 3, 2013 1:38 PM