none
Importing CSV file to SQL Server, Code Page and Data Type Issues

    Question

  • I have a series of CSV files that I need to import daily into a SQL server database.  I am having a particular issue with a field: LoanNumber.  LoanNumber can be either a very long number or text, my issue is the data needs to eventually get into a varchar field and depending on how I try to move it through SSIS, either the text gets Nulled or some of the numbers end up as text in scientific notation.

    The origin of these files are CSV files I download.  The resulting file is a CSV that uses Code Page 65001 (UTF-8) where some LoanNumber's are text, some numeric and some in scientific notation, the Text is Qualified by "" and comma delimiter is a ','.  My goal is to get this field into a SQL Server table as a varchar, which requires Code Page 1252 as part of the transformation.

    Part of my issue is the different ways Excel handles the CSV file depending on how it is opened.  If I use Windows Explorer and double click on the file name, it will open in Excel with no wizard or prompting, I can widen and change the LoanNumber column to a "Number" type which converts all the scientific notation to numbers and save the file.  What is odd though, is even after saving the file using the save button, and without making additional changes, when I close the file Excel prompt's me to save again.  If I save with this prompt, Excel saves the CSV as an ANSI 1252 file that is tab delimited (according to SSIS) even though it is still a CSV file.

    On the other hand, if I have Excel open and try to open the CSV file, it gives me the text conversion wizard.

    My question is this: is there a way in Excel to force a CSV file to be saved in the ANSI 1252, tab delimited format, or is there some other way to save the file that is easy for SSIS to interpret?  I have been converting all of my CSV files to Excel files, and using SSIS with Excel data sources, but Excel will implicitly convert some of my numbers to Scientific Notation at times, even if the column is a Number column I've found, so I don't think Excel data sources for SSIS are the right answer.

    Tuesday, August 07, 2012 5:26 PM

Answers

  • Thanks Arthur, I'll try today just using the CSV files without ever opening them in Excel.  Assuming that Excel is the scientific notation culprit (which makes sense, thanks!), I think the only issue I'll have to conquer is moving the data from Code Page 65001 (which is what a native CSV file comes to me as) to ANSI 1252, and deciding where in the process to do so.

    Thanks again for the help.

    • Proposed as answer by Eileen Zhao Tuesday, August 14, 2012 9:26 AM
    • Marked as answer by Eileen Zhao Wednesday, August 15, 2012 8:31 AM
    Wednesday, August 08, 2012 12:18 PM

All replies

  • here is something you can try:

    http://publib.boulder.ibm.com/infocenter/ts7700/cust/index.jsp?topic=%2Fcom.ibm.storage.ts7740.doc%2Fts7740_ua_preserve_leading_zeros.html


    Please mark the post as answered if it answers your question


    Tuesday, August 07, 2012 5:48 PM
  • Excel is not unicode aware, and its driver.

    You just need some data cleansing and 1st of all try to set the connection property to it with IMEX=1 to instruct it to work for import.


    Arthur My Blog

    Tuesday, August 07, 2012 7:25 PM
  • DotNetMonster, thanks for the link; I am still validating it will wok, but so far so good as at least a potential solution.

    ArthurZ, I appreciate the response, but unclean data isn't my problem: 2005983478650234 and FirstLastName041512 are both potentially valid Loan Numbers in my data.  I don't understand the connection property comment; I'll do my homework on IMEX=1 and see if I can figure it out.

    Tuesday, August 07, 2012 7:47 PM
  • You need to persist the metadata, the "scientific notation" you get is because the text gets interpreted as a number, just set this field to string.

    Furthermore, if you get a CSV files why would you use Excel? Just import the CSV, they are text files easier to deal with than Excel Matt.


    Arthur My Blog

    Tuesday, August 07, 2012 11:46 PM
  • Thanks Arthur, I'll try today just using the CSV files without ever opening them in Excel.  Assuming that Excel is the scientific notation culprit (which makes sense, thanks!), I think the only issue I'll have to conquer is moving the data from Code Page 65001 (which is what a native CSV file comes to me as) to ANSI 1252, and deciding where in the process to do so.

    Thanks again for the help.

    • Proposed as answer by Eileen Zhao Tuesday, August 14, 2012 9:26 AM
    • Marked as answer by Eileen Zhao Wednesday, August 15, 2012 8:31 AM
    Wednesday, August 08, 2012 12:18 PM