none
Leading zeros not preserved when reading from CSV file, via OPENROWSET

    Question

  • When executing the following query

    SELECT *
      FROM
    OPENROWSET ('MSDASQL',
                'Driver={Microsoft Text Driver (*.txt; *.csv)};DBQ=C:\TEST\;IMEX=1',
                'SELECT "TAG ID" from test.csv')

    I am getting the following result:

    TAG ID
    ---------
    804356678
    805398176
    805398195
    803403083
    805398195
    806184445
    806184456
    805398168

    BUT, the data in the CSV file have leading zeros:

    TAG ID
    -----------
    00804356678
    00805398176
    00805398195
    00803403083
    00805398195
    00806184445
    00806184456
    00805398168

    How can I preserve the zeros ??

    Thanks in advance.

    Tom


    New to SQL Server
    Monday, February 14, 2011 9:07 PM

Answers

  • As I mentioned in my other message, try using BULK INSERT.

    I took your sample data and put it into a text file (called C:\Temp\XX.CSV) and the following script imported it with no problems:

    if object_id('tempdb..#CSVData','U') is not null drop table #CSVData
    create table #CSVData
    (
      PostDate datetime
     ,TranDate datetime
     ,TagID varchar(20)
     ,Agency varchar(20)
     ,Activity varchar(20)
     ,EntryTime varchar(20) --Can't do DATETIME because of hyphen in CSV
     ,EntryPlaza varchar(20)
     ,EntryLane varchar(20)
     ,ExitTime varchar(20)  --Make consistent with EntryTime
     ,ExitPlaza varchar(20)
     ,ExitLane varchar(20)
     ,VehClass int
     ,Amount varchar(20)  --Can't do DECIMAL/NUMERIC/MONEY because of Dollar Sign in CSV
     ,Prepaid char(1)
     ,PlanRate varchar(20)
     ,FareType char(1)
     ,Balance varchar(20) --Can't do DECIMAL/NUMERIC/MONEY because of Dollar Sign in CSV
    )
    
    bulk insert #CSVData
    from 'C:\Temp\xx.CSV'
    with (fieldterminator=','
       ,rowterminator='\n'
       ,firstrow=2)
    
    select * from #CSVData
    

    As you can see from the comment in the code above, though, some columns had to be defined as VARCHAR because the data in your CSV file is not in a format that would be consistent with the datatype.  For example, I would normally define ENTRYTIME as a DATETIME column... but your CSV data has a hyphen in that column, and that would not translate correctly.  Same with the AMOUNT and BALANCE columns... those have Dollar Signs in the CSV file, and that can't convert directly into a numeric datatype, so I had to make them VARCHAR columns.  But once they are in this table, you can convert from there.

     


    --Brad (My Blog)
    • Proposed as answer by Naomi NModerator Tuesday, February 15, 2011 5:29 PM
    • Marked as answer by TomS_ Wednesday, February 16, 2011 3:18 AM
    Tuesday, February 15, 2011 5:27 PM
    Moderator

All replies

  • I'm not sure if this will work, but try 2 approaches.

    1. Select cast("Tag ID" as varchar(20)) as [Tag ID] from test.csv

    If this will not work, I suggest

    2.

    create table Tags ([Tag ID] varchar(20))

    insert into Tags

    select * from Openrowset ...


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    Monday, February 14, 2011 9:17 PM
    Moderator
  • I had same problem while importing data from excel through OPENROWSET.I changed the extented property to HDR = No ;IMEX =1 and it worked fine.

    Try to use this ,may be it works for CSV file as well!

    Monday, February 14, 2011 9:32 PM
  • Hi Naomi,

    I tried the 'cast' operator but it errors out:

    [Microsoft][ODBC Text Driver] Syntax error (missing operator) in query expression 'cast([TAG ID] as varchar(20))'.".

    Regards,

    Tom

     


    New to SQL Server
    Monday, February 14, 2011 9:49 PM
  • Hi NSMP,

    I added the extended properties HDR=0;IMEX=1, but nothing changed, still the leading zeros are not there.

    I believe we have to apply an operator within the select statement of the OPENROWSET command (select [TAG ID[ from test.csv).

    Do you know what are the acceptable operators/operands within the 'select' of OPENROWSET ??

    Best regards,

    Tom

     


    New to SQL Server
    Monday, February 14, 2011 9:54 PM
  • You need to write HDR=No instead of HDR =0.
    Monday, February 14, 2011 10:28 PM
  • Hi NSMP, no difference, I get the result WITHOUT the leading zeros.

    BTW, although I set HDR=NO, I am still getting the 'Header' !!!

    Again, I believe the solution/trick should be to pass an operand on the column while we read from the text.csv file.

    Best regards,

    Tom

     

     


    New to SQL Server
    Monday, February 14, 2011 10:32 PM
  • > Do you know what are the acceptable operators/operands within the 'select' of OPENROWSET ??

    They depend on the data source in question. In this case the operators available are those of the Microsoft Text Driver, which I am completely unfamiliar with. (This is a forum for Transact-SQL, not for the Text Driver.)


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online: SQL 2008, SQL 2005 and  SQL 2000.
    (Just click the link you need.)
    Monday, February 14, 2011 11:04 PM
  • Well,in that case i think you need to pad 0's !You will find so many posts if you want to do that!
    Monday, February 14, 2011 11:09 PM
  • Hi Erland,

    What are the methods, in T-SQL, to read data from a CSV file ?

    Best regards,

    Tom

     


    New to SQL Server
    Monday, February 14, 2011 11:22 PM
  • Take a look at this blog post 

    6 ways to import data into SQL Server

     


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    Monday, February 14, 2011 11:23 PM
    Moderator
  • Hi NSMP, I have thought about padding with zeros (SELECT RIGHT('00' + CAST([TAG ID] AS VARCHAR(20), 20)), BUT there are cases where the 'original' data start with alphabetic characters (AB803451201, NY80393124, etc...)

    Best regards,

    Tom


    New to SQL Server
    Monday, February 14, 2011 11:25 PM
  • Not sure if this is what you want..But it works fine.

    create table #btest(output char(15))

    BULK INSERT #btest
       FROM 'C:\test.csv'
       WITH (
          DATAFILETYPE='char',FIRSTROW = 2
        );
    GO


    select * from #btest -- I get this..

    00804356678
    00805398176
    00805398195
    00803403083
    00805398195
    00806184445
    00806184456
    00805398168

     

    NJ

    Monday, February 14, 2011 11:53 PM
  • I have a feeling that HDR = NO should work!Can you post how exactly are you writing this?
    Tuesday, February 15, 2011 12:10 AM
  • CSV file are a big pain in the behind.

    The ONLY reliable way you're going to force the CSV driver to do what you want is to create a SCHEMA.INI file to describe the datatypes of the columns.  You can read more about that here:  http://msdn.microsoft.com/en-us/library/ms709353(VS.85).aspx

    You might want to use BULK INSERT instead of the OPENROWSET technique.

    You can read more about importing CSV information in a blog article I wrote last month:  http://bradsruminations.blogspot.com/2011/01/so-you-want-to-read-csv-files-huh.html

    Hope that helps...

    --Brad


    --Brad (My Blog)
    Tuesday, February 15, 2011 1:41 AM
    Moderator
  • Hi NJ,

    Here is what I get:

    BULK INSERT Tbl_Tag FROM 'c:\test\test.csv'
    WITH
    (
    FIRSTROW = 2,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
    )

    Results:

    (0 row(s) affected)

     

     

    There are almost 800 records in the CSV file.

    Please note, If I run this, it works, but the leading zeros are gone !!!

    insert into dbo.Tbl_Tag
    SELECT *
      FROM
    OPENROWSET ('MSDASQL',
                'Driver={Microsoft Text Driver (*.txt; *.csv)};DBQ=C:\test\;HDR=No;IMEX=1',
                'SELECT * from test.csv')

    Results:

    (797 row(s) affected)


    New to SQL Server
    Tuesday, February 15, 2011 3:28 AM
  • I just tried the 'bcp' utility and it errors out as follows:

    bcp myDB.dbo.my_Table in c:\test\test.csv -c -t, -T -S SERVERNAME\NAMEDINSTANCE

    Starting copy...
    SQLState = 22008, NativeError = 0
    Error = [Microsoft][SQL Server Native Client 10.0]Invalid date format

    0 rows copied.
    Network packet size (bytes): 4096
    Clock Time (ms.) Total     : 110

    BTW, here is what my CSV file look like...

    POSTING DATE,TRANSACTION DATE,TAG ID,AGENCY,ACTIVITY,ENTRY TIME,ENTRY PLAZA,ENTRY LANE,EXIT TIME,EXIT PLAZA,EXIT LANE,VEHICLE CLASS,AMOUNT,PREPAID,PLAN/RATE,FARE TYPE,BALANCE
    02/12/2011,02/12/2011,00804356678,MTAB&T,TOLL,-,-,-,08:53:44,QMT,011,1,$4.80,Y,BUSINESS,N,$54614.14
    02/12/2011,02/12/2011,00805398176,MTAB&T,TOLL,-,-,-,08:02:39,QMT,012,1,$4.80,Y,BUSINESS,N,$54609.34
    02/12/2011,02/12/2011,00805398195,MTAB&T,TOLL,-,-,-,08:56:47,RKM,013,1,$4.80,Y,BUSINESS,N,$54596.04


    New to SQL Server
    Tuesday, February 15, 2011 1:30 PM
  • As I mentioned in my other message, try using BULK INSERT.

    I took your sample data and put it into a text file (called C:\Temp\XX.CSV) and the following script imported it with no problems:

    if object_id('tempdb..#CSVData','U') is not null drop table #CSVData
    create table #CSVData
    (
      PostDate datetime
     ,TranDate datetime
     ,TagID varchar(20)
     ,Agency varchar(20)
     ,Activity varchar(20)
     ,EntryTime varchar(20) --Can't do DATETIME because of hyphen in CSV
     ,EntryPlaza varchar(20)
     ,EntryLane varchar(20)
     ,ExitTime varchar(20)  --Make consistent with EntryTime
     ,ExitPlaza varchar(20)
     ,ExitLane varchar(20)
     ,VehClass int
     ,Amount varchar(20)  --Can't do DECIMAL/NUMERIC/MONEY because of Dollar Sign in CSV
     ,Prepaid char(1)
     ,PlanRate varchar(20)
     ,FareType char(1)
     ,Balance varchar(20) --Can't do DECIMAL/NUMERIC/MONEY because of Dollar Sign in CSV
    )
    
    bulk insert #CSVData
    from 'C:\Temp\xx.CSV'
    with (fieldterminator=','
       ,rowterminator='\n'
       ,firstrow=2)
    
    select * from #CSVData
    

    As you can see from the comment in the code above, though, some columns had to be defined as VARCHAR because the data in your CSV file is not in a format that would be consistent with the datatype.  For example, I would normally define ENTRYTIME as a DATETIME column... but your CSV data has a hyphen in that column, and that would not translate correctly.  Same with the AMOUNT and BALANCE columns... those have Dollar Signs in the CSV file, and that can't convert directly into a numeric datatype, so I had to make them VARCHAR columns.  But once they are in this table, you can convert from there.

     


    --Brad (My Blog)
    • Proposed as answer by Naomi NModerator Tuesday, February 15, 2011 5:29 PM
    • Marked as answer by TomS_ Wednesday, February 16, 2011 3:18 AM
    Tuesday, February 15, 2011 5:27 PM
    Moderator
  • Hi Brad, excellent!!!

    Yes, BULK INSERT is the way to do it, and not OPENROWSET.

    I had to make use of the formatfile operand, here is the latest structure:

    BULK INSERT TBL_TAG
    FROM 'C:\test\test.csv'
    WITH ( formatfile = 'C:\myTestFormat.fmt', FIRSTROW = 2
    )

    THANK YOU ALL!!!

     


    New to SQL Server
    Wednesday, February 16, 2011 3:21 AM