Flat file with an uneven number of columns

    Question

  • Please leave feedback for Microsoft regarding this problem at https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=126493

     

    OK, I'm sure it's just me; SSIS has been great so far... but how can you import a straight CSV file with an uneven column count?

    For example:  (assume CR LF row delimiter)

    The,Quick,Brown,Fox,Jumps
    Hello,World
    This,is,a,test

    "Normally" I'd expect this

    | Col1  | Col2  | Col3  | Col4 | Col5  |
    | The   | Quick | Brown | Fox  | Jumps |
    | Hello | World | NULL  | NULL | NULL  |
    | This  | is    | a     | test | NULL  |

    OK, but what actually happens is that the row delimiter is ignored in preference for the column delimiter: the row delimiter gets sucked into the column, and the next row's data is laid down after it.

    So we get

    | Col1  | Col2              | Col3  | Col4 | Col5  |
    | The   | Quick             | Brown | Fox  | Jumps |
    | Hello | World{CR}{LF}This | is    | a    | test  |
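    What's happening can be reproduced outside SSIS with a short simulation (Python here, purely as an illustration; SSIS itself is native code, and this is not its implementation). A parser that hunts for each column's delimiter in turn, and treats CRLF only as the fifth column's delimiter, produces exactly the output above:

```python
def parse_column_by_column(data, num_cols, col_delim=",", row_delim="\r\n"):
    """Mimic a parser that treats the row delimiter merely as the
    delimiter of the last column (the behavior described above)."""
    rows, pos = [], 0
    while pos < len(data):
        row = []
        for i in range(num_cols):
            # Columns 1..n-1 end at the column delimiter; only the
            # last column ends at the row delimiter.
            delim = col_delim if i < num_cols - 1 else row_delim
            end = data.find(delim, pos)
            if end == -1:
                end = len(data)
            row.append(data[pos:end])
            pos = end + len(delim)
        rows.append(row)
    return rows

data = "The,Quick,Brown,Fox,Jumps\r\nHello,World\r\nThis,is,a,test\r\n"
rows = parse_column_by_column(data, 5)
# The second "row" swallows the CRLF and the start of the next line:
# rows[1] == ["Hello", "World\r\nThis", "is", "a", "test"]
```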

    Am I not seeing a tick box somewhere that says "tick here if you want to terminate a row on the row delimiter even if all the columns aren't full, and we'll pad NULLs into the rest of the row's columns, which you can fix in the data flow transformations"?

    I'm sure it's there.....help!

    (By the way SSIS team, great job on the package love using it)

     

    Friday, January 20, 2006 8:14 AM

All replies

  • Try using ragged right instead of delimited.

    Thanks,

    Matt

    Monday, January 23, 2006 4:36 PM
  • Ah see if only it was that simple Matt.  "Ragged right files are files in which every column has a fixed width, except for the last column, which is delimited by the row delimiter."

    So the only thing that can in fact be ragged is the last column. In my example I've purposely used varying column widths, so the old "ragged right" doesn't apply (uneven delimited columns are more the norm for ragged files).

    So, any ideas how this is going to work in SSIS? I must be missing something simple here; this is a very common problem that the old DTS, Excel, Access, and other programs deal with quite well.

    help!

    Monday, January 23, 2006 10:57 PM
  • Unfortunately, we only support ragged right for fixed-width files, so for this scenario you would have to read the file as a single column and then split it using a script component, or you could write a custom component that splits it.

    Matt

    Tuesday, January 24, 2006 8:52 PM
  • But this is a bug, or at least changed behavior from the old SQL Server 2000 DTS: when you had a comma- or tab-delimited file where some of the rows had extra columns after the last valid one, DTS ignored those extra columns.

     

    The new behavior is instead to append the commas/delimiters to the data pulled into the last column.

     

    This is causing us major grief because now none of our DTS packages work in the new SQL Server 2005, so it is not backwards compatible.

     

    Friday, March 10, 2006 4:01 PM
  • This is similar to an issue raised the other day. I produced a sample package showing how to handle this.

    http://sqlblogcasts.com/files/4/integration_services/entry412.aspx

    Friday, March 10, 2006 4:08 PM
  • I'm with you, lgkahn. It's a bug. No way is this a feature! This is killing us too. If I have to script this as set out below, then something is fundamentally wrong with SSIS. I cut one of our large SQL 2000 servers over to SQL 2005, and we are stopping there until fixes are supplied for errors like this. I'm now having to use SQL 2000 DTS instead of the nice-looking but functionally poor SSIS.

    Please get this fixed.

    Garry Swan
    Information Systems Manager
    CSIRO
    Australia

    Monday, March 13, 2006 8:24 AM
  • If you feel strongly about it, then you should raise it through the Feedback Centre, and through your MS representative if you have one.

     

    Monday, March 13, 2006 8:47 AM
  • Calling it a bug implies that it is supposed to do something else, and I'm not sure that is the case. I'm not denying that it MIGHT be a bug - hopefully Matt will reply again and tell us. (By the way, just because some behaviour was present in DTS, you shouldn't assume the same would be true of SSIS. This is a replacement, not an upgrade.)

    Regardless, I can understand why this is causing headaches. Have you logged it at the feedback center (http://lab.msdn.microsoft.com/productfeedback/default.aspx)? If not, then you shouldn't expect the feature to make it into the next version.

    I don't understand why you call SSIS functionally poor. As Simon explained this can still be achieved quite easily. Surely the fact that there is "more than one way to skin a cat" so to speak means the product is functionally rich as opposed to poor? Is the fact that something cannot be achieved using your "preferred" method really prohibiting you from moving to a superior product?

    -Jamie

    Monday, March 13, 2006 8:52 AM
  • From my research there have been a lot of posts about this issue in the last couple of months. But I'm still looking for a proper solution: is there a fix available? Is there a code sample available for an SSIS source component, or whatnot, that treats uneven columns properly?

    I've looked through the sample for the source component and could not get it to run (something about no compatible component in the dll)

    Since people need to import data from delimited files very often, SSIS should be extending the functionality of what there was in the DTS world... but in this case it was decided to limit it. Whoever came up with the per-column delimiter must have been so proud of himself that he generalized it and removed the need for a row delimiter...

    Well, you haven't, and we want it back. :) It's causing many people a lot of grief.

    Or at least release the source for a flat file handler which people can customize to their liking.  

    There are so many flat file formats out there... Why be so restrictive when dealing with them?

     

     

    Tuesday, March 21, 2006 11:12 PM
  • What's wrong with the sample I have provided?

    The SSIS focus was on building a framework that is performant, scalable and extensible. For this reason the out-of-the-box components don't satisfy everyone's requirements. But I have shown how easy it is to extend SSIS with a script component to achieve your goal.

    If you want a more polished solution you can develop your own custom component but that is another level of complexity that really isn't needed due to the power of the script component.

    Wednesday, March 22, 2006 9:48 AM
  • Someone mentioned that the code provided by Sabin didn't work for him/her. I have not tried that code.

    Here is a version (another way to skin the cat) that can be tried. Please note that your first row should contain all the column names for this to work. This script assumes the input is configured so that all the data on one row constitutes a single column - so, basically, there is just one column in each row. The output of this script is also one column per row; however, the output it creates can be "understood" by the SSIS file connection manager. Finally, the assumption is that "tab" is the column delimiter. You can change that below if that is not the case.

     

    Imports System.Text

    ' Tracks whether the header row (which establishes the expected
    ' column count) has been seen yet.
    Private dataRow As Boolean = False
    Private columnCount As Integer = 0

    Public Function AppendMissingTabs(ByVal Row As InputBuffer) As String

            Dim buffer As StringBuilder = New StringBuilder()
            Dim columns As String() = Row.BigSingleColumn.Split(New [Char]() {Chr(9)})

            buffer.Append(Row.BigSingleColumn)

            ' The first (header) row defines the expected number of columns.
            If Not dataRow Then
                columnCount = columns.Length
            End If

            ' Pad short rows with tab delimiters until they have columnCount columns.
            If columns.Length < columnCount Then
                Dim N As Integer
                For N = 1 To columnCount - columns.Length
                    buffer.Append(Chr(9))
                Next N
            End If

            dataRow = True

            Return buffer.ToString()

        End Function

     

    Wednesday, March 22, 2006 3:10 PM
  • It comes down to this: if I tell a system that my row finishes with a CRLF, I don't want it to override my decision and pull the CRLF into the data. Why would I want the row delimiter in the data?! Is this a feature people needed, such that the SSIS team decided the old behavior was so last week and this new functionality was, in fact, a better solution path? I doubt it. The required functionality is the way Excel, DTS, Access and many other apps handle this data import. Why is SSIS pulling CRLF into my data when I've said this is the end of the row? Just stop and move on to the next row. Put that functionality back in and you've got a great delimited file importer.

     

    Garry Swan

    MCDBA

    Thursday, April 06, 2006 8:05 AM
  • What you have actually said is that your column finishes with your delimiter. The flat file source then looks for that delimiter to end the column. The CRLF is not defined as the record delimiter, but as the delimiter of the last column.

    I know that doesn't solve the problem but should explain the reason for the situation.

    I still stand by the fact that if you want this functionality using the script component is a valid solution.

    Thursday, April 06, 2006 9:23 AM
  • Last night I uploaded a custom source adapter to SourceForge that would let you parse this file with regular expressions.

    http://sourceforge.net/projects/textregexsource

    Just connect a file connection manager to it and set a regular expression, and it could produce what you want.

    I think a regex like (?'Col1'\w+),*(?'Col2'\w*),*(?'Col3'\w*),*(?'Col4'\w*),*(?'Col5'\w*)\n

    might do the trick, though it might need tweaking.
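    To experiment with a pattern like that before wiring it into the adapter, something along these lines can be tried (Python's re module here, with optional groups so that missing trailing columns simply come back as None; the pattern and names are illustrative, not the adapter's actual configuration):

```python
import re

# Hypothetical pattern: up to five comma-separated fields per line.
# Missing trailing fields leave their named groups unmatched (None).
pattern = re.compile(
    r"(?P<Col1>[^,\r\n]+)"
    r"(?:,(?P<Col2>[^,\r\n]*))?"
    r"(?:,(?P<Col3>[^,\r\n]*))?"
    r"(?:,(?P<Col4>[^,\r\n]*))?"
    r"(?:,(?P<Col5>[^,\r\n]*))?"
)

data = "The,Quick,Brown,Fox,Jumps\r\nHello,World\r\nThis,is,a,test\r\n"

rows = []
for line in data.splitlines():
    m = pattern.match(line)
    rows.append(tuple(m.group(f"Col{i}") for i in range(1, 6)))

# Missing trailing columns come back as None, which maps naturally to NULL:
# rows[1] == ("Hello", "World", None, None, None)
```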

    Geof

     

    Thursday, April 06, 2006 3:45 PM
  • I posted about this a couple of weeks ago, and got no answer, so I am glad it is not just me.

    In my view, this is a bug, as the row delimiter is ignored. It's not how it worked in SQL 2000, and it's not how it works in any version of Access. A row delimiter is a 'row delimiter', not a 'row delimiter if it happens to appear in the last column'.

    It's all very well having workarounds and other ways of doing it, but it puts in another layer of processing that you never needed before, with all the issues that brings.

    Still, it's not the worst bug we've found in moving to 2005!

    SteveG

    Thursday, April 06, 2006 4:12 PM
  • Just checked the SP1 version of SQL2005 and it still has the same problem.  Come on guys get this fixed.
    Tuesday, July 18, 2006 8:18 AM
  •  CSIROSwan wrote:
    Just checked the SP1 version of SQL2005 and it still has the same problem.  Come on guys get this fixed.

    It's not a bug in my opinion. If you tell SSIS to expect a file in a certain format and the file doesn't arrive in that format, then what should it do? Do you honestly think it should just guess, like DTS would have done? What if it guesses wrong (as DTS was wont to do)?

    There are very easy ways to deal with this problem. I posted something just last week explaining how one might do it and there's an executable demo there as well: http://blogs.conchango.com/jamiethomson/archive/2006/07/14/4226.aspx

    Hope that helps!

    -Jamie

     

    Tuesday, July 18, 2006 8:25 AM
  • Look, trust me, it's a bug. The new integration system is really bad at importing data without lots of help. We are holding back one of our SQL 2000 servers just so we can continue to use DTS (and that's saying something about SSIS). Really, how hard can it be? Get the "normal" expected functionality back into these processes. Listen to your customers: we are telling you this is not what we want!

     

     

    Wednesday, November 22, 2006 6:49 AM
  • Our solution was for me to write a C program that we pass flat files through first, which fixes the number of columns.

    You tell it the number of columns, the delimiter, and the string terminator (if any), and it will either remove or add columns to each row.

     

    This is something that, in my opinion, should be built into SQL Server SSIS, e.g.:

    D:\workarea\root>fixinput.exe
    FixInput (c) 2006 larry kahn - icentrix.com (kahn@lgk.com)

    syntax: inputfile outputfile columns terminator delimiter debug

    Note to pass a quote or comma put it in quotes ie: fixinput infilename.csv out filename.csv 10 "," """ 1
         to pass a tab character as the terminator pass T ie: fixinput infilename.txt outfilename.txt "T" "" 1

    D:\workarea\root>
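    The core of such a column-fixing pass is small enough to sketch in a few lines (Python here as a hypothetical stand-in; this is not the source of fixinput.exe, and it ignores quoted fields that contain the delimiter):

```python
def fix_columns(lines, num_cols, delimiter=","):
    """Pad or trim each row so it has exactly num_cols fields.
    (Naive split: does not handle quoted fields containing the delimiter.)"""
    fixed = []
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) < num_cols:
            fields += [""] * (num_cols - len(fields))  # add missing columns
        else:
            fields = fields[:num_cols]                 # drop extra columns
        fixed.append(delimiter.join(fields))
    return fixed

rows = fix_columns(
    ["The,Quick,Brown,Fox,Jumps", "Hello,World", "This,is,a,test,extra,junk"], 5
)
# rows == ["The,Quick,Brown,Fox,Jumps", "Hello,World,,,", "This,is,a,test,extra"]
```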

     

    http://stage1.icentrix.com/fixinput.exe

     

    hope this helps


     

     

    Wednesday, November 22, 2006 2:12 PM
  • See, the thing is, we can all write code to work around this problem; however, we need to ask why we are writing code for such a simple import. There are hundreds of solutions here, but the only solution this very basic kind of data import should require is for the SSIS package to recognise the line terminator and the column separators, as every other package from Excel to DTS has done in the past.

     

    This is a bug and someone in the SSIS team needs to take this seriously and get it fixed.

    Wednesday, November 22, 2006 11:52 PM
  •   > What you have actually said is that your column finishes with your delimiter.

      > The flat file source then looks for that delimiter to end the column.

      > The CRLF is not defined as the record delimiter but the delimiter of the last column.

    Programmatically, I can see why this might have been easier to code...but, simply put, this is very non-intuitive behavior.  People expect a record delimiter to take precedence over a column delimiter.  Yes, it's self-consistent, but that doesn't mean it's "right".

    And saying that there's a scripting workaround is a bit of a cop-out. Don't get me wrong... I appreciate the workaround! But it seems like you're saying that because there's a workaround, the inconvenient and unexpected behavior is fine. Seems odd.

    Wednesday, November 29, 2006 5:00 PM
  • Yup, definitely a BUG.  We are not going to purchase SQL 2005 for our production servers until the issue is fixed.  We have over 40 flat file imports and we will not spend the money trying to program a solution to this bug for each import.   Just glad we purchased SQL 2005 Dev and tried recreating our DTS packs before we actually purchased SQL 2005.
    Thursday, December 07, 2006 11:05 AM
  •  Zmope wrote:
    Yup, definitely a BUG.  We are not going to purchase SQL 2005 for our production servers until the issue is fixed.  We have over 40 flat file imports and we will not spend the money trying to program a solution to this bug for each import.   Just glad we purchased SQL 2005 Dev and tried recreating our DTS packs before we actually purchased SQL 2005.

    To me, the definition of a bug is something that does not work the way it is supposed to. That is NOT the case here. It is working exactly as designed - hence, not a bug. Comparisons with Access and Excel are also irrelevant. Why should SSIS comply with other products whose functionality has been bedded in for ten years or more? It's hardly comparing apples with apples, is it?

    Regardless, that does not mean to say that the many concerns raised on here are not valid, so please don't flame me for what I've just said. If an enhancement to the functionality is required (i.e. if the design is wrong), then connect.microsoft.com is the place to ask for it. In the meantime, Simon has provided example workarounds.

     

    Just FYI, the reason the functionality has changed from DTS is the SSIS team's desire to produce as performant a data pipeline as possible. My understanding is that the behavior here stems from that. I'm not saying that as an excuse, by the way - it's not my job to defend Microsoft, and if they have dropped a clanger (which it seems they have here) then they should listen to the community.

    Incidentally, the person from MSFT who was originally replying on this thread (Matt David) no longer works on the SSIS team. Given the length of this thread, it's unlikely that anyone from the SSIS team is going to read it (and Microsoft have never committed to answering anything on these forums), so if you want to bring it to the attention of the SSIS team, connect.microsoft.com is the correct channel in which to do it. A lot has been written on this thread, but I don't think anyone has submitted it in the proper place. Have they?

    -Jamie

     

    Thursday, December 07, 2006 8:15 PM
  • Jamie,
     
    Thanks for the reply. I agree with your definition of a bug. And comparisons with Access and Excel might not be relevant, but a comparison with DTS definitely is. DTS handled these unstructured flat files perfectly (as do Access and Excel). I suppose the issue is about product expectations. There should be an option to parse the file out if it is unstructured, especially since SSIS's predecessor (DTS) handled this implicitly. It's nice now to have advanced data flow options, but it doesn't make sense to remove functionality (functionality people use) in a new version of a product.
     
    So, maybe not a bug, but definitely poor design (on this matter).
     
    Not including the option in SSIS essentially makes it unfeasible for us to purchase 2005 at this time. (As said in a previous post: 40+ unstructured flat file imports with tons of columns from different vendors.)
     
    BTW, for those interested in this thread, we just installed  SQL 2005 Service Pack 2 Community Technology Preview (CTP) version  (SQLServer2005SP2-KB921896-x86-ENU.exe) and this issue still is not resolved.
    Friday, December 08, 2006 12:34 AM
  • Jamie,

    I submitted this bug to Connect when it was first suggested some time back in this thread (https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=126493). As for workarounds, we can all write these; man, there are a hundred different ways to solve this. However, this behaviour is outside of what is expected. Have a look at the problem, then tell me it's not a bug. Why is the line terminator pulled into the data? If that's not a bug, then I don't know what is. I'm not sure how one defines getting a line terminator in their data as a "wow, wish I'd thought of that, isn't that handy" kind of moment. The comparison to three other products (Excel, Access, SQL 2000 DTS) is relevant, since these have what we call normal behaviour for this process.

    Swan

    MCDBA 2000

    Friday, December 08, 2006 12:39 AM
  •  CSIROSwan wrote:

    Jamie,

    I submitted this bug to Connect when it was first suggested some time back in this thread (https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=126493). As for workarounds, we can all write these; man, there are a hundred different ways to solve this. However, this behaviour is outside of what is expected. Have a look at the problem, then tell me it's not a bug. Why is the line terminator pulled into the data? If that's not a bug, then I don't know what is. I'm not sure how one defines getting a line terminator in their data as a "wow, wish I'd thought of that, isn't that handy" kind of moment. The comparison to three other products (Excel, Access, SQL 2000 DTS) is relevant, since these have what we call normal behaviour for this process.

    Swan

    MCDBA 2000

     

    Thanks, Garry. I've added a comment to the Connect submission, and I hope everyone else who has been on this thread does the same. The more people that raise this, the more likely it is to get changed.

     

    -Jamie

     

    Friday, December 08, 2006 2:17 AM
  • I've added my comments to this as well. At least the latest response from MS didn't have another "next version" answer. I'm really hoping this might make it into SP3 for SSIS. The fact that there are so many different workarounds should really speak to the fact that this is not expected behaviour, even if it's not technically a bug. I know I've wondered why my last column was too large quite a few times, before remembering that the file was an export from Excel and the empty columns were not exported with delimiters on all rows.

    I do agree that the comparisons with Excel/Access/DTS are valid - especially since Excel causes some of these problems when it exports to a delimited file.
    Saturday, December 09, 2006 1:49 AM
  • There are a lot of people commenting on this thread, but not many of them have added anything to the Connect posting. Believe me, it's more likely to get changed if more people ask for it.

    -Jamie

     

    Saturday, December 09, 2006 5:20 PM
  • I agree, this is a bug. I just started using SSIS. And the first thing that I stumbled upon was this bug. How do I vote in the connect page? Please provide me the link.
    Tuesday, January 16, 2007 4:41 PM
  •  Omkumar wrote:
    I agree, this is a bug. I just started using SSIS. And the first thing that I stumbled upon was this bug. How do I vote in the connect page? Please provide me the link.

    https://connect.microsoft.com/feedback/viewfeedback.aspx?FeedbackID=126493&wa=wsignin1.0&siteid=68

    Click the 'Vote Now' button

     

    Incidentally, only 5 people have voted - and one of those is me, and I don't even care whether it gets fixed or not. It's more likely to get changed if you add your comments in the appropriate place. The appropriate place is NOT this forum.

    Note the following comments from Bob Bojanic, a developer on the SSIS team:

    "Unfortunately, there is no setting, like you expected, to direct the parsing of flat files to first break rows and then columns.

    The flat file parsing for SSIS in SQL Server 2005 works on a column-by-column basis, for performance reasons.

    That is something we have already had users ask for since the release, and it should definitely be available in the next version of the product. In the meantime people usually use the script component to do this type of parsing – there are a few good posts about it on the SSIS MSDN forum."

     

    -Jamie

     

    Tuesday, January 16, 2007 4:58 PM
  • I agree; that feature brings us lots of trouble. It worked well in 2000 but goes terribly wrong in 2005, which scared me away from 2005.
    Wednesday, January 17, 2007 7:18 PM
  •  coral bao wrote:
    I agree; that feature brings us lots of trouble. It worked well in 2000 but goes terribly wrong in 2005, which scared me away from 2005.

     

    Have you added your comments to the connect posting? If not, the link is elsewhere in this thread.

    -Jamie

     

    Wednesday, January 17, 2007 7:28 PM
  • Well, there is no "Vote" button, and I'm logged in under Windows Live ID.

    It is one day after the anniversary of the birth of this post, and still no solution (at least not the solution many of us are looking for). I understand that maybe DTS didn't work as it should have in the past (I guess it was considered "too forgiving"). I get that: I handle data entry for my site, and sometimes I have to change the data entry screens to allow only valid scenarios, and yes, people complain that it doesn't work like before. Well, before the changes were made, we were open to receiving bogus data entries that served neither my client nor the user. So I get it: SSIS is "better" than DTS because it is supposedly faster by ignoring "real world" data scenarios like uneven delimited columns. I say "real world" because that is the data we have, in this format that is no longer accepted by SSIS.

    What's even scarier is that SSIS will not "catch" this problem unless the data loaded in just happens to break a rule set on the field that ends up holding the {CR}{LF} and the subsequent row that follows. This is very dangerous and "unfriendly" behavior - but, as the MS MVPs point out, it's not technically a bug. If it worked in the past (DTS) and now you get garbage in the present (SSIS) after loading the same exact data file, then that garbage that gets loaded is (in my opinion) a critical error. If it does load successfully and the DBA is unaware of it, you will not only lose data, but inadvertently allow the user in data row #16 to see another user's data in row #17.

    My suggestion to the SSIS team (if they haven't implemented it already): add an option to "Check for Row Delimiters before Column Delimiters", with a note that it "may cause slower performance" - seriously, for many small-timers like me the difference in nanoseconds is tolerable. And please make this available as soon as possible; I'd rather not wait another year for the next Service Pack.

    In the meantime, another important question we should all be asking ourselves is, "How did we get the data with uneven delimiters in the first place?" For far too long we have all let this slide because of DTS's intuitive and forgiving behavior. We all need to take better care when loading in data (even if it was exported from DTS/SSIS). Many of us don't have the luxury of dictating the delimiting quality of exports we receive from other parties. For cases like this, try the workaround found in other posts (which I refuse to do), or write a simple C# program to read in each row, append missing delimiters as necessary, then write each row to a new file that you will later import (which I would rather do - I'm crazy-fun like that). So far I haven't found myself needing to fix large files, so I personally don't need (at the time of this writing) to create my own C# app to do this for me. Trust that if I do, I will come back to this post and give you a link to download it.
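    The row-repair pass described above (read each row, append missing delimiters, write to a new file) can be sketched with Python's csv module, which also keeps quoted fields containing commas intact. This is a hypothetical stand-in for the C# utility mentioned, not anyone's actual code:

```python
import csv
import io

def pad_rows(infile, outfile, num_cols):
    """Read delimited rows, pad short ones with empty fields,
    and write them back out with quoting preserved where needed."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\r\n")
    for row in reader:
        row += [""] * (num_cols - len(row))  # append missing columns
        writer.writerow(row)

src = io.StringIO('The,"Quick, very quick",Brown\r\nHello,World\r\n')
dst = io.StringIO()
pad_rows(src, dst, 3)
# dst.getvalue() == 'The,"Quick, very quick",Brown\r\nHello,World,\r\n'
```

    Unlike a naive split on the delimiter, the csv reader treats the quoted "Quick, very quick" as one field, so the row is not padded incorrectly.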

    Now, there is one other way we ended up here. It is how I ended up here, and it is actually the most important thing in my post. It's Excel! Excel is an awesome spreadsheet, but a terrible "CSV editor". The MS MVPs will probably argue that its goal was never to be a CSV editor, or something like that - but come on, my CSV files all have Excel icons and MS sets Excel as the default application to open them. So here are my gripes about CSVs in Excel: you lose your text qualifiers (if they're double quotes - I don't know about single quotes), you lose your matching number of delimiters on each line, and Excel auto-formats your string data to numeric data.

    Say I have "01", "02", "KG" in a column when I export them out of SQL Server. I get "01", "02", "KG" when I open up the CSV file using Notepad. If I open the file using the default application - Excel - I get: 1, 2, KG. When I save it in Excel and open it in Notepad, I now see: 1, 2, KG. Notice my leading zeros are gone! This is something I'm sure many of us here are already familiar with, but please bear with me. Notice the double quotes are also gone! I thought, well, maybe how Excel handles csv isn't to my taste, but I'll try saving it as an Excel spreadsheet and loading that in to see if I get any better results. Try this too: create a new text file and type this in:

    "01"
    "02"
    "KG"

    Save the text file as .csv, then open it in Excel. Now save it as a regular Excel spreadsheet (.xls) and try importing it using SSIS's import-from-Excel feature (I actually had it going to a table where the datatype was varchar(2) and it allowed nulls). You might not be as shocked as I was to see that the "KG" was entirely missing when "previewing" the imported data - NO WARNINGS, NO ERRORS about LOSS OF DATA - nada! It just shows 3 rows loaded, as we would expect if all went well. Scary stuff! Open up the table and there is no 01, just 1 - and as for KG, just null.

    Now, I'm not trying to load in a gazillion records where the text qualifier goes missing or there were missing delimiting commas at the end of my records. I'm here in this forum because I was editing tables - very manageable tables, ranging from 16 rows to a thousand. Editing these tables in SQL Server is a major pain, especially if all you want to do is copy a bunch of rows you already have and then edit the fields you want to create new rows. A "real world" example here: my tests. I have tests scheduled three times a year. They don't vary much during the year, so I would like to open up a file, copy the rows (or tests) from earlier in the year, paste those rows, change a few columns that hold the testing dates and answer keys, then save it and load it back into my database.

    Okay, maybe that's too much to ask for - maybe it goes against all principles and practices of being a DBA - but if I can get the same results in a shorter amount of time, which do you think I'm gonna do: add row by row in SQL Server and re-type everything by hand for each of the hundreds of new tests? Or do a little copying and pasting - copy-paste old tests, then update them by copy-pasting the answer key - hey, less human typing equals less human error! I'm not completely lazy; I do plan in the future to make an interface to easily copy and update tests without editing tables by hand - but the future isn't here yet, so hands and copy-paste are my only options at the moment.

    There is nothing I could do about the behavior of Excel, and I was about to lose hope. I couldn't write my own spreadsheet application to do what I want... but what if someone already did? So I searched for other spreadsheet apps, and I found Open Office. It offers a spreadsheet program called "Calc". It's popular and it's totally free, so what did I have to lose but a few minutes to download and install? So I installed it on my machine. Just to avoid any problems I opened up Calc, went to "Tools"->"AutoCorrect" and disabled all the checkboxes on the "Options" tab. NOTE: opening your data in any spreadsheet application is risky because of auto-correct! Remember that.

    Okay, so I went ahead and opened a CSV file with Calc, and to my delight there was a prompt showing that it understood there were rows and columns, and asking how I would like to handle them. Do not leave everything as "Standard": if there is a column that you know has text data but whose values could be "promoted" to integer, select "Text" for that column (that is what I did in this case). I looked at the file in Calc and saw nothing suspicious. Then I saved it, opened it in Notepad, and there were my beautiful text qualifiers (double quotes) and leading zeros! Hooray! Plus, if I want to add something like "01", I just add an apostrophe beforehand, or I can set the column datatype to text and it knows not to promote it to numeric.

    Okay, some of you are thinking, "Oh, but Excel does this too; just click on "Data"->"Import External Data" and you will see something similar to what Open Office does when it opens a CSV file." Well, it's not the same. Yes, the advantage you do get with "Import External Data" in Excel is that you can set those columns to text. The columns I am referring to, again, are the ones with string data that can be promoted to numeric data, corrupting the values (i.e. the "01" to 1 problem). So yes, this is one way to open the file in Excel and keep those text values fully intact. However, it does not fix the problem of losing the text qualifiers OR the problem of uneven column delimiters. For those of you who still don't understand what I am talking about, try it out yourself. This is an export straight from DTS/SSIS (the column headings have been changed to protect my IP):

    "BranchID","FarmD","BranchFarmID","BranchType","MinValue","MaxValue","SellerID"
    1,29,30,"BO",8,10,
    2,29,31,"BO",0,7,
    3,30,41,"BO",3,8,
    4,30,31,"BO",0,2,
    5,31,41,"BO",5,6,
    6,31,32,"BO",0,4,
    7,32,33,"BO",4,5,
    8,32,36,"BO",0,3,
    9,33,34,"BO",4,5,
    10,33,36,"BO",0,3,
    11,34,35,"BO",4,5,
    12,34,36,"BO",0,3,
    13,35,36,"BO",0,5,
    14,36,37,"BO",4,5,
    15,36,41,"BO",0,3,
    16,37,38,"BO",4,5,
    17,37,41,"BO",0,3,
    18,38,39,"BO",4,5,
    19,38,41,"BO",0,3,
    20,39,40,"BO",4,5,
    21,39,41,"BO",0,3,
    22,40,41,"BO",0,5,
    23,42,45,"RF",,,74
    24,42,48,"RF",,,74
    25,122,123,"BO",10,12,
    26,122,125,"BO",0,9,
    27,123,124,"BO",3,4,
    28,123,125,"BO",0,2,
    29,125,126,"BO",5,6,
    30,125,129,"BO",0,4,
    31,126,127,"BO",8,10,
    32,126,129,"BO",0,7,
    33,127,128,"BO",3,4,
    34,127,129,"BO",0,2,
    35,130,131,"RF",,,260
    36,130,132,"RF",,,260
    37,130,133,"RF",,,260
    38,69,71,"BO",5,8,
    39,69,70,"BO",0,4,
    40,86,88,"BO",19,20,
    41,86,87,"BO",0,18,
    42,72,75,"RF",,,171
    43,72,75,"RF",,,172
    44,72,78,"RF",,,171
    45,72,78,"RF",,,172
    46,151,154,"RF",,,311
    47,151,152,"RF",,,311
    48,2,3,"BO",0,3,
    49,2,11,"BO",4,10,
    50,3,4,"BO",0,5,
    51,3,11,"BO",6,8,
    52,4,5,"BO",4,5,
    53,4,9,"BO",0,3,
    54,5,6,"BO",4,5,
    55,5,9,"BO",0,3,
    56,6,7,"BO",4,5,
    57,6,9,"BO",0,3,
    58,7,8,"BO",4,5,
    59,7,9,"BO",0,3,
    60,8,9,"BO",0,5,
    61,9,10,"BO",20,26,
    62,9,11,"BO",0,19,

    Now save this data to a text file and mark it as your original. Make two copies and save them as CSV files (one with the name "Saved by Open Office" and the other "Saved by Excel"). Now open one in Open Office and click save, then close - don't edit anything. Open the other in Excel, save and close, and again, don't edit anything - you can also try it using the "Data"->"Import External Data" option (but you will still see the same results). Open the CSV saved by Open Office and everything is as it should be. Open the CSV saved by Excel and see that the text qualifiers are gone again, and notice that by line 48 your extra "column delimiter" (the trailing comma) drops off the face of the earth for the rest of the records. Now, there isn't anything magical about the number 48 - it starts at 16 in another file I have - it appears random, and I don't know how to predict when it will occur, but once I see it in a file, every time I reopen and save that file it starts messing up on the same row.

    So there you have it: straight from SQL Server, opened in Excel, and the data is IMHO (In My Honest Opinion) corrupted - though that's a whole other debate; I'm sure an MS MVP will say "it's not corruption" and that "it is by design" - then IMHO I don't like the design. I hope this long post will help each of you understand that the problems with the "end result" we are trying to get are multi-faceted: from a lost "friendly handling" of data between DTS and SSIS, to Excel casually dropping delimiters and text qualifiers, to whatever else you happen to stumble upon in your own data adventures. I can only offer two workarounds. For humongous files, write a C# app to fix them - not that hard, but it will take four hours to a day to write and test. For smaller files (especially those you want to edit to your heart's content), use Open Office - I would avoid Excel for editing and viewing CSV files. Oh, and to compare the differences in the original files (humongous and small) after making edits, or just to check for changes after opening and saving with different software, I suggest a cool little app from Scooter Software called Beyond Compare 2. It's awesome and free to try - plus you will love it, especially when you're coding and something breaks: you can use this tool to compare the working version of your code to the broken version to see what files were changed and what rows were edited, to help you debug. I've been using it for six years now and it's saved my butt countless times, especially when I have a 25 MB data file and I want to compare it to the 25 MB data file from the day before.
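For anyone who does end up writing that fixer app, here is a minimal sketch of the idea in Python rather than C# (the function name and the fixed column count are my own assumptions, not anything from this thread): it pads short rows, trims over-long ones, and re-quotes every field so the text qualifiers survive the round trip.

```python
import csv

def repad_csv(src_path, dst_path, column_count):
    """Rewrite a CSV so every row has exactly column_count fields,
    quoting every field so the text qualifiers survive a round trip."""
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
        for row in reader:
            # Pad short rows with empty fields; trim over-long ones.
            writer.writerow((row + [""] * column_count)[:column_count])
```

Run it over the export before handing the file to SSIS and every row arrives with the same column count.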

    Also, sorry for the post, I'm long-winded, but I like to be as detailed as possible, especially for newbies.

     

    Sunday, January 21, 2007 5:33 PM
  •  Miguel_TX wrote:

    Well, there is no "Vote" button, and I'm logged in under Windows Live ID.



    There is for me, inside the "Ratings" box. However, the bug submission is marked as closed.

    Sunday, January 21, 2007 7:04 PM
It is closed because MS says that is the way it has been designed.

    You need to vote to get MS to do something about it. If you don't vote they won't.

    If you look at the feedback on scripting and what has been put into SP2, you will see that giving your feedback will make things happen.

    Monday, January 22, 2007 11:48 PM
    I'd still add your vote and whatever comments you feel are necessary.  I guess a new bug submission could be entered, but it's likely to be closed since it's a "feature" to MS.  Pointing back to this thread and saying that workarounds exist is just denying that there's a problem.  Sure, we can write script and code all day long to get around MS's shortcomings, but that defeats the purpose of using a product that's designed to make our lives easier.

    As I noted in an earlier post, part of this is caused by the fact that Excel creates this problem when you save out to a CSV file.  (And you can't trust Excel directly because the JET provider for it interprets the datatypes of the various columns so poorly.)


    Anyone who is still reading this thread and cares about this needs to add their votes and comments to the MS Connect site:
    https://connect.microsoft.com/feedback/viewfeedback.aspx?FeedbackID=126493

    It's the only way we might get a fix prior to their next major release of SQL Server.
    -Pete
    Friday, February 02, 2007 9:02 PM
  •  paschott wrote:


    It's the only way we might get a fix prior to their next major release of SQL Server.
    -Pete

    I wouldn't guarantee you'll get one in that either, forget about prior to it 

     

    Friday, February 02, 2007 9:14 PM
    As yet the problem remains. I have voted even though the problem was allegedly closed long ago. Why do we pay so much for the software if we cannot get issues fixed? It still surprises me that much-used facilities are dropped; all that happens is somebody says "fix it yourself".
    Thursday, June 07, 2007 1:34 PM
  •  Madhattuer wrote:
    As yet the problem remains. I have voted even though the problem was allegedly closed long ago. Why do we pay so much for the software if we cannot get issues fixed? It still surprises me that much-used facilities are dropped; all that happens is somebody says "fix it yourself".


    How do you know we aren't getting this fixed?  Have you looked at the June CTP of SQL Server 2008 yet to see if this is still the case?  Give Microsoft some credit, please.
    Thursday, June 07, 2007 1:59 PM
    Microsoft has produced a much better IDE for SSIS than the one for DTS, with modifications to the package working through neatly. All in all, seeing the progress colour-coded makes for an easier development process.

     

    In the past I have had experience of areas which used to work being changed and then not being corrected after notification. Unfortunately there are always too many jobs to do in the available time.

     

    Unfortunately, looking at the rest of this thread, I am not the only one who has negative feelings about changes.

     

    I have had a quick look at the overview for 2008 (a basic text search for SSIS) and could only find generic statements, with no specifics mentioned.

     

    Looking at the feedback reference, the topic had been closed a long time ago with no comment about a fix being worked on. The manner of the closing seemed to suggest otherwise, a feeling also shared by comments found on the thread.

     

    Obviously, by the response I have stirred, this topic is emotive in both camps.

     

    Unfortunately we have, in the past, been left in the position that a bug was not going to be fixed (current up to Office 2003; I have yet to check 2007) - a bug which did not exist in version 97. This left us in an awkward position with our customers when they encountered the problem, as there was no workaround available.

    Thursday, June 07, 2007 2:45 PM
  • Madhattuer,
    Did you read Bob's response that Jamie included in his post above?  Bob thinks it should be in the next release.  That would be SQL Server 2008. 

    Don't expect the documents to detail everything they are working on in the next version.
    Thursday, June 07, 2007 2:50 PM
    I must admit I can't find that one. Oh well, it's been a long day. Thanks for the info, but for now I'd better find a workaround.
    Thursday, June 07, 2007 3:01 PM
  •  Madhattuer wrote:
    I must admit I can't find that one. Oh well, it's been a long day. Thanks for the info, but for now I'd better find a workaround.


    It's the third post from the top on page 3...  Let me include it here for you:

     Jamie Thomson wrote:

    Note the following comments from Bob Bojanic, a developer on the SSIS team:

    "Unfortunately, there is no setting, like you expected, to direct the parsing of flat files to first break rows and then columns.

    The flat file parsing for SSIS in SQL Server 2005 works on a column-by-column basis, for performance reasons.

    That is something users have already asked for since the release, and it should definitely be available in the next version of the product. In the meantime people usually use the Script Component to do this type of parsing - there are a few good posts about it on the SSIS MSDN forum."

     

    -Jamie

     



    Thursday, June 07, 2007 3:05 PM
  • To anyone still reading this thread....

     

    I have written a summation of this issue here:

     

    Flat File Connection Manager issues

    (http://blogs.conchango.com/jamiethomson/archive/2007/05/15/SSIS_3A00_--Flat-File-Connection-Manager-issues.aspx)

     

    It may help anyone that is vexed by this. Also, take a note of the comments at the end, they make for interesting reading.

     

    -Jamie

     

     

    Thursday, June 07, 2007 3:16 PM
  • Must be one of my tired days. Thanks, must have got knocked bandy by the big email from Miguel.

     

    Thanks again

     

    Thursday, June 07, 2007 3:23 PM
    I too still can't believe this hasn't been fixed.  It falls under a bug, since the line terminator is pulled into the data - simple.  It's just a fundamental error.  I have now instructed my team to call DTS packages from SSIS to pull in uneven column widths.  You can still install DTS on a SQL 2005 machine as a standalone app.  I am just bewildered as to how you get bugs fixed with Microsoft.  I am a huge fan of Microsoft, but this has left me thinking that getting them to listen is a lot harder than they make out at the annual TechEd events.

     

    I’m going to re-open this problem and see if they close it again.

     

    Wish me luck!

    Thursday, June 07, 2007 11:02 PM
  • Gerry,

    Good luck! I think you'll need it.

     

    We're not still having this same argument, surely?

     

    They WILL close it because in their eyes (and I know you don't agree) it is not a bug. Bob Bojanic has already stated that they are looking at this for Katmai. If they don't provide something in Katmai that appeases everyone, then you can complain long and hard and I shall stand shoulder-to-shoulder with you and be very, very critical.

     

    I know you will answer that they could have fixed it in the service packs, but look at it from their point of view. Should they fix bugs (and remember, they don't think this is a bug) or should they provide different ways of doing things that are already very achievable (and, may I add, pretty easy)? Hopefully everyone would say they should be fixing bugs.

     

    Read the comments on my blog post. The majority of people commenting (admittedly there aren't many of them) don't think this is a bug.

     

    Please don't flame me. I know you don't agree, I'm just trying to provide the alternative (i.e. Microsoft's) point-of-view.

     

    Regards

    Jamie

     

     

    Thursday, June 07, 2007 11:27 PM
    Well, you know, arguments can just run and run. Do you know if Simon is still around? I have attempted twice to get his example from http://sqlblogcasts.com/files/4/integration_services/entry412.aspx but it won't let me join to download the example. I'm trying to remedy this by pulling the file in as one column and then splitting it using scripting, but I can't seem to access the column data, which insists that it is a blob. Still searching for info on this, but with little joy so far.
    Monday, June 11, 2007 1:58 PM
  •  Madhattuer wrote:
    Well, you know, arguments can just run and run. Do you know if Simon is still around? I have attempted twice to get his example from http://sqlblogcasts.com/files/4/integration_services/entry412.aspx but it won't let me join to download the example. I'm trying to remedy this by pulling the file in as one column and then splitting it using scripting, but I can't seem to access the column data, which insists that it is a blob. Still searching for info on this, but with little joy so far.


    This is a result of either mismatched data types coming into your script, or your syntax when trying to write your script.  See your other thread for details.  We're not going to discuss your blob issue in this thread.  Wink
    Monday, June 11, 2007 2:01 PM
  • Talk about a watched kettle never boils. I left my computer to get a drink and my login details arrived.

     

    Having a quick look at Simon's example, his output column from the flat file was DT_WSTR and mine was set to DT_TEXT.  Setting it to DT_STR seems to cover the problem in this case.

     

    Monday, June 11, 2007 2:32 PM
    But Matt,

      In this case, how do we deal with the extra column values (in rows of varying size) that are getting added as a new row? I have a few DT_TEXT columns with "new line" delimiters in their content, which simply adds more rows. So how should I deal with this using the Script Component?

     

    Thanks

    Subhash Subramanyam

    Monday, June 18, 2007 7:40 AM
    OK, since you are still complaining about this, here is the fix I wrote: a C program which fixes the input file so that it has the correct fixed number of columns. You tell it how many columns it should have and what the string delimiter and column delimiter are, and it pads or strips columns off as necessary.

    ie

     

    C:\utils>cat 1.csv
    1,2,3,4,"now is the","time"

     

    C:\utils>fixinput 1.csv 2.csv 8 "," """ 0

    C:\utils>cat 2.csv
    1,2,3,4,"now is the","time",,

    C:\utils>

     

     

    syntax:

     

    C:\utils>fixinput
    FixInput (c) 2006 larry kahn - icentrix.com (kahn@lgk.com)

    syntax: inputfile outputfile columns terminator delimiter debug

    Note: to pass a quote or comma, put it in quotes, i.e.: fixinput infilename.csv outfilename.csv 10 "," """ 1
         To pass a tab character as the terminator, pass T, i.e.: fixinput infilename.txt outfilename.txt "T" "" 1

     

    download here:

     

    http://stage1.icentrix.com/fixinput.exe

     

    Friday, June 22, 2007 1:14 AM
  •  Madhattuer wrote:
    As yet the problem remains. I have voted even though the problem was allegedly closed long ago. Why do we pay so much for the software if we cannot get issues fixed? It still surprises me that much-used facilities are dropped; all that happens is somebody says "fix it yourself".


    How do you know we aren't getting this fixed?  Have you looked at the June CTP of SQL Server 2008 yet to see if this is still the case?  Give Microsoft some credit, please.


    What the heck??? 2008? So your solution for all these people is to purchase yet another version of the software to get back functionality that worked in SQL 2000 DTS? That, sir, is B.S. ... and typical Microsoft.
    Monday, August 24, 2009 4:06 PM
    My problem seems very close to this.  I have an Excel spreadsheet with multiple columns, and some lines have more columns than others.  I want to export the data in each column into a text file, such that whenever the export comes across a new column, it generates a newline character.

    Test1   test2   test3   test4   test5
    testa   testb   testc   testd   teste   testf


    becomes

    Test1
    test2
    test3
    test4
    test5

    testa
    testb
    testc
    testd
    teste

    and so on.  Splitting the columnar layout of the current export file is tedious in the extreme.  My editor (LPEX) seems to say it can do this with a series of wrap commands, but I can't get that to work at the moment, and it too will be time-consuming.

    Any suggestions ?  I am not a trained programmer, so please, any suggestions need to be explained in simple terms.  This is my very first attempt at asking a question of Microsoft.
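Since the poster asked for simple terms: one low-tech way to get exactly that output (assuming the sheet is first saved from Excel as tab-delimited text) is a few lines of Python. The function name and the tab separator are my own choices; this is a sketch, not a polished tool.

```python
def columns_to_lines(src_path, dst_path, sep="\t"):
    """Write every delimited field on its own line, with a blank
    line between the original rows."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for row in src:
            for field in row.rstrip("\r\n").split(sep):
                dst.write(field + "\n")
            dst.write("\n")  # blank line marks the end of the original row
```

Rows with more columns than their neighbours simply produce more lines, so the ragged layout stops mattering.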
    Tuesday, September 15, 2009 8:42 AM
    Well, it doesn't work in 2008 either.

    I've hit the same problem.

    Monday, April 19, 2010 10:14 AM
  • Hi Jamie,

    I've implemented this example and it works like a charm, but now I'm running into an issue with qualified text.  Text that comes in as "Adams, Sam" is still getting split into two cells even though I have specified the qualified text (") in the import component.  I'm guessing the script is overriding this..

    I'm wondering if there's a way to modify your script to handle these kinds of exceptions?  So if text comes in within quotes, everything in the quotes goes into one cell...including any commas.

    Thanks so much,

    Josh

    Tuesday, May 18, 2010 5:36 PM
  • Hi Jamie,

    I've implemented this example and it works like a charm, but now I'm running into an issue with qualified text.  Text that comes in as "Adams, Sam" is still getting split into two cells even though I have specified the qualified text (") in the import component.  I'm guessing the script is overriding this..

    I'm wondering if there's a way to modify your script to handle these kinds of exceptions?  So if text comes in within quotes, everything in the quotes goes into one cell...including any commas.

    Thanks so much,

    Josh


    Hi Josh,

    I'm not sure which of my scripts you're referring to. I've reviewed this thread and don't see any links to any scripts that I have posted anywhere (if I missed it, I apologise - it's a long, old thread).

    Regardless, sounds like you have a different issue to the one talked about here so you may want to raise it as a new thread.

    -Jamie


    http://sqlblog.com/blogs/jamie_thomson/ | @jamiet | About me
    Wednesday, May 19, 2010 1:55 PM
    You provided this link in this thread on July 18, 2006:

    http://consultingblogs.emc.com/jamiethomson/archive/2006/07/14/4226.aspx

    it's this script that's giving me issues with qualified text.

    hope this helps!

    Josh

    Wednesday, May 19, 2010 2:06 PM
    You provided this link in this thread on July 18, 2006:

    http://consultingblogs.emc.com/jamiethomson/archive/2006/07/14/4226.aspx

    it's this script that's giving me issues with qualified text.

    hope this helps!

    Josh


    Ah, OK. I'm sure there is a way to adapt it to cope with that sort of situation, but .NET development isn't really my forte. The script is provided "as is" in the hope that it gives a starting point for people needing to accomplish something similar - I'm sure you can find help on the many .NET forums that exist if you need to modify it.

    I suspect that the String Class and its members would be a good starting point too.
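For what it's worth, Josh's "Adams, Sam" case is exactly what a quote-aware tokenizer handles: a delimiter inside the text qualifiers must not split the field. Here is a hedged sketch in Python rather than the script's own .NET (the function name is my invention):

```python
import csv
import io

def split_qualified(line, delimiter=",", qualifier='"'):
    """Split one delimited line, keeping delimiter characters that
    appear inside text qualifiers as part of the field."""
    return next(csv.reader(io.StringIO(line), delimiter=delimiter,
                           quotechar=qualifier))
```

For example, `split_qualified('"Adams, Sam",42')` yields `['Adams, Sam', '42']` - the comma inside the quotes stays in the name.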


    http://sqlblog.com/blogs/jamie_thomson/ | @jamiet | About me
    Wednesday, May 19, 2010 6:57 PM
    I know this is an old post, and I'm only facing the problem now. It's not that I'm going to receive malformed files; I just want a quality check, to make sure each of my lines has 200 characters followed by a newline.

    This simple check shouldn't have taken more than 5 minutes; here, it is taking me a lot of time, as it already has for many of you.
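Outside SSIS, a fixed-width sanity check like that really is a five-minute job; here is a minimal Python sketch (the 200-character width comes from the post, the function name is my own):

```python
def find_bad_lines(path, expected_width=200):
    """Return the 1-based numbers of lines whose length (newline
    excluded) differs from the expected fixed record width."""
    bad = []
    with open(path, newline="") as f:
        for lineno, line in enumerate(f, start=1):
            if len(line.rstrip("\r\n")) != expected_width:
                bad.append(lineno)
    return bad
```

An empty result means every record is exactly the expected width.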

    I have just one word: thank you, Microsoft, for thinking differently!


    BizTalk Consultant in France
    Monday, March 28, 2011 4:18 PM
  • I also am experiencing this as well. We want to convert some FoxPro imports to SSIS. SQL Server 2008 RC2 doesn't do this yet.... Makes it hard to convince people that moving to SSIS from Foxpro is better....

    Tuesday, April 26, 2011 5:29 PM
    What about fixed-column files that have 2 columns at the right which may or may not be there?

    This does not address that issue at all.

    There is a LF at the end of each row, which should be read as the end of the data for that record.  Why is this so difficult to have in SSIS?

    Tuesday, April 26, 2011 5:44 PM
  • Why is this so difficult to have in SSIS?

    it's hard to import this:

    "test1";"abc";"xyz";"123"
    "test2";"cba";"zyx";"321"
    "test3";"abc"
    "test4";"efg";"zyx"
    "test5";"cba";"zyx";"321"

    because it's not a correct csv file.

    But with a little scripting it's possible:

    http://microsoft-ssis.blogspot.com/2011/02/script-component-as-source.html
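The pattern behind that link - read each raw line yourself, split it, and pad the missing trailing columns - can be sketched outside SSIS like this (Python standing in for a Script Component; the function name and default column count are my own assumptions):

```python
import csv

def parse_ragged(lines, delimiter=";", qualifier='"', column_count=4):
    """Row-first parsing: split each line into fields, then pad any
    missing trailing columns with None."""
    records = []
    for row in csv.reader(lines, delimiter=delimiter, quotechar=qualifier):
        records.append(row + [None] * (column_count - len(row)))
    return records
```

Fed the sample above, the short "test3" line comes back as four columns with the last two as None, which is exactly the "pad with NULLs" behaviour the original poster asked for.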


    Please mark the post as answered if it answers your question | My SSIS Blog: http://microsoft-ssis.blogspot.com
    Tuesday, April 26, 2011 7:27 PM
    But what about fixed-length columns without scripting? I'm a DBA, not a programmer, and I need to import a number of these kinds of files.

    There is no delimiter in the file except for the <LR> at the end of each row.

    The file has 6-8 columns per row.  It's columns 7 and 8 that are optional, and the <LR> comes after the 6th column if the last two columns are not present.  This should be EASY: end the record at the <LR>.

     

    Tuesday, April 26, 2011 7:50 PM