Ignore first 6 rows in Excel import in SSIS pkg.

  • Question

  • I have an SSIS package that imports from an Excel file with data beginning in the 7th row.

    Unlike the same operation with a CSV file (the 'Header Rows to Skip' setting in the Connection Manager Editor), I can't seem to find a way to ignore the first 6 rows of an Excel file connection.

    I'm guessing the answer might be in one of the Data Flow Transformation objects, but I'm not very familiar with them.

    Any pointers would be greatly appreciated.
    Eric
    Tuesday, July 3, 2007 4:58 PM

Answers

  • rbhro, actually there were 2 fields in the upper 5 rows that had some data that I think prevented the importer from ignoring those rows completely.

    Anyway, I did find a solution to my problem.

    In my Excel source object, I used 'SQL Command' as the 'Data Access Mode' (it's a drop-down that appears when you double-click the Excel Source object). From there I was able to build a query ('Build Query' button) that only grabbed the records I needed. Something like this:
    SELECT F4, F5, F6
    FROM [Spreadsheet$]
    WHERE (F4 IS NOT NULL) AND (F4 <> 'TheHeaderFieldName')

    Note: I initially tried ISNUMERIC instead of IS NOT NULL, but that wasn't supported for some reason.

    In my particular case, I was only interested in rows where F4 wasn't NULL (and fortunately F4 didn't contain any junk in the first 5 rows). I could skip the header row itself (row 6) with the second condition in the WHERE clause.

    So that cleaned up my data source perfectly. All I needed to do now was add a Data Conversion object between the source and destination (everything in the spreadsheet needed to be converted from Unicode), and it worked.

    Thanks for everyone's input!

    Eric
    Thursday, July 12, 2007 9:48 PM

All replies

  • You could add a row number to your data, and use a conditional split to remove the first 6 rows.

     

    To add the row number, you can use a script component or the Row Number transform from http://www.sqlis.com/93.aspx. Then add a Conditional Split that checks whether the row number is less than 7 and sends those rows to a separate output; leave that output unconnected so the rows are discarded.
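
    As a rough sketch, assuming the added column is named RowNumber (the actual name depends on how the script component or transform is configured), the Conditional Split condition for the rows to discard would be something like:

    RowNumber < 7

    Rows matching that condition go to the unconnected output and are dropped; everything else continues down the default output to the destination.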

    Tuesday, July 3, 2007 8:47 PM
  • You may consider creating a named range in your Excel sheet covering just the data you want to import, and then selecting that range in the Excel source.
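
    As a minimal sketch, assuming a named range called DataRange has been defined over just the data area of the worksheet, the Excel Source can then read it either by picking DataRange in 'Table or view' mode or with a SQL command like:

    SELECT * FROM [DataRange]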

     

    HTH,

    Bob

    Wednesday, July 4, 2007 12:37 AM
  • What is in the first 6 rows? Where are the column headers?
    Wednesday, July 4, 2007 11:18 AM
  • Thanks for everyone's replies. 

    The first 5 rows are empty and the 6th row has the table header. I was playing around with 'conditional split' (based on one of the integer columns), but my priorities at work just shifted, so I'm going to have to get back to this at a later date.

    Bob, the data is auto-generated, so I have to consume it as is.

    I'll post back after I've returned to this problem and tinkered w/everyone's ideas.


    Thanks again,
    Eric
    Friday, July 6, 2007 6:31 PM
  • To be honest, I've tried to replicate your problem. Let me see if I got it straight:

    5 empty rows, followed by a row with column headers, followed by data. That was the Excel file.

    After I used an Excel source, I attached a data viewer and it showed that the empty rows were skipped by default.

    Tuesday, July 10, 2007 9:10 AM
  • I have a similar situation, but mine differs in that the first 5-6 or so rows contain non-essential information (report name, legend, etc.) which I need to bypass in order to get to the header row. I'm somewhat new to SSIS, so I'm not sure if I'm looking at the right control flow object or not, but I was looking at the Foreach Loop and trying to determine whether one of the Foreach Loop enumerators would let me step through a worksheet in an Excel file row by row, so that I could interrogate several columns to work out whether a row is the header row, a data row, or just non-essential information. Can you tell me if I'm looking in the right direction?

    Thanks for any help.

    Wednesday, July 11, 2007 12:28 PM
  • The ForEach loop will not let you directly iterate through the rows in a spreadsheet. You could retrieve them into a recordset, and then iterate that with a ForEach. You also might want to refer to some of the earlier posts in this thread - counting the rows or using a named range.
    Thursday, July 12, 2007 3:09 AM
  • Thanks. I tried that yesterday. First, I defined a data flow with an Excel Source pointing at my Excel file, a Data Conversion to make sure the string columns were defined as strings, and then a Recordset Destination. Then in my Control Flow I feed the Data Flow task into a ForEach loop to iterate through the rows (I inserted a Script Task to load a variable with just the first column of each row so that I could display it with a message box). This seems to run OK until I encounter a row with a null value in one of my string columns, at which point I get the error (Error: The type of the value being assigned to variable "User::uDescription" differs from the current variable type. Variables may not change type during execution. Variable types are strict, except for variables of type Object.) So this is partially working. In my Data Conversion I've tried defining the string columns as either "string [DT_STR]" or "unicode string [DT_W(?can't see the entire data type description)". Is there a recommended way of dealing with nulls in data columns?

    Thanks again... this is making some sense; there just seem to be many options, or possible ways of transforming the data.
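
    A minimal sketch of one way to sidestep the null error, assuming it is acceptable simply to drop rows where that string column is empty, is to filter them out in the Excel Source query itself (the sheet and column names below are only placeholders):

    SELECT F1, F2, F3
    FROM [Sheet1$]
    WHERE F2 IS NOT NULL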

    Thursday, July 12, 2007 12:59 PM
  • Friggle, this worked for me as well.  Thanks for posting your solution.
    Friday, July 13, 2007 10:53 AM
  • In his blog, Mike Bosch listed a good solution as well: the OpenRowset property on the Excel Source component's properties page.

    I had three rows of descriptive crud at the top, a ton of "touched" rows at the end that didn't read as null, and only four columns out of 9 that I actually needed.

    This tip helped me filter out the area I needed using:

    SheetName$A4:D635

     

    Here's Mike's post

    http://weblogs.asp.net/mikebosch/archive/2007/10/26/ssis-skipping-rows-and-stripping-subtotals.aspx
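
    For reference, the same rectangular area can also be pulled in with 'SQL Command' as the data access mode; the Jet/ACE provider accepts a cell range appended to the sheet name in the FROM clause (the sheet name and range here are just examples):

    SELECT *
    FROM [SheetName$A4:D635]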

     

     

    Friday, May 16, 2008 7:43 PM
  • This is a very old thread, but it was helpful in pointing out Mike Bosch's post. The current link to his post is http://mikebosch.net/2007/10/ssis-skipping-rows-and-stripping-subtotals-in-excel/

    The main trick is to change the "OpenRowset" component property in the Advanced Editor of the Excel file source. Instead of having the value be something like "Sheet1Name$" it can be "Sheet1Name$A3:ZZ1000" to indicate that the data starts at the third line, for example.
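
    Applied to the original question in this thread (five blank rows, the header in row 6, and data from row 7), the value would look something like the following, where the sheet name and the end of the range are only placeholders:

    Sheet1$A6:K5000

    With the connection manager's "First row has column names" option left on, row 6 then supplies the column headers and the data rows start at row 7.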

    Wednesday, March 22, 2017 3:59 PM