SSIS excel data source numeric values returned as null
-
Friday, May 08, 2009 10:30 PMHello everyone, long time no see!, I have a problem and was hoping someone can help me:
I'm using SSIS 2005 Enterprise edition, I'm creating a package that reads an excel (xls) file using the "excel source" component, and it dumps the data into an OLEDB destination (a sql server).
When I drag the excel source component and create the excel connection to my file the component automatically reads the columns and their datatypes.
The problem is that I have a column which has numeric data and the package uploads as NULL every number that starts with a zero. (note: in excel this column is formatted as "text", despite it has only numbers, because it's the only way excel maintains the left sided zeros).
So I checked the data types by right clicking the excel source component -> show advanced editor and my surprise is that this column's data type is detected as double-precision float, and it doesn't let me change it.
I tried the answer posted here: http://devselekta.blogspot.com/2007/09/ssis-excel-data-source-values-returned.html but it only works when the first row of data has a number beginning with zero on this column.
Someone knows how to get the data imported correctly? Thanks in advance.
Odin_Dark
Answers
-
Monday, May 11, 2009 9:54 PMModerator
You might want to check out this page in Books Online: http://msdn.microsoft.com/en-us/library/ms141683.aspx
You've nailed the problem - the Excel driver is scanning the first few rows to determine the type of the column. If the data has no leading zeroes, it interprets it as numeric, and then NULLs out values that have leading zeroes because these appear as text to the driver. Excel defaults to only using the first 8 rows of data to do this guessing. You can increase the number of rows it uses by modifying the TypeGuessRows key in the registry under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel.
Hope this helps.
John Welch | www.mariner-usa.com | www.agilebi.com | ssisUnit.codeplex.com- Marked As Answer by Odin_dark Monday, May 11, 2009 10:46 PM
All Replies
-
Saturday, May 09, 2009 7:13 AMHi,
Use DataConversion tool. The Data Conversion transformation converts the data in an input column to a different data type and then copies it to a new output column.
Change datatype of that filed to "DTS_Str" using DataConversion.
So now your order is change in DataFlowTask-
1. ExcelSource
2. DataConversion
3. OLEDB Destination
Goto this url for more details: http://msdn.microsoft.com/en-us/library/ms141706.aspx
Kapil Khalas -
Monday, May 11, 2009 3:10 PMHi Kapil, first of all thanks for your response.
Unfortunately I believe that the problem is on the Excel Source component, because when I add it and create the connection there is a "preview" button, and there all numeric values that start with zero are previewed as null, which means that the problem is that the excel source component reads the values as null from the very beggining, so when the data arrives to the DataConversion this data is already null.
I tried your suggestion and it didn't work, It's still giving null values.
I appreciate your response, thanks in advance!
Odin_Dark -
Monday, May 11, 2009 3:48 PMI was able to view data from excel using Excel Source as expected. The excel columns were type text with values having leading zeroes. The only thing I can thing of is that you have the incorrect version of Excel selected in the connection manager.
-Will -
Monday, May 11, 2009 8:47 PMThanks wmlpez,
I'm connecting to an excel 2003 file (.xls) using the following connection string:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\MyExcel.xls;Extended Properties="Excel 8.0;HDR=Yes;IMEX=1";
The curious thing is that, if the first rows of data begin with zero, the values are uploaded correctly, but if the first rows of data does NOT have zeroes, then the data with zeroes at beginning are uplaoded as null, example:
.....this is uploaded correclty (no nulls because first rows have zeroes at beginning)
Row # | Numeric Data
------------------------------
1 | 01548
2 | 00157
3 | 78996
4 | 73956
5 | 01547
....this is uploaded INCORRECTLY (null values are read where zeroes are at beginning)
Row # | Numeric Data
------------------------------
1 | 81548
2 | 89157
3 | 78996
4 | 48956
5 | 71547
6 | 91547
7 | 11548
8 | 91547
9 | 41547
10 | 71543
11 | 01547
12 | 00595
It looks like SSIS is only reading data correctly when first N rows of data have zero at beginning....
Do you know what could be possibly happenning? If you can have the right behavior, can you send me an example of how you do it? (or a zipped project?)
Thanks in advance!
Odin_Dark -
Monday, May 11, 2009 9:54 PMModerator
You might want to check out this page in Books Online: http://msdn.microsoft.com/en-us/library/ms141683.aspx
You've nailed the problem - the Excel driver is scanning the first few rows to determine the type of the column. If the data has no leading zeroes, it interprets it as numeric, and then NULLs out values that have leading zeroes because these appear as text to the driver. Excel defaults to only using the first 8 rows of data to do this guessing. You can increase the number of rows it uses by modifying the TypeGuessRows key in the registry under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel.
Hope this helps.
John Welch | www.mariner-usa.com | www.agilebi.com | ssisUnit.codeplex.com- Marked As Answer by Odin_dark Monday, May 11, 2009 10:46 PM
-
Monday, May 11, 2009 10:46 PMHello jwelch,
Thanks a lot for your response, in fact that's the problem, and now I understand how IMEX & guessrows works, since I don't really like to configure the registry on the server because maintenance will be a little complicated (and guessing 65000 rows to assure finding the data type will be very slow), I did the following:
1-I set the IMEX=1 property on my connection string on the connection manager
2-On the connection manager, I unchecked the "first row has column names" option (since my titles will always be text, the driver will recognize text
always and treat all data as such)
3-I remove manually (sql query) the first row of data since it has only the column names.
4-I do manually any data type conversion I need.
By the way, I don't really understand why microsoft always try to guess data types, instead of letting us choose which is the data type of the column on the excel source component and respect that..... I really really don't see why guessing is the only way...
If I could put a suggestion to microsoft I would suggest that.
Thanks a lot for your amazing support guys!
Odin_Dark -
Tuesday, May 12, 2009 2:27 PMModeratorMy understanding of the issue is that Excel really doesn't have typed data, not in the same way as a database does. The JET driver that is used to query Excel does it's own guessing about the types, as described above. The SSIS team's not really involved in that - a different group manages the JET provider. And since they are not really planning any updates to it, I doubt we'll see anything different in the near term.
John Welch | www.mariner-usa.com | www.agilebi.com | ssisUnit.codeplex.com -
Wednesday, October 20, 2010 6:10 PM
Coonection manager does not let me change the properties but i changed expresion like this
"Provider=Microsoft.Jet.OLEDB.4.0; Data Source=" + @[User::FileName] + ";Extended Properties=\"EXCEL 8.0;HDR=YES;IMEX=1\";"
Column which has datalike this 23456A it work fine but column which has 123456 it show like 2.01375e+006 but actual number is like this 2012449.
Please help me to solve this issue ASAP
-
Tuesday, September 27, 2011 10:16 PM
please check the following link.
http://support.microsoft.com/kb/194124/en-us
Set Db = OpenDatabase("C:\Temp\Book1.xls", _ False, True, "Excel 8.0; HDR=NO; IMEX=1;")
Regards, Ahmad Elayyan

