none
OleDbConnection only finds cell value when workbook is also open in Excel RRS feed

  • Question


  • I have a program (actually SSIS script task, but I don't suppose that matters) that creates an OLE DB connection to an Excel workbook, and reads the cell values in each worksheet, storing them in a SQL Server table.

    Each worksheet has several sections of rows, each section being for a separate product. The first two rows of each product section are a quarter row, and a year row. So, something like this:

    worksheet yr and quarter rows

    I use an OleDbDataReader with a "Select *" command to read the data in each sheet into a DataTable. I have a column called "YearQuarter" in my SQL database, where I store a concatenation of the year row value and the preceding quarter row value, with a hyphen between the two strings:

    enter image description here

    Looking at the output data, I noticed that I was getting inconsistent results. Some rows would have a YearQuarter column value that would have only the Year row value in them, while others would have the cell values from both rows. For example, I'd have "2009 - Year End" followed by just "2010", with no " - 1st Qtr." appended to it.  (Notice that, in this case, the column separator is darker, indicating an Excel "frozen column". I think that this is just a coincidence, however.)

    In my program, this would mean that the Quarter cell value for that column appeared to the data reader to be empty.

    However, I noticed that it wasn't consistent. Sometimes I'd run my package and the same row would now have the full value. So, in the above example, I'd get "2010 - 1st Qtr."

    I finally realized that it was working as expected only if I happened to have the workbook open in Excel at the same time that the program was running!

    Why would this make a difference? Could it be that there is a macro or something in the workbook that is executed by Excel, but not when the workbook is accessed only via an OLE DB connection? Would the fact that it had been executed in Excel then affect the data obtained by OLE DB? If that's the case, how do I get around this? The spreadsheets are provided to me. So I can't modify them.



    - Mark Z.


    • Edited by Mark Zudeck Friday, September 12, 2014 7:49 PM
    Friday, September 12, 2014 7:42 PM

Answers

  • @cnk_gr - This bug turns out to be a "feature", and it should come with a big warning sign.

    I just came across this article, which explains that "ADO.NET scans the first 8 rows of data, and based on that, guesses the datatype for each column. Then it attempts to coerce all data from that column to that datatype, returning NULL whenever the coercion fails!"

    In other words, it is treating the worksheet as a relation table, in which all values in a given column are of the same type.  Of course, worksheet data is not bound by this restriction.

    This behavior can be gotten around by modifying these registry settings:

    Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/ImportMixedTypes

    Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/Typ

    I think this was a very risky design, inviting data transfer errors that could easily go unnoticed.


    - Mark Z.


    • Marked as answer by Mark Zudeck Wednesday, September 17, 2014 6:26 PM
    • Edited by Mark Zudeck Wednesday, September 17, 2014 6:33 PM
    Wednesday, September 17, 2014 6:25 PM

All replies

  • I have seen several problems with SSIS and computed (formula) fields in excel. Interop in script tasks work better
    Friday, September 12, 2014 7:50 PM
  • The cells in question don't appear to be computed.  For instance, if I select the cell containing "1st Qtr." in the example, the fx edit box in Excel just displays "1st Qtr."

    - Mark Z.

    Friday, September 12, 2014 7:54 PM
  • Are there any hidden rows in your excel files ?
    Friday, September 12, 2014 8:02 PM
  • Not as far as I can tell.  If I go into "Find & Select" and choose "Visible cells only", I don't see any  white borders that would indicate hidden ones.

    - Mark Z.


    • Edited by Mark Zudeck Friday, September 12, 2014 8:37 PM
    Friday, September 12, 2014 8:37 PM
  • I'd try to load the same excel file in a script task in c# or vb using excel intrerop and then loading the data into a temporary table
    Friday, September 12, 2014 8:48 PM
  • Hi,

    In fact, using OleDb Connection doesn't require the workbook to be opened when connecting to Excel workbook.

    After testing with the code below, it works well to read data from Excel workbook with OleDb connection. I suggest you checking your connection string. If your are working with *.xlsx file, it's better to set the Provider as "Microsoft.ACE.OLEDB.12.0;" and Extended Properties as "Excel 12.0 Xml;HDR=YES;IMEX=1;"

    private void TestOleDb()
    {
        string connetionString = null;
        OleDbConnection oledbCnn;
        OleDbCommand oledbCmd;
        string sql = null;
    
        string path = @"C:\Documents\TableMatch.xlsx"; //for xlsx file
        //string path = @"C:\Documents\TableMatch.xls"; //for xls file
    
        connetionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path + ";Mode=Read;Extended Properties=\"Excel 12.0 Xml;HDR=YES;IMEX=1;\""; //for xlsx file
        //connetionString = "provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + path + ";Extended Properties=Excel 8.0;"; //for xls file
    
        sql = "SELECT * FROM [Sheet2$A1:I8]";
    
        oledbCnn = new OleDbConnection(connetionString);
        try
        {
            oledbCnn.Open();
            oledbCmd = new OleDbCommand(sql, oledbCnn);
            OleDbDataReader oledbReader = oledbCmd.ExecuteReader();
            while (oledbReader.Read())
            {
                MessageBox.Show(oledbReader.GetValue(0) + " - " + oledbReader.GetValue(1) + " - " + oledbReader.GetValue(2));
            }
            oledbReader.Close();
            oledbCmd.Dispose();
            oledbCnn.Close();
        }
        catch (Exception ex)
        {
            MessageBox.Show("Can not open connection ! ");
        }
    
    }

    If all these cannot help, I suggest you sharing your sample code for us to troubleshoot.


    We are trying to better understand customer views on social support experience, so your participation in this interview project would be greatly appreciated if you have time. Thanks for helping make community forums a great place.
    Click HERE to participate the survey.

    Monday, September 15, 2014 8:55 AM
    Moderator
  • Hi Luna,

    My connection string to the provider is as you suggest, except that I have "HDR=No", because I don't want the header information from the sheets.  The file is of type .XLS.  I have tried changing extended property "Excel 12.0" to "Excel 8.0".  (Recommended for .xls files, I believe.)  I also tried saving the file as .XLSX and using extended property "Excel 12.0".  These changes did not solve the problem.

    My code is like this:

              OleDbConnection oleExcelConnection = new OleDbConnection(
                    "Provider=Microsoft.ACE.OLEDB.12.0;" +
                    "Data Source=" + strWkbkFilePath + ";" +
                    "Mode=Read;" +
                    "Extended Properties=\"Excel 8.0;HDR=No;IMEX=1\"");

               oleExcelConnection.Open();

               DataTable dtCurrSheet = new DataTable();

               // Name of table is in strLoadTblNm.

                OleDbCommand oleExcelCommand;
                OleDbDataReader oleExcelReader;

                oleExcelCommand = excel_conn.CreateCommand();
                oleExcelCommand.CommandText = "Select * From [" + strLoadTblNm + "]";
                oleExcelCommand.CommandType = CommandType.Text;
                oleExcelReader = oleExcelCommand.ExecuteReader();

                // Load worksheet into data table
                dtSheet.Load(oleExcelReader);

                oleExcelReader.Close();

    For most cells, the data is properly extracted into the DataTable.  However, there are some that are not obtained.  For instance, I have a worksheet like this:

    Notice that the first column with a year value of "2010" has "1st Qtr." in the cell above it.  Here is a screen shot of what I see in the DataSet Visualizer in Visual Studio:

    That "1st Qtr." value has not been loaded into the DataTable, though the quarter values in the columns before and after it were loaded.  This seems like a bug in OleDb or SSIS.


    - Mark Z.




    • Edited by Mark Zudeck Monday, September 15, 2014 11:51 PM
    Monday, September 15, 2014 9:28 PM
  • Can you save that excel to csv and see how it looks ?
    Monday, September 15, 2014 9:33 PM
  • @cnk_gr - Saved .csv preserves all quarter row values.

    - Mark Z.



    • Edited by Mark Zudeck Monday, September 15, 2014 11:52 PM
    Monday, September 15, 2014 11:45 PM
  • Hi,

    After testing with your code, I fail to reproduce your issue. I can get data from the workbook successfully when the *.xls file is closed.

    Since your connection string is correct and the issue is more related to OleDb connection instead of Excel Object Model, I suggest you posting in ADO.NET Managed Providers forum for more effective responses.


    We are trying to better understand customer views on social support experience, so your participation in this interview project would be greatly appreciated if you have time. Thanks for helping make community forums a great place.
    Click HERE to participate the survey.

    Tuesday, September 16, 2014 7:14 AM
    Moderator
  • If you are able to convert the excel and load the csv instead, it'll work as a workaround and get the job done.

    In order to pursue this more I'd need a sample (of the excel) to try and reproduce.

    Which version of SSIS are you using ?

    Tuesday, September 16, 2014 7:44 AM
  • Hello, I tested with your code sample and could not reproduce the error.

    But, I was just wandering about the screen-shot from the DataSet Visualizer.

    The numbers in the column with the missing header has no "," in the values.

    It seems like the data for that column could have been treated differently?

    It looks almost like the values from that column has been read to a numeric variable.

    If there was an int variable in use somewhere it could explain why the cell data that contains alfa-characters disappear.

    But, I haven't seen that part of the code so I have no idea, just a hunch.

    Best regards,

    Thorsten

    Tuesday, September 16, 2014 8:34 AM
  • Thanks, Luna.  I just posted on the other forum.  Post is here.

    - Mark Z.

    Tuesday, September 16, 2014 2:35 PM
  • Thorsten, that's very interesting!  I didn't notice that the commas were missing, as well.  I can't see in my code that any particular columns are treated differently, but I will continue to examine.

    Thanks.


    - Mark Z.

    Tuesday, September 16, 2014 5:28 PM
  • f87 is treated like numbers but IMEX=1 should compensate for that but it's not working always !

    Found this post that should help:

    http://social.msdn.microsoft.com/Forums/sqlserver/en-US/78b87712-8ffe-4c72-914b-f1c031ba6c75/excel-source-on-ssis-2008-brings-null-values-imex1-not-working?forum=sqlintegrationservices

    Tuesday, September 16, 2014 6:52 PM
  • I can provide a sample workbook that shows the problem, if somebody can tell me how to do that on these forums.

    - Mark Z.

    Tuesday, September 16, 2014 8:51 PM
  • cnk_gr,

    I am using the version of SSIS in SQL Server Data Tools 11.1.40706.0.

    Per your request, I've made uploaded a sample workbook that illustrates the problem when my code accesses it.  You can find it at: 

    https://onedrive.live.com/redir?resid=BBC1C6B66C6F82F8%21114


    - Mark Z.

    Wednesday, September 17, 2014 3:55 PM
  • Using Excel Connection

    I can verify that metadata for F12 field on the sample is DT_R8 instead of DT_WSTR. Most values return as NULL...

    Once I opened excel along with SSIS it refreshed the metadata of F12 field to DT_WSTR.

    Using OLE DB Connection

    Had the safe side effects you described opening with excel at the same time lost ALL headers.

    IMO this is a problem that does come from using mixed types but the behavior is at least problematic.

    Wednesday, September 17, 2014 5:10 PM
  • Once i removed the BORDER between F11 and F12 cells (between column K and L) it worked fine.

    Smells like a bug... must be a bug...

    Wednesday, September 17, 2014 5:59 PM
  • Before and after...

    BEFORE


    Wednesday, September 17, 2014 6:09 PM
  • @cnk_gr - This bug turns out to be a "feature", and it should come with a big warning sign.

    I just came across this article, which explains that "ADO.NET scans the first 8 rows of data, and based on that, guesses the datatype for each column. Then it attempts to coerce all data from that column to that datatype, returning NULL whenever the coercion fails!"

    In other words, it is treating the worksheet as a relation table, in which all values in a given column are of the same type.  Of course, worksheet data is not bound by this restriction.

    This behavior can be gotten around by modifying these registry settings:

    Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/ImportMixedTypes

    Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/Typ

    I think this was a very risky design, inviting data transfer errors that could easily go unnoticed.


    - Mark Z.


    • Marked as answer by Mark Zudeck Wednesday, September 17, 2014 6:26 PM
    • Edited by Mark Zudeck Wednesday, September 17, 2014 6:33 PM
    Wednesday, September 17, 2014 6:25 PM
  • In this case I haven't changed the data only removed the border between two columns
    Wednesday, September 17, 2014 6:28 PM
  • Yes, it's very interesting that removing a border would change the result.  That seems very arbitrary.

    - Mark Z.

    Wednesday, September 17, 2014 6:32 PM