none
OleDbConnection only finds cell value when workbook is also open in Excel RRS feed

  • Question

  • I have a program (actually SSIS script task, but I don't suppose that matters) that creates an OLE DB connection to an Excel workbook, and reads the cell values in each worksheet, storing them in a SQL Server table.

    Each worksheet has several sections of rows, each section being for a separate product. The first two rows of each product section are a quarter row, and a year row. So, something like this:

    Worksheet year and quarter rows

    I use an OleDbDataReader with a "Select *" command to read the data in each sheet into a DataTable. I have a column called "YearQuarter" in my SQL database, where I store a concatenation of the year row value and the preceding quarter row value, with a hyphen between the two strings:

    YearQuarter column in SQL table

    Looking at the output data, I noticed that I was getting inconsistent results. Some rows would have a YearQuarter column value that would have only the Year row value in them, while others would have the cell values from both rows. For example, I'd have "2009 - Year End" followed by just "2010", with no " - 1st Qtr." appended to it.  (Notice that, in this case, the column separator is darker, indicating an Excel "frozen column". I think that this is just a coincidence, however.)

    In my program, this would mean that the Quarter cell value for that column appeared to the data reader to be empty.

    However, I noticed that it wasn't consistent. Sometimes I'd run my package and the same row would now have the full value. So, in the above example, I'd get "2010 - 1st Qtr."

    I finally realized that it was working as expected only if I happened to have the workbook open in Excel at the same time that the program was running!

    Why would this make a difference? Could it be that there is a macro or something in the workbook that is executed by Excel, but not when the workbook is accessed only via an OLE DB connection? Would the fact that it had been executed in Excel then affect the data obtained by OLE DB? If that's the case, how do I get around this? The spreadsheets are provided to me, and I have to process them as they are.

    NOTE: If I save the spreadsheet as a .csv, all the values in the quarters row are preserved.

    - Mark Z.

    Tuesday, September 16, 2014 2:15 PM

Answers

  • This turns out to be a "feature", and it should come with a big warning label.

    I just came across this article, which explains that "ADO.NET scans the first 8 rows of data, and based on that guesses the data type for each column. Then it attempts to coerce all data from that column to that datatype, returning NULL whenever the coercion fails!"

    In other words, it is treating the worksheet as a relation table, in which all values in a given column are of the same type.  Of course, worksheet data is not bound by this restriction.

    This behavior can be gotten around by modifying these registry settings:

    Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/ImportMixedTypes

    Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/Typ

    I think this was a very risky design, inviting data transfer errors that could easily go unnoticed.


    - Mark Z.


    • Marked as answer by Mark Zudeck Wednesday, September 17, 2014 6:30 PM
    • Edited by Mark Zudeck Thursday, September 18, 2014 6:11 PM
    Wednesday, September 17, 2014 6:30 PM

All replies

  • The file is of type .XLS.  I have tried changing extended property "Excel 12.0" to "Excel 8.0".  (Recommended for .xls files, I believe.)  I also tried saving the file as .XLSX and using extended property "Excel 12.0".  These changes did not solve the problem.

    My code is like this:

              OleDbConnection oleExcelConnection = new OleDbConnection(
                    "Provider=Microsoft.ACE.OLEDB.12.0;" +
                    "Data Source=" + strWkbkFilePath + ";" +
                    "Mode=Read;" +
                    "Extended Properties=\"Excel 8.0;HDR=No;IMEX=1\"");

               oleExcelConnection.Open();

               DataTable dtCurrSheet = new DataTable();

               // Name of table is in strLoadTblNm.

                OleDbCommand oleExcelCommand;
                OleDbDataReader oleExcelReader;

                oleExcelCommand = excel_conn.CreateCommand();
                oleExcelCommand.CommandText = "Select * From [" + strLoadTblNm + "]";
                oleExcelCommand.CommandType = CommandType.Text;
                oleExcelReader = oleExcelCommand.ExecuteReader();

                // Load worksheet into data table
                dtSheet.Load(oleExcelReader);

                oleExcelReader.Close();

    For most cells, the data is properly extracted into the DataTable.  However, there are some that are not obtained.  For instance, I have a worksheet like this:

    Input worksheet data


    Notice that the first column with a year value of "2010" has "1st Qtr." in the cell above it.  Here is a screen shot of what I see in the DataSet Visualizer in Visual Studio:

    DataSet Visualizer

    That "1st Qtr." value has not been loaded into the DataTable, though the quarter values in the columns before and after it were loaded.   Note also that, in that column, the numeric cells lose their formatting (no commas).


    - Mark Z.


    • Edited by Mark Zudeck Tuesday, September 16, 2014 6:39 PM
    Tuesday, September 16, 2014 2:30 PM
  • Another thing not tried:

    Add the extended property ReadOnly  = false in the ConnectionString without mode=read.

    Tuesday, September 16, 2014 2:53 PM
  • It could be the "mixed mode" data issue when identifying the data type of a column. To test this try changing the TypeGuessRows value for the OLEDB Provider you are using to 0. The below article is somewhat old, but it may be applicable:

    http://support.microsoft.com/kb/189897/en-us


    Paul ~~~~ Microsoft MVP (Visual Basic)

    Tuesday, September 16, 2014 3:34 PM
  • @Paul P Clement IV - Just tried running with "TypeGuessRows=0".  Same result.

    - Mark Z.

    Tuesday, September 16, 2014 6:31 PM
  • @cnk_gr - Just tried running with "ReadOnly  = false".  Same result.

    - Mark Z.

    Tuesday, September 16, 2014 6:38 PM
  • Without reproducing can't get any other idea.

    If you could share a sample (just 2-3 lines of the excel) that have the problem would help for trial and error.


    Tuesday, September 16, 2014 6:45 PM
  • Where you able to verify whether the missing column header value is available in the DataReader, or does it get lost during the transfer to the DataTable?

    Paul ~~~~ Microsoft MVP (Visual Basic)

    Tuesday, September 16, 2014 7:41 PM
  • I can provide a sample workbook that shows the problem, if somebody can tell me how to do that on these forums.

    - Mark Z.

    Tuesday, September 16, 2014 8:47 PM
  • Hello Mark,

    You could provide the sample workbook and your project(if you can) to onedrive:

    https://onedrive.live.com/

    And share the generated link with us, you could check this link for detail steps:

    http://windows.microsoft.com/en-hk/onedrive/share-file-folder

    Regards. 


    We are trying to better understand customer views on social support experience, so your participation in this interview project would be greatly appreciated if you have time. Thanks for helping make community forums a great place.
    Click HERE to participate the survey.

    Wednesday, September 17, 2014 2:30 AM
    Moderator
  • Okay Fred,

    I have uploaded the workbook, and it can be found here:

    https://onedrive.live.com/redir?resid=BBC1C6B66C6F82F8%21114

    Thanks.


    - Mark Z.

    Wednesday, September 17, 2014 3:39 PM
  • cnk_gr,

    If you are able to reproduce the bug, you should see the that the "1st Qtr" value in column L of row 11 will be missing in the dataset.  Also, the numeric values in that column lose their formatting (no comma).


    - Mark Z.

    Wednesday, September 17, 2014 3:44 PM
  • Paul, I'm not sure how to locate particular values in the DataReader while debugging.

    I have uploaded a sample workbook to onedrive, and provided the URL.  (See my comment in this thread.)


    - Mark Z.

    Wednesday, September 17, 2014 3:48 PM
  • This turns out to be a "feature", and it should come with a big warning label.

    I just came across this article, which explains that "ADO.NET scans the first 8 rows of data, and based on that guesses the data type for each column. Then it attempts to coerce all data from that column to that datatype, returning NULL whenever the coercion fails!"

    In other words, it is treating the worksheet as a relation table, in which all values in a given column are of the same type.  Of course, worksheet data is not bound by this restriction.

    This behavior can be gotten around by modifying these registry settings:

    Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/ImportMixedTypes

    Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/Typ

    I think this was a very risky design, inviting data transfer errors that could easily go unnoticed.


    - Mark Z.


    • Marked as answer by Mark Zudeck Wednesday, September 17, 2014 6:30 PM
    • Edited by Mark Zudeck Thursday, September 18, 2014 6:11 PM
    Wednesday, September 17, 2014 6:30 PM
  • Yes this is true and it's related to my post regarding TypeGuessRows. If you have a column that is predominantly numeric then the data type will be determined to be numeric and string values will become null. The IMEX connection string argument is supposed to cause all mixed-mode data to be converted to text, but it seems to be inconsistent in behavior.

    The problem with Excel is that it does not implement standard database practices and the ODBC driver and OLEDB Provider/Excel ISAM are not able to identify column data types without "guessing". It's treated similarly to a flat or character delimited file, where the database schema information is not defined within the data store.


    Paul ~~~~ Microsoft MVP (Visual Basic)


    Wednesday, September 17, 2014 7:19 PM
  • Paul, the reason that the IMEX connection option is inconsistent is that the option just tells ADO.NET to honor the ImportMixedTypes registry key.  So, if that key is not set as desired in the registry, the IMEX option won't work as you want it to.


    - Mark Z.


    • Edited by Mark Zudeck Thursday, September 18, 2014 6:17 PM
    Thursday, September 18, 2014 6:17 PM