locked
Get OleDb reading from excel to skip all empty rows. RRS feed

  • Question

  • User-718146471 posted

    I have been wracking my brain trying to figure out how to get OleDB to not import empty rows in my importable excel files.  It starts and will run perfectly fine until it hits an empty row and then it craps out.  Is there some way of doing this where if the row it is encountering it goes to the next row?  I am using a foreach loop.  The loop code looks like this:

                    {
                        File.Copy(ExcelTempl, ExcelFile);
                        DirectoryInfo di = new DirectoryInfo(DropZone);
                        FileInfo[] fi = di.GetFiles("Master_*.xl*");
                        // step through each found file 
                        foreach (FileInfo fiTemp in fi)
                        {
                            {
                                // Connection String to Excel Workbooks
                                System.Threading.Thread.Sleep(1000);
                                string excelConnStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + DropZone + "\\" + fiTemp + "'; Extended Properties=\"Excel 8.0; HDR=YES; IMEX=1;\"";
                                WriteEvent("File Read", "Reading excel file " + fiTemp + " now.", "Automation User");
                                using (OleDbConnection connection = new OleDbConnection(excelConnStr))
                                {
                                    connection.Open();
                                    DataTable dt = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
                                    if (dt != null)
                                    {
                                        string[] excelSheets = new String[dt.Rows.Count];
                                        int i = 0;
                                        foreach (DataRow row in dt.Rows)
                                        { // I believe I need to do something right here

    Any suggestions would be appreciated!

    Thursday, January 15, 2015 10:44 AM

Answers

  • User188108250 posted

    I have not encountered this issue but that is b/c I am loading from the Excel sheet slightly different than you. I perform a SELECT * FROM {TABLE_NAME} which selects the data itself and ignores everything else. Hoep this helps you:

    // FORMATS CONNECTION STRING THAT WILL BE USED TO READ EXCEL FILE
                                String str_Connection = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + str_Excel_Filename +
                                    ";Extended Properties=\"Excel 12.0;HDR=YES;IMEX=1;\"";
    
                                // CREATED DB CONNECTION OBJECT TO READ FILE
                                using (System.Data.OleDb.OleDbConnection oledb_Connection = new System.Data.OleDb.OleDbConnection(str_Connection))
                                {
                                    // OPEN DB CONNECTION
                                    oledb_Connection.Open();
    
                                    // READS FILE DB SCHEMA
                                    System.Data.DataTable dt_Schema = oledb_Connection.GetSchema("COLUMNS");
    
                                    // CHECKS THAT DB SCHEMA IS NOT NULL
                                    if (dt_Schema != null)
                                    {
                                        // CREATES DB COMMAND TO READ FILE
                                        System.Data.OleDb.OleDbCommand odbc_Command = new System.Data.OleDb.OleDbCommand(String.Format("SELECT * FROM [{0}]",
                                            dt_Schema.Rows[0]["TABLE_NAME"].ToString()), oledb_Connection);
    
                                        // SETS DB COMMAND TYPE
                                        odbc_Command.CommandType = CommandType.Text;
    
                                        // PARSES FILE ACCORDINGLY
                                        new System.Data.OleDb.OleDbDataAdapter(odbc_Command).Fill(dt_Parsed);
                                    }
    
                                    // CLOSE DB CONNECTION
                                    oledb_Connection.Close();
                                }

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Wednesday, March 4, 2015 8:43 AM
  • User269602965 posted

    Now would not the SQL query simply stop reading rows when it got to the first empty row?

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Thursday, March 5, 2015 4:21 PM
  • User753101303 posted

    I don't think so. It seems to me it would make more sense for the OleDb driver to process all rows rather than to just stop at the first empty row.

    My first move is that you must filter out yourself empty rows but for now it's a bit unclear what is the problem (I would be REALLY surprised if having colored cells would cause some issue as the OP seems to tell, I don't see why the OleDb driver would care about that).

    Would have to try to see how it behaves but for now I suspect that the OP have some kind of error and think it could be that.

    bbcompent1, if you have a problem and you are not absolutely 100% sure about its cause, it's best to just tell about the problem you have (error message or wrong behavior you see) than about what you think caused the problem.

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Friday, March 6, 2015 7:06 AM

All replies

  • User509211905 posted

    Hi there, try do something under your foreach ...

    foreach (DataRow row in dt.Rows)
      if (row.ItemArray != null)
       if (!row.ItemArray.All ( x=> x == null ||( x!=null && string.IsNullOrWhiteSpace(x.ToString()) )) )
         // process your rows that are not empty

    Hope that helps.

    Thursday, January 15, 2015 11:55 AM
  • User-718146471 posted

    It does not like the All in !row.ItemArray.All saying System.Array does not contain a definition for All.

    Thursday, January 15, 2015 12:44 PM
  • User509211905 posted

    add using System.Linq; 

    Thursday, January 15, 2015 1:49 PM
  • User-718146471 posted

    Ok, that got rid of the error.  I will be testing this out shortly.  For what it's worth, do you know of any way to have OleDB ignore any binary information?  For example color coding and filtering; any way to ingore that stuff when I import?

    Thursday, January 15, 2015 2:07 PM
  • User509211905 posted

    Hi there, I haven't try your scenario ignoring any binary information. This is interesting, can you give us a sample excel file that contains color coding and filtering? I'll like to try it out and find a way to ignore it. 

    Friday, January 16, 2015 2:10 AM
  • User-718146471 posted

    The forum here does not allow transferring of files, but what you can do is create a spreadsheet with some columns and headers in it.  Then go to the Data tab and click Filter.  Select a few rows and right click and click Format Cells. Go to Fill tab, select a background color.  Click OK, save your workbook. 

    Friday, January 16, 2015 12:53 PM
  • User188108250 posted

    I have not encountered this issue but that is b/c I am loading from the Excel sheet slightly different than you. I perform a SELECT * FROM {TABLE_NAME} which selects the data itself and ignores everything else. Hoep this helps you:

    // FORMATS CONNECTION STRING THAT WILL BE USED TO READ EXCEL FILE
                                String str_Connection = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + str_Excel_Filename +
                                    ";Extended Properties=\"Excel 12.0;HDR=YES;IMEX=1;\"";
    
                                // CREATED DB CONNECTION OBJECT TO READ FILE
                                using (System.Data.OleDb.OleDbConnection oledb_Connection = new System.Data.OleDb.OleDbConnection(str_Connection))
                                {
                                    // OPEN DB CONNECTION
                                    oledb_Connection.Open();
    
                                    // READS FILE DB SCHEMA
                                    System.Data.DataTable dt_Schema = oledb_Connection.GetSchema("COLUMNS");
    
                                    // CHECKS THAT DB SCHEMA IS NOT NULL
                                    if (dt_Schema != null)
                                    {
                                        // CREATES DB COMMAND TO READ FILE
                                        System.Data.OleDb.OleDbCommand odbc_Command = new System.Data.OleDb.OleDbCommand(String.Format("SELECT * FROM [{0}]",
                                            dt_Schema.Rows[0]["TABLE_NAME"].ToString()), oledb_Connection);
    
                                        // SETS DB COMMAND TYPE
                                        odbc_Command.CommandType = CommandType.Text;
    
                                        // PARSES FILE ACCORDINGLY
                                        new System.Data.OleDb.OleDbDataAdapter(odbc_Command).Fill(dt_Parsed);
                                    }
    
                                    // CLOSE DB CONNECTION
                                    oledb_Connection.Close();
                                }

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Wednesday, March 4, 2015 8:43 AM
  • User753101303 posted

    bbcompent1

    color coding and filtering

    Do you mean colored cells ? OleDB should just ignore this. The purpose is to read cell values. Not sure about filtering but I would expect the driver to ignore this and just reading sheet data.

    If you still have a problem rather than "craps out" describe what happens exactly. For example hre with IMEX=1, it should read all columns as strings so I'm not sure what doesn't work on your side..

    Wednesday, March 4, 2015 9:01 AM
  • User269602965 posted

    Now would not the SQL query simply stop reading rows when it got to the first empty row?

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Thursday, March 5, 2015 4:21 PM
  • User753101303 posted

    I don't think so. It seems to me it would make more sense for the OleDb driver to process all rows rather than to just stop at the first empty row.

    My first move is that you must filter out yourself empty rows but for now it's a bit unclear what is the problem (I would be REALLY surprised if having colored cells would cause some issue as the OP seems to tell, I don't see why the OleDb driver would care about that).

    Would have to try to see how it behaves but for now I suspect that the OP have some kind of error and think it could be that.

    bbcompent1, if you have a problem and you are not absolutely 100% sure about its cause, it's best to just tell about the problem you have (error message or wrong behavior you see) than about what you think caused the problem.

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Friday, March 6, 2015 7:06 AM
  • User-718146471 posted

    Ok, let me describe the behavior with a little more detail.  OleDB provided by Microsoft Access Database Engine 2010 behaves in the following manner if it encounters the following scenarios

    1. When the driver encounters an empty row stating string data would be truncated
    2. When the driver encounters color coding it says binary data would be truncated
    3. When the driver encounters sorting it states binary data would be truncated

    I have discovered that the OleDB has a couple other problems documented in another posting which are as follows http://forums.asp.net/t/2036759.aspx?OleDB+duplicating+records

    1. Rows will be duplicated if you have more than 2000 rows
    2. Rows are duplicated with the newer XLSX format (perhaps it has something to do with the newer XML format?)

    Just to clarify, when I say duplicated, it imports the same rows twice.  For fun, here is my code:

                int reccnt = 0, InsCount = 0;
                if (!DirectoryIsEmpty(DropZone.ToString(), "MasterFile_*.xls"))
                {
                    try
                    {
                        DirectoryInfo di = new DirectoryInfo(DropZone);
                        FileInfo[] fi = di.GetFiles("MasterFile_*.xls");
                        // step through each found file 
                        foreach (FileInfo fiTemp in fi)
                        {
                            InsCount = 0;
                            {
                                // Connection String to Excel Workbook
                                System.Threading.Thread.Sleep(1000);
                                string excelConnStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + DropZone + "\\" + fiTemp + "'; Extended Properties=\"Excel 8.0; HDR=YES; IMEX=1;\"";
                                WriteEvent("File Read", "Reading excel file " + fiTemp + " now.", "Automation User");
                                using (OleDbConnection connection = new OleDbConnection(excelConnStr))
                                {
                                    connection.Open();
                                    DataTable dt = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
                                    if (dt != null)
                                    {
                                        string[] excelSheets = new String[dt.Rows.Count];
                                        int i = 0;
                                        foreach (DataRow row in dt.Rows)
                                        {
                                            excelSheets[i] = row["TABLE_NAME"].ToString();
                                            #region ExcelReading
                                            SheetName = excelSheets[i].ToString();
                                            OleDbCommand command = new OleDbCommand("SELECT * FROM [" + SheetName.ToString() + "]", connection);
                                            using (DbDataReader dr = command.ExecuteReader())
                                            {
                                                while (dr.Read())
                                                {
                                                    InsCount++;
                                                    #region Field Matrices
                                                    // Field Matrix
                                                    Case = dr.GetValue(0).ToString();
                                                    ClientID = dr.GetValue(1).ToString();
                                                    OriginalID = dr.GetValue(2).ToString();
                                                    PIC = dr.GetValue(3).ToString();
                                                    FirstName = dr.GetValue(4).ToString();
                                                    MInitial = dr.GetValue(5).ToString();
                                                    LastName = dr.GetValue(6).ToString();
                                                    TranDate = dr.GetValue(7).ToString();
                                                    TranType = dr.GetValue(8).ToString();
                                                    IP = dr.GetValue(9).ToString();
                                                    SessionID = dr.GetValue(10).ToString();
                                                    Username = dr.GetValue(11).ToString();
                                                    EmailAddress = dr.GetValue(12).ToString();
                                                    DOB = dr.GetValue(13).ToString();
                                                    #endregion
    
                                                    #region SQL Functions
                                                    reccnt = 0;
    
                                                    TranDate = TranDate.Replace("-", "");
                                                    SelDate = SelDate.Replace("-", "");
                                                    if (reccnt == 0)
                                                    {
                                                        // Insert
                                                        InsertSQL = "INSERT INTO Table ([Case],[ClientID],[OriginalID],[PIC],[First Name], [MInitial], [Last Name], [TranDate], [TranType], [IP], [SessionID], [User Name], [EmailAddress], [DOB])";
                                                        InsertSQL = InsertSQL + "values (@Case, @CID, @OID, @PIC, @FirstName, @MInitial, @LastName, @TranDate, @TranType, @IP, @SessionID, @Username, @EmailAddress, @DOB)";
                                                        SqlCommand InsertCmd = new SqlCommand(InsertSQL, conn2);
                                                        if (conn2.State == ConnectionState.Open)
                                                        {
                                                            conn2.Close();
                                                        }
                                                        conn2.Open();
                                                        InsertCmd.Parameters.AddWithValue("Case", CaseNo.Trim());
                                                        InsertCmd.Parameters.AddWithValue("CID", CID.Trim());
                                                        InsertCmd.Parameters.AddWithValue("OID", (object)OID.Trim() ?? DBNull.Value);
                                                        InsertCmd.Parameters.AddWithValue("PIC", PIC.Trim());
                                                        InsertCmd.Parameters.AddWithValue("FirstName", FirstName.Trim());
                                                        InsertCmd.Parameters.AddWithValue("MInitial", MInitial.Trim());
                                                        InsertCmd.Parameters.AddWithValue("LastName", LastName.Trim());
                                                        InsertCmd.Parameters.AddWithValue("TranDate", TranDate.Trim());
                                                        InsertCmd.Parameters.AddWithValue("TranType", TranType.Trim());
                                                        InsertCmd.Parameters.AddWithValue("IP", IP.Trim());
                                                        InsertCmd.Parameters.AddWithValue("SessionID", SessionID.Trim());
                                                        InsertCmd.Parameters.AddWithValue("Username", Username.Trim());
                                                        InsertCmd.Parameters.AddWithValue("EmailAddress", EmailAddress.Trim());
                                                        InsertCmd.Parameters.AddWithValue("DOB", DOB.Trim());
                                                        try
                                                        {
                                                            InsertCmd.ExecuteNonQuery();
                                                            conn2.Close();
                                                        }
                                                        catch (Exception ex)
                                                        {
                                                            #region Write to Audit Log
                                                            Event = "System Alert";
                                                            DoneBy = "System Services";
                                                            Information = "On demand import has encountered an error: " + ex.ToString() + "";
                                                            MyTimeStamp = DateTime.Now.ToString();
    
                                                            WriteEvent(Event, DoneBy, Information);
                                                            #endregion
                                                        }
                                                    }
    

    If you see anything here that would be better done differently, I am all ears.

    Tuesday, March 10, 2015 9:53 AM
  • User-718146471 posted

    I have not encountered this issue but that is b/c I am loading from the Excel sheet slightly different than you. I perform a SELECT * FROM {TABLE_NAME} which selects the data itself and ignores everything else.

    I plan to give this method you mention a try, worst case is it does not help.  Thanks for the advice, I'll be back after I recode and test it.

    Thursday, March 12, 2015 4:25 PM
  • User-718146471 posted

    As far as I can determine this code will do the job nicely.  Thanks Xenocdie83!

    Friday, March 13, 2015 7:46 AM
  • User-718146471 posted

    Hey, I am back with a minor modification to the code. Would this strip out any blank lines prior to using SQL Bulk Copy?

                                        new System.Data.OleDb.OleDbDataAdapter(odbc_Command).Fill(dt_Parsed);
                                        for (int i = 0; i < dt_Parsed.Rows.Count; i++)
                                        {
                                            object id = dt_Parsed.Rows[i];
                                            if (id != null && !String.IsNullOrEmpty(id.ToString().Trim()))
                                            {
                                                dt_Parsed.Rows[i][7] = dt_Parsed.Rows[i][7].ToString().Replace("-", "");
                                                dt_Parsed.Rows[i][36] = dt_Parsed.Rows[i][36].ToString().Replace("-", "");
                                            }
                                            else
                                            {
                                                dt_Parsed.Rows[i].Delete();
                                            }
                                        }
    

    In the case of my column 8 and 37, the numbers there cannot have a dash, these are integers.  The dt_Parsed.Rows[I].Delete() I theorize will delete that row from the data table if the id comes out as null or empty string.  Is my code correct?

    Monday, March 16, 2015 12:10 PM