คำตอบ SQLBulkCopy column validation not working

  • 15 มีนาคม 2555 19:58
     
     

    While testing an ASP.Net application used to upload an Excel file to SQL via SQLBulkCopy I received the error message "Input string was not in a correct format". One of the string input columns failed to convert to an INT column in the table. What makes this odd is that the same code and same DB and same input file works on a different server, with some restrictions.

    If I limit my Excel file to 9 records with the 9th being an invalid string, the error is thrown. But if I add one more record and make the invalid string the 10th (or beyond) record, the file is processed. The invalid column in question is converted to a NULL for some reason. However the other server still fails to process an invalid row no matter where it is found.

    To further pinpoint the source I converted the code to a console APP and ran it from my laptop. I saw the same strange behavior where 9 records threw an error but 10 records loaded to each DB server. So it would seem that my laptop also has some kind of configuration issue as the web servers. My laptop and one of the web servers will allow these invalid columns to be processed under the strange condition of 10 or more records. The DB servers appear to be irrelevant to the issue. So far, only one of our web servers acts as expected where an invalid string to INT value throws an error no matter where it is found.

    For reference we have no batch size set so this results in an all or nothing result. Also the connection string is the same and IMEX=1 is set at all times.

    Any ideas where I should to determine the difference in the web servers (and my laptop)? Could this be a driver? Or maybe a registry setting? Or maybe a missing patch?


    • แก้ไขโดย pretzelb 15 มีนาคม 2555 22:12 clarification
    •  

ตอบทั้งหมด

  • 16 มีนาคม 2555 2:17
    ผู้ดูแล
     
     

    Hi pretzelb,

    In case of the relevant Windows registry key values which differ from servers, please check out this KB article addressing this type of issue: System.FormatException occurs when attempting to convert a numeric string to a numeric data type.

    In your scenario, to prevent data inserted into SQL Server, you may have a try to set the corresponding column to not allow null value. If the issue persists on, please post related code block here for further troubleshooting.

    TechNet Subscriber Support
    If you are TechNet Subscription user and have any feedback on our support quality, please send your feedback here.


    Stephanie Lv

    TechNet Community Support


  • 16 มีนาคม 2555 12:34
     
     

    After researching this some more let me rephrase the question. To recap:

    • Same input Excel file (data in same order)
    • Same DB structure
    • Same ASP.Net code
    • The INT column in the DB does allow NULL values
    • Server 1 processes entire file with no errors and converts non-numeric values to NULL
    • Server 2 throws error when trying to convert non-numeric string to INT

    What configuration or setting do I investigate to make Server 2 behave like Server 1?

    There must be some difference between the two web servers that is causing this but so far everything I check shows them to be the same.

  • 16 มีนาคม 2555 17:56
    ผู้ดูแล
     
     

    Hello,

    Please, could you provide some more informations which could be useful ?

    - the full version (2000,2005,2008,2008 R2,2012 and the last installed service pack ) and edition for your both SQL Server' instances

    - idem for the operating system especially if your SQL Server instances are installed on different computers

    - the full version of the .Net Framework ( with service pack ) on both computers

    - the code you are using for your SqlBulkCopy

    - the definition of your Excel worksheet

    - the "faulty" row ( the 10th one if i am not doing an error )

    - the language used by your SQL Server instances and your Excel

    I think that all these informations could be helpful ( if not necessary ) to try to help you more efficiently.

    Have a nice day


    Mark Post as helpful if it provides any help.Otherwise,leave it as it is.

  • 16 มีนาคม 2555 21:41
     
      มีโค้ด

    I don't think this will help but here goes:

    • SQL Version - SQL Server 2008 SP2 EE
    • OS - Win Server 2003 R2 SE
    • Framework 4.0.30319
        Public Sub SimpleBulkCopyTest(ByVal dbname As String, ByVal servername As String)
            Dim excelConnectionString As String = ""
            Dim csb As New OleDb.OleDbConnectionStringBuilder
            csb.Provider = "Microsoft.ACE.OLEDB.12.0"
            csb.DataSource = "c:\excel10.xlsx"
            excelConnectionString = String.Concat(csb.ConnectionString, ";Extended Properties='Excel 12.0 Xml;HDR=YES;IMEX=1;ImportMixedTypes=Text;TypeGuessRows=0;'")
            Dim connection As New OleDbConnection(excelConnectionString)
            Try
                Using connection
                    Dim command As New OleDbCommand("Select ID FROM [Data$]", connection)
                    connection.Open()
                    Using dr As OleDb.OleDbDataReader = command.ExecuteReader()
                        Dim sqlConnectionString = "Database=" & dbname & ";Server=" & servername & ";User Id=" & userid & ";Password=" & password & ";Min Pool Size=1;Max Pool Size=200;"
                        Using bulkcopy As SqlBulkCopy = New SqlBulkCopy(sqlConnectionString)
                            bulkcopy.DestinationTableName = "ExcelData"
                            bulkcopy.WriteToServer(dr)
                        End Using
                        Console.WriteLine("File contents successfully uploaded to database" & vbCrLf)
                    End Using
                    connection.Close()
                End Using
            Catch ex As Exception
                Console.WriteLine("ERROR=" & ex.Message & vbCrLf)
            End Try
        End Sub

    For a sample Excel file just create a file using the corret name from the code above and make one column with a title of ID. Then add 8 numeric values followed by "ABC123" in the last row. For the DB table just use the correct table name from the code and make one column of type INT with NULL values allowed. The code and data is loosely based on this link here

    I'm suspecting a bug with TypeGuessRows in the registry. My laptop had a value of 8 which would explain why it would take 8 rows of data and guess INT then throw error on last row with alpha characters. But after setting that value to 0 it should not do that anymore and it still does, along with the web servers. Likewise I would expect that setting this value in my connection string would allow me to control the behavior but it does not appear to work either. Setting a value of 0, 5, 8, or 500 all results in an error on my laptop. It's almost as if I cannot control the number of records it should use to analyze the column data type.

  • 17 มีนาคม 2555 2:16
    ผู้ดูแล
     
     

    pretzlb,

    You may have a try to set values of TypeGuessRows and TypeGuessRows via registry, and remove them from the connection string. Please see: Help with a OleDB connection string for excel files. For more informationl: IMEX=1 revisit and TypeGuessRows setting change to 0 (watch for performance).


    Stephanie Lv

    TechNet Community Support

  • 17 มีนาคม 2555 10:18
    ผู้ดูแล
     
     

    Hello,

    If you try to insert ( With SQL Server Management Studio ) a row with a value ABC123 for a column defined as Int, you will have an error ( something like invalid value and it is logical ). Even if you set Null values allowed for this column, you will have an error. It is a problem on the Transact-SQL level : no solution except if you test the contain of the related value in your Excel worksheet. I suppose that it would be possible to define the corresponding column of your Excel worksheet as integer and so, the error will be trapped on the level of your Excel worksheet when this value is typed ( it is the 1st rule : always check if a value is with the correct format ).

    To modify values of TypeGuessRows in the registry seems a possible possible, but , according to me , it would only mask an enormeous error : the absence of controls on the validity of the typed data.

    As i am using scarcly Excel, i don't know whether Excel offers a control of validity on typed data. But the best way would be to read every row before to insert them in the SQL Server table and to reject any invalid data.

    I have begun to use databases since 1982 ( as DBA and developer ) and i have always tested the invalid format of data when they were created : the best way to avoid any problem with an angry user ( and to avoid to have problems during the load of a table when all the users have come back home at 10 PM ).

    I am sorry if you think that i am rude but the validation of the correct format of data is one of the 1st rules that i learnt and used even if 30 years ago, it was not usual : when you are alone to behave as system manager, DBA and developer, you take very quickly "good" habits to avoid the most "classical" problems of the kind you have.

    A possible solution would be to export a .csv file ( instead to use a .xls one ) with a file containing the definition of columns. You read each record of your .csv file, test the validity of the data with the TryParse of the System.In32 class. TryParse retuns a false value if the data is not in the good format. In this, you can display an information message about the problem and you skip the record. No bug , the record is not treated  and you can know why you have a problem. An heavy way but you are sure that at least all the correct records are inserted in your database ( it is the way i am using , but maybe it will not be accepted by everybody )

    About TryParse method :

    http://msdn.microsoft.com/en-us/library/f02979c7(v=vs.90).aspx 

    TryParse exists for nearly every type ( numeric,DateTime,....)

    Don't hesitate to post again for more help or explanations

    Have a nice day  


    Mark Post as helpful if it provides any help.Otherwise,leave it as it is.

  • 17 มีนาคม 2555 20:49
     
     

    pretzlb,

    You may have a try to set values of TypeGuessRows and TypeGuessRows via registry, and remove them from the connection string. Please see: Help with a OleDB connection string for excel files. For more informationl: IMEX=1 revisit and TypeGuessRows setting change to 0 (watch for performance).


    Stephanie Lv

    TechNet Community Support

    If I change the values in the registry (and remove them from the connection string) do I need a reboot each time before the are applied? I want to be sure of this point before I make a final statement on the result.

    Also, if that link you posted is correct, wouldn't that mean there is a bug or error in the documentation for the connection string use of those parameters? I don't recall reading anything that said the parms would be ignored if a value was set in the registry.

    I will give what you say a try but I believe I do have TypeGuessRows set to 0 in the registry for all the servers and my laptop. If the value on my connection string is really ignored then they should all be using a value of 0 and they should all behave the same.

  • 17 มีนาคม 2555 20:58
     
     

    Papy

    I agree with your points but I've found that importing records from Excel spreadsheets is something that no business will ever do without. I often wish I could ban Excel because of all the headaches it causes in my apps.

    I also agree with you on the validation and that is part of my confusion. I did read somewhere that SQLBulkCopy will insert NULL for invalid values if the DB allows it but I don't have the reference to that article right now. While this would seem like a bad idea from an integrity standpoint I am OK with it as long as I know that I can control it. Our DB does allow nulls in that column after all. My problem is that one server is willing to convert the non-numeric values to NULL while the other is not. Even though I do appear to be telling them both to do exactly that.

    In an ideal world I would analyze each record prior to loading it but we are using SQLBulkCopy to allow the fast upload of a large amount of data via a web application. I am not sure that we could accomplish this task if we were to analyze each column of each row. We are loading this data to a temp or staging table so we can perform further validation on it. It is necessary so we can inform the user which row and column has an error if we do find one. As far as I know there isn't a fast method (that won't cause a session timeout) to load large amounts of data that can provide a user with a row/column detailed error report at the same time.

  • 19 มีนาคม 2555 14:53
     
      มีโค้ด

    I have some very confusing test results so far. I am not entirely sure on the registry settings and the Microsoft.ACE.OLEDB.12.0 driver. I did update all references to TypeGuessRows to 8 but all I found was these keys:

    HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel
    HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel

    To focus on the Jet engine I used this slightly modified code and a 2007 XLS file:

        Public Sub SimpleBulkCopyTestJet(ByVal dbname As String, ByVal servername As String)
            Dim excelConnectionString As String = ""
            Dim csb As New OleDb.OleDbConnectionStringBuilder
            csb.Provider = "Microsoft.Jet.OLEDB.4.0"
            csb.DataSource = "c:\excel10_2003.xls"
            excelConnectionString = String.Concat(csb.ConnectionString, ";Extended Properties='Excel 8.0;HDR=YES;IMEX=1'")
            Dim connection As New OleDbConnection(excelConnectionString)
            Try
                Using connection
                    Dim command As New OleDbCommand("Select ID FROM [Data$]", connection)
                    connection.Open()
                    Using dr As OleDb.OleDbDataReader = command.ExecuteReader()
                        Dim sqlConnectionString = "Database=" & dbname & ";Server=" & servername & ";User Id=" & userid & ";Password=" & password & ";Min Pool Size=1;Max Pool Size=200;"
                        Using bulkcopy As SqlBulkCopy = New SqlBulkCopy(sqlConnectionString)
                            bulkcopy.DestinationTableName = "ExcelData"
                            bulkcopy.WriteToServer(dr)
                        End Using
                        Console.WriteLine("File contents successfully uploaded to database" & vbCrLf)
                    End Using
                    connection.Close()
                End Using
            Catch ex As Exception
                Console.WriteLine("ERROR=" & ex.Message & vbCrLf)
            End Try
        End Sub

    With IMEX=1 the values from the registry should be used. With TypeGuessRows=8 it should read the first 8 rows to determine the data type. What I see is this (first row is header):

    • First 8 rows of numbers with next row of text -> works fine (text is imported as NULL)
    • Less than 8 rows of numbers followed by row of text -> Error: The given value of type String from the data source cannot be converted to type int of the specified target column
    • Insert row of text as first row followed by numbers -> Error (same as above)
    • Remove IMEX=1 and all previous combinations -> works fine (text is imported as NULL

    Perhaps I'm missing something but these results seem the opposite fo what I would expect. If the first row is text followed by numeric values shouldn't it assume a mixed mode? Plus, if the first 8 rows are numeric and then text follows shouldn't it assume numeric and error off the text values? Plus, if you remove IMEX=1 then what values are being used?

    I also can't find enough detail on the IMEX=1 setting. I do see that it means "import mode" and that it also means the registry values should be read but nothing more. What exactly does "import mode" entail? What are the benefits of using it? Does it only mean that now the registry values will be used? What does leaving out the IMEX parm mean? The most definition I found was a reference here and another here but they don't answer the questions I just posted. Plus the results I'm seeing are nearly the opposite of what should be the expected behavior.

    I should mention, I did test updating the registry value for TypeGuessRows to 4 just to be sure I had the right registry key. The results were the same if I adjusted my input data in Excel accordingly (ie 4 rows of numbers followed by a row of alphanumeric). I am sure that I found the correct registry key that is being used. But clearly it is not being used like I expected.

    I have yet to look at the web servers yet since all my testing thus far was on my local machine and until I can get results that make sense locally I'm not sure it's worth the effort to look at the other machines. I'm hoping someone can shed some light on the topic. I'm starting to think I'm missing a really basic point here.


    • แก้ไขโดย pretzelb 19 มีนาคม 2555 22:23 added note on registry test of 4 vs 8
    • ทำเครื่องหมายเป็นคำตอบโดย pretzelb 20 มีนาคม 2555 14:28
    • ยกเลิกการทำเครื่องหมายเป็นคำตอบโดย pretzelb 20 มีนาคม 2555 14:28
    •  
  • 20 มีนาคม 2555 14:28
     
     คำตอบ

    It appears as if the TypeGuessRows value is used to make the guess then a check is made against the DB. If the guess is "text" and the DB column INT then the error is thrown. If the guess is "numeric" and the column is INT then no error is thrown and any text values following are translated to NULL. All this happens before any writes are done.

    I confirmed this by creating another test table with a nvarchar column to see what would happen. No errors were ever thrown since text values are valid in an nvarchar column. But if the guess was numeric and text values followed, a NULL was inserted for the text even though the hosting column was nvarchar and it was a valid value for that column.

    What this should mean in the case of an INT column is the result of the guess. If the guess is anything but numeric then the error is thrown. It also appears that TypeGuessRows on the connection string is ignored with the IMEX=1 setting. All the TypeGuessRows values in the registry must be checked and validated to get the expected results. The key is finding the correct TypeGuessRows for the connection string / driver in use.

    • ทำเครื่องหมายเป็นคำตอบโดย pretzelb 20 มีนาคม 2555 14:28
    •