locked
Sporadic transport-level connection errors from SQL Azure RRS feed

  • Question

  • I have a page in web role that runs a stored procedure on SQL Azure that takes a fraction of a second to run the stored procedure and load the page. Sporadically, when that page is called, I get one of two errors:

    1.
    A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)

    2.
    A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 0 - An established connection was aborted by the software in your host machine.)

    I've used "using" clauses for all my SqlClient connections and the stored procedure never takes longer than a fraction of a second to load. Here's my code:

            Int16 retries = 0;
    
    
            TryAgain:
    
            try
            {
                using (SqlConnection conn = new SqlConnection(connectionString))
                {
                    conn.Open();
    
                    SqlCommand cmd = new SqlCommand();
    
                    cmd.Connection = conn;
                    cmd.CommandText = "SPROC Name";
                    cmd.CommandType = CommandType.StoredProcedure;
    
    		//Code for SPROC
    
                    Int32 rowsAffected = cmd.ExecuteNonQuery();
    
                    //Added conn.Close() to extraspecially make sure connection is closed!
                    conn.Close();
                }
            }
            catch (Exception ex)
            {
                CommonFunctions.LogActivity("SQL Error (Retry): " + ex.Message);
                if (retries < 1)
                {
                    retries++;
                    //Added short sleep
                    System.Threading.Thread.Sleep(50);
                    goto TryAgain;
                }
                //Failed!
                CommonFunctions.LogActivity("SQL Error (Failed): " + ex.Message);
            }


    I have a site monitoring company testing the url every 5-20 minutes that confirms this and reports the error anywhere between every 5 to 40 minutes.

    I also created a windows azure app to monitor the web role twice a second to see the effect and the errors are still occuring with roughly the same frequency.

    I have a workaround which is to sleep for 50 milliseconds and rerun the stored procedure again. The second attempt always works and prevents the error, but this is not ideal as speed is critical to the app and I'm concerned why this is happening.

    Any ideas how I can prevent these errors?

    Cheers

    David

    • Edited by DavidM51 Wednesday, December 2, 2009 11:14 PM Added code snippet
    Tuesday, December 1, 2009 10:52 PM

Answers

  • The problems you are experiencing are caused by how the ADO.net connection pooling works and us killing idle connections.
    Basically, you get the error because you get a connection from the pool that is invalid (int this case because it was killed by the gateway).
    I believe the ADO.NET team is going to fix this on the client side and making pooling more robust. For the time being we will significantly increase the time before killing idle connections (maybe in the next QFE), so the problem should be less frequent.

    We advise the custommers to have the retry logic built into their applications, which is exactly what you are doing. You don't need the sleep(50) before retrying, next time you try you should get a good connection.

    Thanks,
    Justyna

    • Proposed as answer by justynaw Friday, December 4, 2009 12:03 AM
    • Marked as answer by DavidM51 Saturday, December 5, 2009 6:40 PM
    Thursday, December 3, 2009 11:18 PM

All replies

  • Can you provide your database and server name plus the time when this happened recently?
    Thanks,
    Justyna
    Wednesday, December 2, 2009 11:20 PM

  • Thanks Justyna!


    Database Name: ChiLite
    Server Name: jlb020vjem

    Log extract of last 20 errors (DateTime is UTC):

    2009-12-02 23:09:00.727 SQL Error (retry): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    2009-12-02 23:04:00.043 SQL Error (retry): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    2009-12-02 22:49:01.620 SQL Error (retry): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    2009-12-02 22:49:01.027 SQL Error (retry): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    2009-12-02 22:34:00.767 SQL Error (retry): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    2009-12-02 21:34:00.343 SQL Error (retry): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    2009-12-02 21:19:01.420 SQL Error (retry): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    2009-12-02 21:14:00.377 SQL Error (retry): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    2009-12-02 21:04:01.580 SQL Error (retry): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    2009-12-02 20:39:00.743 SQL Error (retry): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    2009-12-02 20:29:00.117 SQL Error (retry): A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 0 - An established connection was aborted by the software in your host machine.)
    2009-12-02 20:19:00.950 SQL Error (retry): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    2009-12-02 20:04:01.050 SQL Error (retry): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    2009-12-02 19:58:59.860 SQL Error (retry): A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 0 - An established connection was aborted by the software in your host machine.)
    2009-12-02 19:09:00.517 SQL Error (retry): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    2009-12-02 18:44:00.830 SQL Error (retry): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    2009-12-02 18:19:00.293 SQL Error (retry): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    2009-12-02 17:49:00.647 SQL Error (retry): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    2009-12-02 17:39:00.750 SQL Error (retry): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    2009-12-02 17:14:00.953 SQL Error (retry): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    Wednesday, December 2, 2009 11:30 PM
  • A few more questions:

    I assume this is one client application that gets all this errors and one thread, is this true? Are you using client connection pooling? (default is yes).
    During the time between the errors ex. 23:04:00 -  23:09:00.727 did you get any successful results?
    I can see about several active connecitons to our gateway during the time the error ocurred and would like to know if this is something that your application was doing on purpose.

    That would help a lot!

    Thanks,
    Justyna
    Thursday, December 3, 2009 10:51 PM
  • The problems you are experiencing are caused by how the ADO.net connection pooling works and us killing idle connections.
    Basically, you get the error because you get a connection from the pool that is invalid (int this case because it was killed by the gateway).
    I believe the ADO.NET team is going to fix this on the client side and making pooling more robust. For the time being we will significantly increase the time before killing idle connections (maybe in the next QFE), so the problem should be less frequent.

    We advise the custommers to have the retry logic built into their applications, which is exactly what you are doing. You don't need the sleep(50) before retrying, next time you try you should get a good connection.

    Thanks,
    Justyna

    • Proposed as answer by justynaw Friday, December 4, 2009 12:03 AM
    • Marked as answer by DavidM51 Saturday, December 5, 2009 6:40 PM
    Thursday, December 3, 2009 11:18 PM
  • Justyna,

    I've been trying to figure out a workaround to these transport level errors for days now.
    I was convinced this was an issue with connection pooling but couldn't prove it - it's nice to have the confirmation.

    But until the next QFE, in this thread I asked this question:

      Any way to increase SQL Azure's connection timeout?
      http://social.msdn.microsoft.com/Forums/en-US/ssdsgetstarted/thread/849bad83-0656-4a8e-b98c-6917c4a5d52f

    Is there anyway to change the timeout today?
    Otherwise we keep getting this error which results in end users seeing an error on the web site.

    Saturday, December 5, 2009 5:38 AM
  • Hi Emmanuel

    You can use the retry code in my original post that works 100% of the time for me (as Justyna mentions, the sleep is not required).

    I have performed approximately 700,000 tests on the code in my original post in the last four days (testing around once every half a second) and had to retry 168 times and had 7 "Service Unavailable" http errors in that time. So far the code has never failed on the second attempt to get another connection from the pool.

    Hope that helps!

    Cheers

    David
    Saturday, December 5, 2009 7:05 PM
  • Thanks David -

    Unfortunately for this specific project I was not directly writing ADO.NET code, but I was using .netTiers, my favorite ORM.
    It took us a while but we now found the solution.

    For those using .netTiers (http://nettiers.com/) I found a way to avoid these exceptions when connecting to SQL Azure.
    I blogged about it here:

      How to stop getting exceptions when connecting to SQL Azure: “System.Data.SqlClient.SqlException:
      An established connection was aborted by the software in your host machine”
      http://blog.ehuna.org/2010/01/how_to_stop_getting_exceptions.html

    It's basically fixing the retry feature of .netTiers changing one line of code in one of the .netTiers templates.

    Tuesday, January 26, 2010 1:26 AM
  • The retry logic does not work for us: since we added it, every 6 to 8 hours our Windows Azure instances are freezing and needs to be restartet.

    We use the following code:
            internal SqlDataReader ExecuteReader(SqlCommand commandToExecute)
            {
            TryAgain:
    
                int retries = 0;
                SqlDataReader sqlDataReader = null;
                try
                {
                    sqlDataReader = commandToExecute.ExecuteReader();
                }
                catch (Exception exception)
                {
                    if (sqlDataReader != null)
                    {
                        sqlDataReader.Close();
                    }
                    if (retries == 0)
                    {
                        retries++;
                        goto TryAgain;
                    }
                    throw exception;
                }
                return sqlDataReader;
            }
    
            internal int ExecuteNonQuery(SqlCommand commandToExecute)
            {
            TryAgain:
    
                int retries = 0;
                int returnValue;
                try
                {
                    returnValue = commandToExecute.ExecuteNonQuery();
                }
                catch (Exception exception)
                {
                    if (retries == 0)
                    {
                        retries++;
                        goto TryAgain;
                    }
                    throw exception;
                }
                return returnValue;
            }

    Our connection string looks like this:
    Data Source=***********.database.windows.net; Database=*********; User Id=********; Password=*****************; Trusted_Connection=False

    Any ideas what could go wrong?
    Are there other known solutions to this problem?
    Tuesday, January 26, 2010 10:38 AM
  • Hi Sebastian

    I've just had a quick look and it appears that you have defined the retries variable inside the try again loop. So you should change:

            {
            TryAgain:
                int retries = 0;
                int returnValue;
                try
                {


    to:

            {
                int retries = 0;
                int returnValue;
          TryAgain:
                try
                {


    This should stop the eternal loop that you are getting. Theoretically you shouldn't hit this eternal loop with the retry code anyway, so there could be another problem, but at least this fixes the first...

    Hope that helps!

    David
    Tuesday, January 26, 2010 2:15 PM
  • Thanks, David. I'll give it a shot.
    Wednesday, January 27, 2010 6:37 PM
  • Resolving the eternal loop revealed another error within the Execute Reader method (the ExecuteNonQuery seems to work). I get the following exception now:

    System.InvalidOperationException: ExecuteReader requires an open and available Connection. The connection's current state is closed.


    I call the ExecuteReader method for example with the following code:
    string resultValue = "";
    using (SqlConnection connection = new SqlConnection(ConnectionString))
       {
          SqlCommand myCommand = new SqlCommand("sp_SelectSomething", connection);
          myCommand.CommandType = CommandType.StoredProcedure;
    
          SqlParameter myParameter = new SqlParameter("param1", SqlDbType.BigInt);
          myParameter.Value = "test";
          myCommand.Parameters.Add(myParameter);
          connection.Open();
          SqlDataReader resultReader = ExecuteReader(myCommand);
          if (resultReader.Read())
          {
             resultValue = resultReader.GetString(0);
          }
          resultReader.Close();
       }
    return resultValue;
    Could it be, that
    sqlDataReader.Close();
    also closes the connection or that the command can be executed only once on the same connection?
    Is there any other possibility to establish a retry-logic for ExecuteReader() within a seperate method?

    Thanks for your help, Sebastian

    Wednesday, January 27, 2010 7:56 PM
  • Hi Sebastian

    Is the error infrequent and sporadic? Is the using clause inside the try block?

    If you answered yes to those questions, then the retry code should kick in and fix it. If not, then try putting the using clause inside the try block.

    Closing the reader does not close the connection.

    Cheers

    David

    Thursday, January 28, 2010 12:36 PM
  • Hi David,

    yes, the error is sporadic and no, the using clause is not inside the try block. I'd like to prevent moving the using clause inside the try block, since our code relies heavily on ADO.NET transactions which makes it nearly impossible to move the using clause (and therefore the creation of the connection) into a central method which should be only responsible for executing the commands and the retry logic.

    Microsoft Support is currently taking a look at the problem. I will wait for their feedback first.

    Best regards,
    Sebastian
    Friday, January 29, 2010 2:37 PM
  • Hi Sebastian,

    I would suggest you to either increase the time-out for the SQL Azure or decrease it for the ADO .net.

    The timeout we have set is based on field data

    for them to alter and increase the value, would only extend the timeout value of the connection itself

    as to the methodology they use, be it sqlcmd, or ado it's up to them.

    however please be informed that

    if they have a connexn that sits idle for too long, it'll automatically be killed

    We need to do that to preserve resources and for any potential ddos attacks

    Note : it is possible to increase or decreae the timeout .

    the programming model is flexible enough to handle that

     

    Saturday, January 30, 2010 10:52 AM
  • Hi Lancers,

    I've already been searching for hints how to increase the time-out but was not able to find something. Do you have an idea, how to achieve that?

    By the way, I got response from Microsoft Support: the issue should be fixed by February 16th. But they did not mention how they will fix it.

    Best regards,
    Sebastian
    Friday, February 5, 2010 9:09 AM
  • Microsoft increased the connection timeout of SQL Azure to 30 minutes. This fix solved the issue for us.
    Friday, February 26, 2010 9:15 AM
  • I just saw this issue.  Funny how these things happen right when you go to demo.

    Sheesh.

     

    tim

    Thursday, May 13, 2010 11:05 PM
  • I get the same error all the time on my home satellite connection with a 1,500ms latency but zero times on my 10 megabit work connection - I've just chalked it up to my ____ connection - just hope any of my site visitors don't have a ____ connection.

    Thursday, May 13, 2010 11:35 PM
  • sql.azure.com problem:

    An unexpected error has occurred. Please go back and retry your operation.

    Please use activity id 'b65541ba-7efa-4ba9-a89f-0312378bf15f' and UTC timestamp '5/13/2010 11:34:05 PM' when contacting customer support.

    ---

    sigh

    Thursday, May 13, 2010 11:36 PM
  • Same prob - see my post below this one "SQL database down...". Glad it's not just me;)
    Thursday, May 13, 2010 11:37 PM
  • So much for the demo.  In case you ever see this error, check the Dashboard.  Of course, the question being asked is, if the database is super-redundant, spread across the cloud and all that, should I be able to access what I need within seconds of the downage?

    http://www.microsoft.com/windowsazure/support/status/servicedashboard.aspx

    2010-05-13T23:24:24.183May 13 2010 11:23PM We are currently addressing an issue impacting connectivity for SQL Azure Database. Next update within an hour.

    Update:

    May 14 2010 12:06AM Connectivity issues are now resolved. Emergency maintenance was occurring on our infrastructure, which resulted in intermittent connectivity failures for some customers.

    • Edited by tofutim Friday, May 14, 2010 2:16 AM update
    Thursday, May 13, 2010 11:45 PM
  • This issue has not been fixed by Microsoft. Even though I made the recommended changes I still get the same error:

    A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 0 - An established connection was aborted by the software in your host machine.

    I can't believe they still haven't fix this problem. This is very annoying.

    Thursday, July 29, 2010 5:19 PM
  • We are having this issue also using the latest EF Code First stuff against SQL Azure.  Is there a work around?
    Thursday, September 8, 2011 3:12 PM
  • I also have this issue. It's sporadic and I think is about ADO.NET connection pooling: I get the issue both with and without Entity Framework.
    Wednesday, December 28, 2011 3:32 PM
  • I've got exactly the same problem but not via Azure. It's ruining my working day
    Friday, July 12, 2013 10:28 AM
  • I was using Azure SQL server 2012. I faced the same issue, but what really happened in my case is one of the fields in the table I am inserting into has a character limit varchar(50), some of my rows crossed that limit and I got "A transport-level error has occurred when receiving results from the server. (provider: Session Provider,error: 19 -Physical connection is not usable)".

    This may be because the connection was closed by server because of the exception. This error description is the least helpful one.

    What I suggest is to in a loop add single row to the table and see if it goes through. If it failed on a specific row, compare the row with other rows and check the data of the row for any anomalies.

    My code was.

            using (SqlConnection con = new SqlConnection(constring))
            {
                con.Open();
                using (SqlBulkCopy bulkCopy = new SqlBulkCopy(con))
                {
                    bulkCopy.DestinationTableName = "dbo.Table";

                    bulkCopy.BatchSize = 50;

                    try
                    {
                         bulkCopy.WriteToServer(clips);
                    }
                    catch (Exception ex)
                    {

                        Console.WriteLine(ex.Message);
                    }
                }
            }
    Tuesday, August 20, 2013 11:51 AM