locked
SQLSTATE=08501 - OR - Communication Link Failure RRS feed

  • Question

  •  I was just trying to solve an issue that keeps coming up
     periodically.

     We are using PowerBuilder v9.0 as our front end and MS SQL
     Server as our back end. We are using ODBC as opposed to
     using the Native SQL Server Database driver that comes with
     PowerBuilder and we want to keep it that way.

     We have a scheduled process that runs hourly throughout the day,
     everyday, and about every two weeks we start to get strange
     errors coming from this process.

     This process connects to the database, does its work then 
     disconnects and then does not connect again until the next time it 
     is scheduled to run. It can, at times, process 10s of thousands of 
     rows. And this goes on every hour. This process copies data from
     one SQL Server DB to another.

     Most notably, the following error message we get often (written in
     its entirety):

    --------------------------------------------------------------
     SQLSTATE=08501
     [Microsoft][ODBC SQL Server Driver] Communication Link Failure
    --------------------------------------------------------------


     We cannot seem to isolate the cause of this issue.

     We do know that if we restart our SQL Server Database Server,
     the problem goes away and all is well for about another 2 weeks,
     where we have to reboot the server again. We don't know for
     sure if the cause of all this is a network connectivity issue. We
     believe that it's a database issue.

     Does anyone have any clues as to the possible or most-likely
     cause and solution of this problem?

    Monday, October 10, 2005 10:02 PM

Answers

  • Communication link failure is a mid-stream failure of the connection from client to SQL Server.  This typically points to problems with your networking layer, typically networking hardware.  For example a router in your network could be dropping connections, or a network card that is going bad.

    Unfortunately there is no simple and easy to follow way to diagnose this, it typically requires someone who is expert in network layer to analyze network sniffs between client and server.  Microsoft support folks can help here, they know how to deal with these sorts of issues.

    I'll tell you what I typically do and if you can do this yourself, you are all set.

    To diagnose these problems first thing I do is setup a continuous 24x7 netmon trace on a client machine as well as on the SQL Server.  I've written my own custom batch files that use netcap to automate this, it's fairly simple.  Hard drive space is cheap, burn it up with traces, it is better to have a trace than miss the problem.  You need traces on both sides (SQL side and client side).   If you are not sure which client will fail, you can run traces on all the clients, or pick a few candidates randomly.

    At the same time I setup the client application to log the precise date and time and machine name to the error log for the application.  I need this to correlate the network traces with the time that the error occurred.  If possible I log things on the client side I can use to correlate with a specific connection in the netmon trace, for example the server spid I can get from the netmon trace if the server spid is in the client error log as well it makes it easy to match up.

    Then once I have the traces I look at the netmons and narrow down the connections to the specific one that failed.  Typically you will see a RESET in the connection, but it can get complicated.   What I do is find the connection in both client and server netmon traces and then examine where the RESET came from, did it come from client, or server or both?  If the reset shows up as coming from both client and server, then this means some intermediate hardware reset the connection, you then need to start narrowing down this hardware problem further.  If the reset came from the server, then it could be the SQL Server resetting the connection due to some incoming TDS, examine SQL Server error log for messages around the time of failure, etc...  If the reset came from the client it could be faulty HW on the client (bad client nic) or possibly the user killed the client application for example.

    Monday, October 24, 2005 6:49 AM

All replies

  • Communication link failure is a mid-stream failure of the connection from client to SQL Server.  This typically points to problems with your networking layer, typically networking hardware.  For example a router in your network could be dropping connections, or a network card that is going bad.

    Unfortunately there is no simple and easy to follow way to diagnose this, it typically requires someone who is expert in network layer to analyze network sniffs between client and server.  Microsoft support folks can help here, they know how to deal with these sorts of issues.

    I'll tell you what I typically do and if you can do this yourself, you are all set.

    To diagnose these problems first thing I do is setup a continuous 24x7 netmon trace on a client machine as well as on the SQL Server.  I've written my own custom batch files that use netcap to automate this, it's fairly simple.  Hard drive space is cheap, burn it up with traces, it is better to have a trace than miss the problem.  You need traces on both sides (SQL side and client side).   If you are not sure which client will fail, you can run traces on all the clients, or pick a few candidates randomly.

    At the same time I setup the client application to log the precise date and time and machine name to the error log for the application.  I need this to correlate the network traces with the time that the error occurred.  If possible I log things on the client side I can use to correlate with a specific connection in the netmon trace, for example the server spid I can get from the netmon trace if the server spid is in the client error log as well it makes it easy to match up.

    Then once I have the traces I look at the netmons and narrow down the connections to the specific one that failed.  Typically you will see a RESET in the connection, but it can get complicated.   What I do is find the connection in both client and server netmon traces and then examine where the RESET came from, did it come from client, or server or both?  If the reset shows up as coming from both client and server, then this means some intermediate hardware reset the connection, you then need to start narrowing down this hardware problem further.  If the reset came from the server, then it could be the SQL Server resetting the connection due to some incoming TDS, examine SQL Server error log for messages around the time of failure, etc...  If the reset came from the client it could be faulty HW on the client (bad client nic) or possibly the user killed the client application for example.

    Monday, October 24, 2005 6:49 AM
  • I understand that this topic is old, however, the question is still relevant.

    What if the SQL Server Instance and application are on the same machine?

    In my experience, this error can be caused by SQL Server (I've seen it in 2000, 2005, 2008) being starved of resources, e.g. memory in conjunction with a "large" (using the term loosely) database. 

    Thoughts?

    Thursday, March 4, 2010 6:48 PM
  • what is solution of communication link failure
    Monday, June 7, 2010 11:24 AM
  • For my issue "In my experience, this error can be caused by SQL Server (I've seen it in 2000, 2005, 2008) being starved of resources, e.g. memory in conjunction with a "large" (using the term loosely) database.  " the answer is the following:

    1. Use the latest driver, e.g. sqlclni, when you make your connection or in the connection string. Example:
      provider=sqlncli;data source=(local);database=YourDatabaseName;trusted_connection=yes;mars connection=true
    2. In your connection string use "mars connection", which helps with concurrent connections.
    3. Increase physical memory in proportion to the size of the database.  You will know to bump it up when queries take progressively longer and when optimization of queries do not help any longer
    4. Move DB to a more powerful physical machine or increase cores for virtual machine
    5. If application and DB are on the same machine, move DB to a dedicated machine on the same physical network, and preferribly the same switch.

     

    Monday, June 7, 2010 12:08 PM