[DBNETLIB][ConnectionWrite (send()).] General network error. Occurs when SQL times out on new Windows 2003 sp2 server

  • Question

  • We just replaced an older Windows 2000 sp4 server with a new Windows 2003 Standard Edition sp2 server to run our batch processing.  We noticed that when the SQL command times out, we now get the following error in the ADO Errors collection.

     

    [DBNETLIB][ConnectionWrite (send()).] General network error.  Check your network documentation.

    Native Error: 11

    SQL State: HY018

     

    msado15.dll version 2.82.3959 is on this server.

     

    On other servers running Windows 2003 Enterprise Edition sp1, ADO reports the normal timeout error:

     

    Timeout expired

    Native Error: 0

    SQL State: HYT00

     

    msado15.dll version 2.82.1830 is on this server.

     

    The SQL server they are talking to is on the other side of a firewall.  It is SQL 2005 sp2 running on Windows 2000 sp4.

     

    I have also tested this on a Windows 2003 sp2 server that doesn't have to cross a firewall, and it gets the correct Timeout error.  It also has the same version of msado15.dll as the problem server.

     

    The application is a collection of VB6 components running in COM+ applications.  I have isolated the test to our single SQL interface component and have built a test VBS script that reproduces this on demand.  The VB6 components were not modified in any way for this change and have been working fine for many years.

     

    All other aspects of our batch processing work as expected.  Transactions are working fine.  As long as the SQL doesn't time out, everything is normal.  DTCPing was used to ensure both servers have the correct ports open to allow DTC traffic.  I checked the SynAttackProtect setting on the SQL server; it is not the issue, since other clients have no problems.

     

    One point I would like to add is that this new server was initially set up with a temporary server name and then renamed to the original server's name on move day.  I don't know whether this has any impact.

     

    Thursday, July 12, 2007 6:54 PM

Answers

  •  Matt,

     

    Fixed my problem this morning.  It seems all the new NICs coming out of HP have TOE (TCP Offload Engine) built in (ours is an HP NC373i).  Once Windows SP2 is installed, Windows enables it.  This is why I was not seeing any RPC traffic when I was sniffing on the box itself: that traffic was being offloaded.

     

    This morning I added the registry value DisableTaskOffload with a value of 1 to the TCP/IP parameters, disabled the TOE on the NIC, and rebooted.  Now when I run my script I get the correct Timeout message from ADO, and Wireshark can see all the RPC traffic going into and out of the machine.

     

    We've been having some strange PDF file corruptions as well lately.  Time will tell if this solves that problem as well.

     

    I have also made other changes that I should mention, though I'm not sure whether they affect this.  When I get another server I will test them individually.

     

    In the same tcp/ip parameter registry area.

     

    EnableRSS = 0

    EnableTCPA = 0

    EnableTCPChimney = 0
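A minimal sketch, not part of the original post, of how the same values could be captured in a .reg file so they can be reviewed and applied consistently on other servers. It assumes the standard TCP/IP parameters key (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters) and REG_DWORD values; the helper names and the output file name are hypothetical. A reboot is still needed afterwards, as in the post.

```python
# Sketch (assumption): write the TCP/IP offload settings from this thread
# into a .reg file. Helper names and file name are hypothetical.

TCPIP_PARAMS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
)

# REG_DWORD values mentioned in the accepted answer.
OFFLOAD_SETTINGS = {
    "DisableTaskOffload": 1,  # turn off TCP/IP task offload to the NIC
    "EnableRSS": 0,           # Receive-Side Scaling off
    "EnableTCPA": 0,          # NetDMA off
    "EnableTCPChimney": 0,    # TCP Chimney (full TCP offload) off
}


def build_reg_file(values: dict, key: str = TCPIP_PARAMS_KEY) -> str:
    """Return .reg file text that sets each name to a REG_DWORD value."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in values.items():
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name}={value} does not fit in a DWORD")
        # .reg files store DWORDs as eight lowercase hex digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


def write_reg_file(path: str, values: dict = OFFLOAD_SETTINGS) -> None:
    # UTF-16 with BOM is what regedit itself writes for version 5.00 files;
    # newline="" keeps the explicit CRLF line endings untranslated.
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(build_reg_file(values))
```

For example, write_reg_file("disable_offload.reg") produces a file that can be inspected and then imported with regedit, ideally on a test server first.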

    Wednesday, July 18, 2007 12:49 PM

All replies

  • From a high level this sounds like an issue with the firewall, since the same client bits (and same server bits) work if they don't cross the firewall.

     

    When the client times out, it sends a small packet called the attention packet to the server.  When the server receives this packet, it sends back a response called the attention ack.  It is possible that the firewall is blocking these small packets for some unknown reason, but I have never seen this before.  I would contact your firewall vendor to see if they can look at this at the network level.
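To make the attention packet concrete: in the TDS protocol it is a header-only packet of type 0x06 with the end-of-message status bit set, so it is exactly 8 bytes. The sketch below builds and decodes one under that assumption; the SPID and packet ID defaults are illustrative, and the layout follows the public TDS packet header (length and SPID in network byte order).

```python
import struct

TDS_TYPE_ATTENTION = 0x06  # packet type for the attention (cancel) signal
TDS_STATUS_EOM = 0x01      # end-of-message status bit
# Header fields: type, status, length, SPID, packet id, window (big-endian).
TDS_HEADER = struct.Struct(">BBHHBB")


def build_attention_packet(spid: int = 0, packet_id: int = 1) -> bytes:
    """Return an 8-byte TDS attention packet (header only, no payload)."""
    return TDS_HEADER.pack(
        TDS_TYPE_ATTENTION, TDS_STATUS_EOM, TDS_HEADER.size, spid, packet_id, 0
    )


def describe_header(packet: bytes) -> dict:
    """Decode the 8-byte TDS header, e.g. from a packet seen in a capture."""
    ptype, status, length, spid, packet_id, window = TDS_HEADER.unpack_from(packet)
    return {"type": ptype, "status": status, "length": length,
            "spid": spid, "packet_id": packet_id, "window": window}
```

In a capture from a healthy client, this is the small packet that should leave the client once the command timeout expires.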


    Thursday, July 12, 2007 7:41 PM
  • I'm in the process of getting our network engineer on board to trace the traffic between these two servers.  The puzzling part is that I have another server (Windows 2003 sp1) that goes through the same firewall, and it works just fine.
    Thursday, July 12, 2007 8:56 PM
  • One thing I recommend you do is run a trace on both the client and server machines simultaneously when you run the test.

    This will tell you if some hardware in the middle is causing the problem.

    Thursday, July 12, 2007 9:02 PM
  • I am experiencing a very similar, urgent issue as well.  We have BizTalk 2002 installed on a Windows 2000 server.  The BizTalk 2002 databases reside on a Windows 2003/SQL 2005 server.  We recently applied the new service pack (SP2) to the Windows 2003 server, and now we intermittently receive the "general network error" as well as many "unspecified" errors from the BizTalk COM+ objects.  These generally happen when the BizTalk server is trying to retrieve configuration from its config database under load.

     

    The only change that has been made is the service pack.  Are there any other COM+, networking, or similar modifications made by the SP2 patch that could cause such issues?

     

    I'm at a loss, but this sounds similar.

    Monday, July 16, 2007 6:45 PM
  • This sounds more like the SynAttackProtect issue ->

     

    See our blog for the gory details:

     

    http://blogs.msdn.com/sql_protocols/archive/2006/04/12/a-special-gne-general-network-error-messages-when-running-sql-server-after-installing-service-pack-1-for-windows-server-2003-and-tcp-registry-key-synattackprotect.aspx

     

    To be honest, I would take the "disable SynAttackProtect" solution, since in most cases your SQL boxes are inside corpnet and you are very safe from this attack.

    Monday, July 16, 2007 9:08 PM
  • Our SQL server is running Windows 2000 sp4, and we have several servers running our code that don't have this issue.  Just in case, I applied the SynAttackProtect setting and rebooted our SQL server, with no change.

     

    Today networking changed the firewall settings to allow all traffic to flow between the two servers.  The same issue continues.

     

    I have two trace files showing the SQL server's view of the conversation; I have to wait for the client-side capture to be done.  Maybe these can shed some light on the issue.  Is there a way for me to get them posted for review?

    Monday, July 16, 2007 9:23 PM
  • Actually I was replying to mitre's post recommending SynAttackProtect.  For your issue garrytman this would not apply, sorry for the confusion.

     

    What sort of trace did you capture?  Netmon?

    Monday, July 16, 2007 9:34 PM
  • Hi Matt,

     

    Thanks for your insight.  We are monitoring for this as well.  One unusual thing, though, is that the server already had Windows 2003 sp1 installed on it; we only started seeing the issue once sp2 was added.  The symptoms are similar to what is described on the blog.

     

    Unfortunately, I cannot add retry logic to the components, since they are black-boxed BizTalk 2002 components used by the BizTalk server runtime to call our implemented AIC components.

    Monday, July 16, 2007 9:35 PM
  • They are not Netmon captures.  The network folks told me about a free tool called Wireshark that I could use to look at them, but analyzing the traces is beyond my current skill set.  I downloaded it and I can open the trace files.  They have the extension .enc, if that helps.

     

    The only thing that stands out to me is one packet shown in red: a TCP [RST, ACK] packet from the client to the server, about 10 seconds after the call to the stored procedure starts, which matches the timeout setting in my test script.  Why the analyzer marks this packet in red I don't know yet.

    Tuesday, July 17, 2007 1:05 PM
  •  I put WireShark on 4 different machines and have come up with some interesting findings.

     

    On the problem machine I am not capturing any TDS traffic to the SQL server, only TCP; on all the other clients there is plenty of TDS traffic between the servers.  I've checked my capture filters several times and don't think I am filtering on TCP packets only.  I will continue to look.

     

    On good clients I see the Cancel Packet with the Response Packet from SQL server containing 'The statement has been terminated.'

     

    Good clients include:

    • Windows XP sp2 (No firewall)
    • Windows 2003 sp2 (No firewall)
    • Windows 2003 sp1 (thru same firewall)

    Bad client is Windows 2003 sp2 thru firewall.

     

    At least I have something concrete to work with now.  I'll keep digging.

    Tuesday, July 17, 2007 2:19 PM
  • Yes, during a normal timeout what you should see is the following:

     

    1. Client sends request to server (1 or more tcp packets).

    2. A gap in the trace activity while the server processes the request up to the command timeout value.

    3. When the client detects that command timeout expires, the client will send a small (8 byte) packet called the attention packet to the server.

    4. You should then see an attention response packet from the server very shortly thereafter.
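The attention response in step 4 can be recognized in a capture as well. A minimal sketch, assuming the server's reply is a single tabular-result packet (type 0x04) whose first token is a DONE token (0xFD) with the DONE_ATTN status bit (0x0020) set; real responses can carry other tokens first, which this simplified check does not walk.

```python
import struct

TDS_TYPE_TABULAR_RESULT = 0x04
TOKEN_DONE = 0xFD
DONE_ATTN = 0x0020  # DONE status bit acknowledging an attention


def is_attention_ack(packet: bytes) -> bool:
    """Simplified check: 8-byte header, then a DONE token with DONE_ATTN set."""
    if len(packet) < 11:  # 8-byte header + token type + 2-byte status
        return False
    ptype, _status, length = struct.unpack_from(">BBH", packet, 0)
    if ptype != TDS_TYPE_TABULAR_RESULT or length != len(packet):
        return False
    if packet[8] != TOKEN_DONE:
        return False
    # Token fields are little-endian, unlike the big-endian header length.
    (done_status,) = struct.unpack_from("<H", packet, 9)
    return bool(done_status & DONE_ATTN)
```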

     

    Note that during the above exchange there is no need for the client and server to tear down the connection, so if you see RST or FIN ACK on the connection, something is going wrong.
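The expected exchange and the note above can be turned into a small checklist for a capture. This is a hypothetical helper, not a tool from the thread: it assumes you have already reduced a trace to a list of labelled events (request, attention, attention_ack, RST, FIN) for one connection, and it reports what deviates from the normal timeout pattern.

```python
from dataclasses import dataclass


@dataclass
class Event:
    t: float   # seconds since the start of the capture
    src: str   # "client" or "server"
    kind: str  # "request", "attention", "attention_ack", "RST" or "FIN"


def first(events, src, kind):
    """Return the first event with the given source and kind, or None."""
    return next((e for e in events if e.src == src and e.kind == kind), None)


def check_timeout_exchange(events):
    """List deviations from request -> attention -> attention ack."""
    problems = []
    for teardown in ("RST", "FIN"):
        if any(e.kind == teardown for e in events):
            problems.append(f"{teardown} seen: connection torn down")
    request = first(events, "client", "request")
    attention = first(events, "client", "attention")
    ack = first(events, "server", "attention_ack")
    if request is None:
        problems.append("no request from client")
    if attention is None:
        problems.append("no attention packet from client")
    elif request is not None and attention.t < request.t:
        problems.append("attention sent before the request")
    if ack is None:
        problems.append("no attention ack from server")
    elif attention is not None and ack.t < attention.t:
        problems.append("attention ack precedes the attention packet")
    return problems
```

A list such as [Event(0.0, "client", "request"), Event(10.0, "client", "RST")] mirrors the capture described earlier in the thread, where the only thing seen after about 10 seconds was an [RST, ACK] from the client; it yields three problems, while a request, attention, ack sequence yields none.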

     

    If you see RST from client to server, most likely this is due to the firewall resetting the connection.  This is why you need to run tracing on the client as well.  If the firewall resets the connection, you will see a RST from client to server in the server trace and a RST from server to client in the client trace (indicating that something in the middle reset the connection).

     

    If you see RST from client on both client and server side traces, then you know the RST came from client.  RST can come from client in some cases, for example if client application terminates abruptly without closing the connection. 
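The reasoning about paired client and server traces can be summarised as a lookup. A minimal sketch, assuming each argument records which endpoint the RST appears to come from in that machine's own capture ("client", "server", or None if no RST was captured); the function name is hypothetical.

```python
from typing import Optional

# (apparent sender in client trace, apparent sender in server trace) -> verdict
_VERDICTS = {
    ("client", "client"): "client sent the RST",
    ("server", "server"): "server sent the RST",
    # Each side sees the RST as coming from its peer: something in the path
    # (for example a firewall) reset both halves of the connection.
    ("server", "client"): "device in the middle reset the connection",
}


def rst_origin(in_client_trace: Optional[str], in_server_trace: Optional[str]) -> str:
    """Classify where a TCP reset came from, given both captures."""
    return _VERDICTS.get((in_client_trace, in_server_trace), "inconclusive")
```

Anything that does not match one of the three patterns, such as a RST seen on only one side, is left as inconclusive rather than guessed.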


    Tuesday, July 17, 2007 5:38 PM
  • Here is where I'm at.  I've upgraded the NIC driver.  I noticed that IWAM and IUSR were incorrect (because of the server's temporary name during the initial install), so I uninstalled and reinstalled IIS, and the IWAM and IUSR IDs are now correct.

     

    Still no relief.  The problem seems to be in the RPC area, as I see no RPC traffic between the two servers.  All the sniffer picks up is TCP traffic.

    Tuesday, July 17, 2007 7:21 PM
  • So you see a RST coming from client machine in trace on client machine?

    If this client code is running inside IIS, where is it running?  isapi filter? ASP.NET application?  ASP application? CGI application? PHP application?

     

    Tuesday, July 17, 2007 7:46 PM
  • We ran network sniffers as well and found nothing wrong.  In our SQL profiler, on the biztalk object calls that fail, we see as the last line:

     

    if @@TRANCOUNT  > 0  COMMIT TRANS

     

    Then we get an "unspecified" error back from the BizTalk object.  They are closed objects, so I have no idea how they make calls to the database, whether they dispose of connections properly, and so on.  This error does not happen consistently.  Again, nothing in our code has changed; the only change is the application of sp2 on the Windows 2003 server, which already had sp1.  I did notice in the release notes that sp2 "fixed" problems with MSDTC, Data Access, and COM+.  My fear is that one of these fixes caused a subtle change resulting in issues with our BizTalk server environment and how it accesses the database on the Windows 2003 server.

     

    I'm quickly running out of ideas to test.

     

    Tuesday, July 17, 2007 10:11 PM
  • I'll check with my friends in BizTalk if they've heard of this one and let you know what I find.
    Tuesday, July 17, 2007 10:23 PM
  • Hi Matt,

    I am the network administrator working with Mitre on this issue.  You seem to understand SQL communication.  Could you point me to some information on it?  The lower level, the better.
    Tuesday, July 17, 2007 10:38 PM
  • Talked to BizTalk guru.

     

    He said:

     

    #1. BizTalk 2002 is not supported with SQL 2005 (fyi).

    #2. Only other thing he can think of is changes to DTC security with Windows 2003 SP2.  You could try comparing security settings on MSDTC service from good/bad machine.

     

    Wish I had more.

    Tuesday, July 17, 2007 10:40 PM
  • Thanks again, Matt.  Sorry, I misinformed you: the SQL box is SQL 2000.

     

    If the DTC security settings were wrong, I'd think this error would be consistent and repeatable but unfortunately this is not the case. 

    Tuesday, July 17, 2007 10:57 PM
  • I talked again to the BizTalk folks.

     

    They are telling me BTS2002 is not supported with Windows 2003 SP2.

    They recommend going back to SP1.

     

    They also asked whether you are running BTS2002 SP1; you should have SP1 for BTS installed.

     

    But in general I would say safest bet is to roll back Windows 2003 service pack since you know it worked before.

     

    Or you could upgrade to newer BizTalk but this might be more difficult.

     

    Matt

    Wednesday, July 18, 2007 1:55 AM
  • Matt,

     

         Thanks for all your help. Just to clarify, your Biztalk folks are saying that the BT databases cannot live on a server running Windows 2003 SP2? Our Biztalk servers are running Windows 2000 SP4 but their databases live on Windows 2003 SP2/SQL 2000 SP3.

    Wednesday, July 18, 2007 2:45 PM
  • This says Biztalk 2002 Partner Edition has been tested on Windows 2003 SP2. http://support.microsoft.com/kb/926031

     

    Our environment:

     

    BizTalk Server
      - Windows 2000 SP4
      - BizTalk 2002 Enterprise SP1

    DB Server
      - Windows 2003 SP2
      - SQL Server 2000 SP3

    Wednesday, July 18, 2007 3:17 PM
  • Thanks again for all your help matt, you've been extremely helpful to us in tracking down these issues.
    Wednesday, July 18, 2007 4:46 PM
  • Yes, so it should work.  I would contact BizTalk support for more advanced debugging help, not sure what else I can do for you.
    Wednesday, July 18, 2007 6:15 PM
  • Garry,

    I am glad you were able to fix your issue.  We are experiencing the same sort of issue.  We even use the same NIC (NC373i) on our DL380 G5.  We are currently formulating a plan for what to do about it.  We are considering rolling back SP2, but we may just disable TCP offload and RSS.  I'll post when we have more info.

     

    Wednesday, July 18, 2007 9:56 PM
  •  Unfortunately this is a Production box that I needed to get stable.  So turning off TOE was my only option right now.

     

     I'm going to get another machine configured just like it to test with.  I noticed that HP has a newer driver 3.0.7 for the card and will try that out.  I'll post my results when I find out more.

    Thursday, July 19, 2007 12:17 PM
  • I am getting the very same problem in a VB6 COM component.  Every few days the error occurs at one user's site, on a particular line that opens an ADO recordset.  The database is SQL 2005, the provider is MSDATASHAPE over SQLOLEDB, and the statement uses the SHAPE syntax.  The program is large and in constant use, but the error is infrequent and only occurs on this one SELECT.  I have no reports of any problems with SQL Server 2000, even on 2003 sp2.

    Saturday, August 11, 2007 9:41 PM
  • John,

     

    We solved our issue by disabling the TCP offloading features that were enabled in Windows 2003 SP2, although the problem was really with the NIC driver.  If you are getting an error similar to the ones in this thread, I would suggest doing the same to see if your problem goes away.  Doing so should not affect your server adversely.  If this does solve the problem, you should consider upgrading your NIC driver.  I would also consider leaving these TCP offloading features turned off unless you know that you need them.

     

    Monday, August 13, 2007 2:49 PM
  • I will try this, though sometimes a week or more goes by without the problem, so it will be some while before I can be sure of a fix.

     

    The thing that worries me is this: the program contains dozens of SELECT statements on various databases and is running all the time, yet the problem only seems to occur on a single SELECT in a particular program line.  You would think a network problem would manifest itself randomly.

    Wednesday, August 15, 2007 6:28 PM
  • Hi John,

     

    I've run into this in the distant past with one customer.  What was happening is they had a corrupted TEXT column value.  So every time they tried to select the value from a specific row and column, it would cause the server to fail to render the response and this triggers the connection drop.

     

    So what you want to do is run DBCC CHECKTABLE on the table to look for corruption; this should tell you whether that is the cause.

     

    I have also seen this once with a corrupt index.  So if CHECKTABLE does not solve the problem, look at rebuilding the indexes for the table.  Make sure you read the help topic on DBCC CHECKTABLE before you run it; it can be a very expensive operation depending on the size of the table.
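As a concrete starting point for this suggestion, the sketch below only builds the T-SQL text; it does not connect to a server. The table name used in the example is a placeholder, identifiers are restricted to plain names to keep the quoting simple, and the choice of ALTER INDEX ... REBUILD (SQL Server 2005) versus DBCC DBREINDEX (SQL Server 2000) is an assumption about which rebuild command you would use.

```python
import re

# Plain identifiers only: letters, digits, underscore, not starting with a digit.
_PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def corruption_check_script(schema: str, table: str, sql2000: bool = False) -> list:
    """Return [CHECKTABLE statement, index rebuild statement] for one table."""
    for part in (schema, table):
        if not _PLAIN_IDENTIFIER.fullmatch(part):
            # Anything else would need proper quoting; refuse rather than guess.
            raise ValueError(f"unsupported identifier: {part!r}")
    name = f"{schema}.{table}"
    # Report every error and suppress informational messages.
    check = f"DBCC CHECKTABLE ('{name}') WITH ALL_ERRORMSGS, NO_INFOMSGS;"
    if sql2000:
        rebuild = f"DBCC DBREINDEX ('{name}');"  # SQL Server 2000 syntax
    else:
        rebuild = f"ALTER INDEX ALL ON {name} REBUILD;"  # SQL Server 2005+
    return [check, rebuild]
```

For a hypothetical table dbo.Orders, run the CHECKTABLE statement first, during a quiet period given its cost, and only move on to the rebuild statement if the check does not explain the failure.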


    Wednesday, August 15, 2007 6:44 PM